Hi all renkuers!
As far as I remember, the knowledge graph (KG) that Renku creates to track the workflow does not allow circular dependencies, i.e. the input and output files cannot be the same. I understand the reason for that, but I have now found that it can be problematic for a really general use case.
Imagine my repo just creates a DB of information parsed from somewhere. I expect that new data will become available once in a while, so I will need to update the output DB file accordingly. I know renku implements methods for doing that, but my DB originally took 3 days to create, and I don't want to rerun everything, just add the new content.
In principle, I could handle that by creating a new file with a timestamp each time, plus a -latest file. When updating the DB, I would use the file with the last timestamp as input, create a new timestamped file, and update the -latest file. But in that case I will end up with too many large files, and a repo that is not as clean as I would like.
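To make the timestamp workaround concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration: the `update_db` helper, the `db-<timestamp>.txt` and `db-latest.txt` file names, and the DB being a plain text file with one record per line. It does not use any Renku API; it only shows the file handling.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def update_db(db_dir, new_lines, stamp=None):
    """Append new_lines to a copy of the newest snapshot and refresh db-latest.txt."""
    db_dir = Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    latest = db_dir / "db-latest.txt"  # hypothetical name for the -latest file

    # Timestamps in this format sort lexicographically, so the last one is the newest.
    snapshots = sorted(p for p in db_dir.glob("db-*.txt") if p != latest)
    previous = snapshots[-1] if snapshots else None

    stamp = stamp or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snapshot = db_dir / f"db-{stamp}.txt"

    # Input (previous snapshot) and output (new snapshot) are different files,
    # so no step reads and writes the same path.
    if previous is not None:
        shutil.copyfile(previous, snapshot)
    with snapshot.open("a", encoding="utf-8") as fh:
        for line in new_lines:
            fh.write(line + "\n")

    # Refresh the -latest copy so downstream steps can read a fixed path.
    shutil.copyfile(snapshot, latest)
    return snapshot
```

Each call to `update_db("data", ["new record"])` leaves one more full copy of the DB on disk, which is exactly the clutter I would like to avoid.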
Therefore, I come here asking for hints to resolve this. I know you will have some amazing solutions for me.
Thank you so much in advance!
Cheers
Luis