KG and circular dependencies

Hi all renkuers!

As far as I remember, the knowledge graph (KG) that Renku creates for tracking the workflow does not allow circular dependencies, i.e. the same file cannot be both an input and an output. I understand the reason for that, but I have now found that it can be problematic for a very common use case.

Imagine my repo just creates a DB of information parsed from somewhere, and I expect new data to become available once in a while, so I will need to update the output DB file accordingly. I know Renku implements methods for doing that, but my DB originally took 3 days to create, and I don’t want to rerun everything; I just want to add the new content.

In principle, I could handle that by creating a new timestamped file each time, plus a -latest file. When updating the DB, I would start from the file with the most recent timestamp, write a new timestamped file, and update the -latest file. But in that case, I would end up with too many large files, and a repo that is not as clean as I would like.
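
To make that workaround concrete, here is a minimal sketch. It assumes the DB is a single JSON file of key/value records; the function name `update_db`, the `db-<timestamp>.json` naming, and the JSON format are all hypothetical and only there for illustration.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def update_db(db_dir, new_records, now=None):
    """Add new_records to the newest snapshot and write the result as a new snapshot.

    Hypothetical helper: file names and JSON format are assumptions.
    """
    db_dir = Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    latest = db_dir / "db-latest.json"

    # Timestamps in this format sort lexicographically, so the last name is the newest.
    snapshots = sorted(p for p in db_dir.glob("db-*.json") if p != latest)

    # Start from the previous snapshot instead of rebuilding everything.
    records = json.loads(snapshots[-1].read_text()) if snapshots else {}
    records.update(new_records)

    # Write to a new path, so no file is both read and written in the same run.
    now = now or datetime.now(timezone.utc)
    snapshot = db_dir / f"db-{now:%Y%m%dT%H%M%SZ}.json"
    snapshot.write_text(json.dumps(records, indent=2, sort_keys=True))

    # Refresh the convenience copy that always holds the newest content.
    shutil.copyfile(snapshot, latest)
    return snapshot
```

Each run reads the previous snapshot and writes a different file, so there is no circular dependency, but every old snapshot stays in the repo, which is exactly the clutter I would like to avoid.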

Therefore, I come here asking for hints to resolve this :smile: I know you will have some amazing solutions for me.

Thank you so much in advance!