Adding original data

rcnijzink · 25 March 2021 08:41

I have a case where the processed data in my repository was added from a local directory. Now I would also like to add the original data to the repository and do the processing with renku as well. This would produce the same processed data in the end, but if I do this, will all the analyses that use that processed data become outdated? In other words, do I need to re-run everything afterwards?

ralf.grubenmann · 25 March 2021 08:48

In general, yes, everything would be outdated.

However, if your processing is deterministic and the output doesn’t change, there won’t be a new commit, so the downstream analyses would not be considered outdated. In that case, though, renku wouldn’t detect the file as an output, since nothing changed, so you might have to declare it as an explicit output when doing the renku run.
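Whether the downstream analyses end up outdated hinges on whether re-running the processing reproduces the existing file byte for byte, because Git records no change for identical content. A minimal sketch for checking that before committing, assuming you first regenerate the processed data to a separate path; file_digest and outputs_unchanged are hypothetical helpers, not part of renku:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def outputs_unchanged(old: Path, new: Path) -> bool:
    """True if the regenerated output is byte-identical to the existing one."""
    # Hypothetical helper: identical bytes means Git would see no change.
    return file_digest(old) == file_digest(new)
```

If outputs_unchanged returns False, the processed data really did change and the analyses that depend on it would need to be re-run.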
I am also not sure whether renku would take this new workflow into account if there is an actual update later on, since you’d have an odd corner case: a workflow that depends on another workflow that was run after it.
So I’m not entirely sure what would happen if you later changed the original data and ran renku update: whether it would execute only this new first step or the whole pipeline. That uncertainty applies to the way renku currently handles workflows; with the new metadata I mentioned in the other discussion, it would probably be less of an issue (so renku graph update might work where renku update might fail or only execute the first step).
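The corner case above is about ordering. A tool that walks the dependency graph, rather than the order in which steps were recorded, would re-run the new preprocessing step and everything downstream of it. The sketch below illustrates that idea with a hypothetical in-memory pipeline description (step names and file names are made up) and Python’s standard graphlib module; it is not how renku update or renku graph update is actually implemented.

```python
from collections import defaultdict, deque
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step name maps to (input files, output files).
Steps = dict[str, tuple[set[str], set[str]]]


def steps_to_rerun(steps: Steps, changed: set[str]) -> list[str]:
    """Return the steps downstream of the changed files, in dependency order."""
    # Map each file to the steps that read it.
    readers: defaultdict[str, set[str]] = defaultdict(set)
    for name, (inputs, _) in steps.items():
        for f in inputs:
            readers[f].add(name)

    # Breadth-first walk: a changed file affects its readers,
    # whose outputs in turn affect their own readers.
    affected: set[str] = set()
    queue = deque(changed)
    while queue:
        f = queue.popleft()
        for step in readers[f]:
            if step not in affected:
                affected.add(step)
                queue.extend(steps[step][1])

    # Order affected steps so producers run before consumers,
    # regardless of the order in which the steps were recorded.
    producer = {f: name for name, (_, outs) in steps.items() for f in outs}
    graph = {
        name: {producer[f] for f in steps[name][0] if producer.get(f) in affected}
        for name in affected
    }
    return list(TopologicalSorter(graph).static_order())
```

For example, with steps = {"analysis": ({"processed.csv"}, {"results.csv"}), "preprocess": ({"raw.csv"}, {"processed.csv"})}, calling steps_to_rerun(steps, {"raw.csv"}) gives ["preprocess", "analysis"], even though "analysis" was recorded first.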

rcnijzink · 25 March 2021 09:01

Ok, thanks! I’ll try it out!
