I have a case where I have processed data in my repository, which was added from a local directory. Now, I would like to add the original data as well to the repository, and do the processing with renku too. This would lead to the same processed data in the end, but if I do this, will all the analyses that use that processed data be outdated? So, do I need to re-run everything afterwards again?
In general, yes, everything would be outdated.
But if your analysis is deterministic and the output doesn’t change, there won’t be a new commit and as such it would not be considered outdated. But in that case renku wouldn’t detect it as an output, as nothing changed, so you might have to declare it as an explicit output when doing the renku run.
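To illustrate why a deterministic rerun would not produce a new commit: Git identifies file contents by a hash of their bytes, so regenerating byte-identical output leaves nothing to commit. Here is a minimal sketch of that idea, assuming a repository using Git's default SHA-1 object format. The `process` function and the sample data are hypothetical stand-ins and not part of renku.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content (SHA-1 repos)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def process(raw: bytes) -> bytes:
    """Hypothetical deterministic processing step (stand-in for the real one)."""
    return raw.upper()


raw_data = b"a,b,c\n1,2,3\n"
existing_output = process(raw_data)  # processed file already in the repository
rerun_output = process(raw_data)     # same step executed again on the raw data

# Identical bytes give an identical blob ID, so Git sees nothing to commit.
unchanged = git_blob_hash(existing_output) == git_blob_hash(rerun_output)
print("output unchanged:", unchanged)
```

If the processing is not byte-for-byte reproducible (timestamps in the file, unordered output, random seeds), the hashes would differ and the file would count as changed.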
But I am not sure if renku would consider this new workflow if there is an actual update later on, since you’d have a weird corner case where a workflow depends on another workflow that was run after the dependent workflow.
So I’m not entirely sure what would happen if you changed the original data later on and ran renku update: whether it’d just execute this new, first step, or the whole pipeline. At least that’s the case for the current way renku handles workflows. With the new metadata I mentioned in the other discussion, it’d probably be less of an issue (so renku graph update might work where renku update might fail or just execute the first step).
Ok, thanks! I’ll try it out!