Hi,
I’m very new to Renku (I followed the basic tutorial yesterday) and to data workflow management tools in general, though some months ago I started learning about DVC (without much practice yet).
From my very basic viewpoint, I have the feeling that Renku and DVC are quite similar in terms of their CLI features. Please correct me if I’m wrong. I’m very keen to start using Renku, which sounds both great and fairly straightforward to understand, but before putting all my eggs in one basket, I’d like to learn more about what makes Renku (CLI and RenkuLab) different from DVC, and potentially better…
Any feedback on this? Thanks a lot!
You are right, there is quite a bit of overlap between DVC and the CLI features that deal with data versioning and lineage. DVC is a great project, and they are doing very interesting work as well, but I would say the main difference is that Renku is not just the Renku-CLI, but also the RenkuLab UI. RenkuLab has features to support collaborative work on projects and makes it possible to launch zero-install environments with all your tools.
For example, you can take a look at this project on our public RenkuLab instance:
https://renkulab.io/projects/covid-19/covid-19-public-data
Under the hood, there are also some differences. For example, Renku uses Git LFS for data versioning, whereas DVC has its own solution. In general, Renku tries to build on existing solutions rather than implementing new ones. For example, RenkuLab uses GitLab under the hood, and we support users in going to GitLab directly if it provides something they need.
Thanks a lot @cramakri for your fast reply! This is very helpful. I agree that RenkuLab with zero-install environments is a great point (especially for reaching a wider audience). The integrations with GitLab and Zenodo are also both very nice and useful.
Another critical difference is that apart from reproducibility, which DVC also addresses, Renku is based around the ideas of reuse and collaboration. So, while DVC allows you to track your pipelines in a single project, Renku can make connections across projects - this allows projects to reuse code and data from other projects while retaining complete information about the provenance. For commonly used datasets, for example, it means that it is easy to identify other uses of the data and perhaps discover new workflows and ideas.
Ok @rrrrrok! Good point. I feel that the outcome of this thread should somehow appear in the Renku docs… (the main reason I opened this thread was that I couldn’t find such pros / cons in the docs, apart from the comparison to make in the Renku CLI doc pages).
Yes, that is an excellent point @sylvaine! We are slowly chipping away at restructuring the docs so this input comes at the right time…