Updating a dataset created within a renku project

Hello,

Let’s say I’ve created a dataset within a renku project with renku dataset create ds1, to which I have added data after a first run: renku dataset add ds1 data/file1.

I have now run a new version of the analysis, and data/file1 was updated in git lfs (that happened automatically, since data/file1 was tracked). However, I don't understand how to update ds1 so that it contains the updated version of data/file1. I've tried renku dataset update ds1 data/file1, without success. Should I unlink and re-add? That seems convoluted.

Thanks for your help,

Cyril


Hi Cyril,

you can use renku dataset add --overwrite ds1 data/file1 to overwrite the existing file in the dataset with the new file.

Cheers,

Ralf

Thanks for the reply!

That does not seem to work:

Just to be clear: the dataset still contains an old version of the file (September 16).

But the project clearly contains a new version of the file in git lfs.

My guess is that the dataset file is actually already updated, but that this isn't clear from the way renku displays dataset files, and that the update also doesn't fully achieve what you want it to.

I’ve created an issue https://github.com/SwissDataScienceCenter/renku-python/issues/1531 to improve UX around this and properly support your use case.

In the meantime, doing a renku dataset unlink followed by a renku dataset add is probably the best workaround.
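The unlink-then-add workaround can be sketched as a small shell function. This is a minimal sketch, not an official renku recipe: the function name refresh_dataset_file and the RENKU override variable are hypothetical helpers introduced here, and it assumes renku dataset unlink accepts an --include pattern selecting the file to remove from the dataset (it may ask for confirmation interactively).

```shell
# Hypothetical helper: refresh one file in a renku dataset by
# unlinking it from the dataset and adding it back.
refresh_dataset_file() {
  local dataset="$1"              # dataset name, e.g. ds1
  local file="$2"                 # path of the file in the project, e.g. data/file1
  local renku="${RENKU:-renku}"   # hypothetical override, handy for a dry run (RENKU=echo)

  # Remove the stale file entry from the dataset (assumes --include selects it).
  $renku dataset unlink "$dataset" --include "$file" || return 1
  # Re-add the current version of the file to the dataset.
  $renku dataset add "$dataset" "$file"
}
```

For Cyril's case this would be called as refresh_dataset_file ds1 data/file1. Running it once with RENKU=echo prints the two commands without executing them, which is a cheap way to check what would be run.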

Just confirming that renku dataset unlink followed by renku dataset add worked well. However, I was also using that dataset as an import in another renku project. In that project, renku dataset update did nothing; the only way to get the new file was to run renku dataset rm and re-import the dataset.
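The remove-and-re-import step for the downstream project can be sketched the same way. This is a minimal sketch under stated assumptions: the function name reimport_dataset and the RENKU override are hypothetical, and the source URI is a placeholder, since the thread does not say which URI the dataset was originally imported from. It only chains renku dataset rm and renku dataset import; depending on the renku version, the import may ask for confirmation.

```shell
# Hypothetical helper: drop an imported dataset and import it again
# from its source, so the project picks up the updated files.
reimport_dataset() {
  local dataset="$1"              # local name of the imported dataset, e.g. ds1
  local uri="$2"                  # placeholder: the URI the dataset was imported from
  local renku="${RENKU:-renku}"   # hypothetical override, handy for a dry run (RENKU=echo)

  # Remove the outdated imported copy from this project.
  $renku dataset rm "$dataset" || return 1
  # Import the dataset again from its original source.
  $renku dataset import "$uri"
}
```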