Hello,
Let’s say I’ve created a dataset within a renku project with renku dataset create ds1, to which I have added data after a first run: renku dataset add ds1 data/file1.
I have now run a new version of the analysis and updated data/file1 in Git LFS (this happened automatically because data/file1 was tracked). However, I don’t understand how to update ds1 so that it contains the updated version of data/file1. I’ve tried renku dataset update ds1 data/file1, without success. Should I unlink and re-add? That seems convoluted.
Thanks for your help,
Cyril
Hi Cyril,
you can use renku dataset add --overwrite ds1 data/file1 to overwrite the existing file in the dataset with the new file.
Cheers,
Ralf
Thanks for the reply!
That does not seem to work:
Just to make it clear: the dataset still contains an old version of the file (September 16).
But the project clearly contains a new version of the file in Git LFS.
My guess is that the file actually is already updated, but that this isn’t clear from the way renku shows dataset files, and that the update doesn’t fully achieve what you want it to achieve either.
I’ve created an issue https://github.com/SwissDataScienceCenter/renku-python/issues/1531 to improve UX around this and properly support your use case.
In the meantime, doing a renku dataset unlink followed by a renku dataset add is probably the best workaround.
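For reference, the unlink-and-re-add workaround could be wrapped in a small script like the one below. This is a minimal sketch, not an official renku command: refresh_dataset_file and the RENKU variable are hypothetical names, and it assumes renku dataset unlink selects files through its --include pattern option. It's worth running renku dataset ls-files ds1 first to see the exact path the dataset recorded, and renku dataset unlink --help to confirm the option on your renku version. Setting RENKU="echo renku" prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the unlink + re-add workaround.
# refresh_dataset_file and RENKU are hypothetical names, not part of renku.
set -euo pipefail

# Command used to invoke renku; set RENKU="echo renku" for a dry run.
# Left unquoted below on purpose so "echo renku" splits into two words.
RENKU="${RENKU:-renku}"

refresh_dataset_file() {
  local dataset="$1" path="$2"
  # Remove the file from the dataset (assumes --include takes a pattern
  # matching the path recorded in the dataset; renku may ask to confirm).
  $RENKU dataset unlink "$dataset" --include "$path"
  # Add the file again so the dataset records its current version.
  $RENKU dataset add "$dataset" "$path"
}

# Example: refresh_dataset_file ds1 data/file1
```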
Just confirming that renku dataset unlink followed by renku dataset add worked well. However, I was using that dataset as an import in another renku project. In that project, renku dataset update did not do anything; the only way to get access to the new file was to run renku dataset rm and re-import the dataset.
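The remove-and-re-import step in the downstream project could be sketched the same way. Again this is only an illustration under assumptions: reimport_dataset and RENKU are hypothetical names, and the source URI is whatever was used for the original renku dataset import (the thread doesn't say what it was). Setting RENKU="echo renku" prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the remove + re-import workaround for an imported dataset.
# reimport_dataset and RENKU are hypothetical names, not part of renku.
set -euo pipefail

# Command used to invoke renku; set RENKU="echo renku" for a dry run.
RENKU="${RENKU:-renku}"

reimport_dataset() {
  local dataset="$1" source_uri="$2"
  # Remove the stale imported dataset from this project.
  $RENKU dataset rm "$dataset"
  # Import it again from its source; renku may ask for confirmation.
  $RENKU dataset import "$source_uri"
}

# Example (placeholder URI): reimport_dataset ds1 <original-dataset-uri>
```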