Updating a dataset created within renku project

bopekno · 18 September 2020 14:47

Hello,

Let’s say I’ve created a dataset within a renku project with renku dataset create ds1, to which I have added data after a first run: renku dataset add ds1 data/file1.

I have now run a new version of the analysis, and updated data/file1 on git lfs (that was automatic, as data/file1 was tracked). However, I don’t understand how to update ds1 so that it now contains the updated version of data/file1. I’ve tried renku dataset update ds1 data/file1, without success. Should I unlink and re-add? That seems convoluted.

Thanks for your help,

Cyril

ralf.grubenmann · 18 September 2020 14:50

Hi Cyril,

You can use renku dataset add --overwrite ds1 data/file1 to overwrite the existing file in the dataset with the new file.

Cheers,

Ralf

bopekno · 18 September 2020 14:59

Thanks for the reply !

That does not seem to work:

bopekno · 18 September 2020 15:10

Just to make it clear: the dataset still contains an old version of the file (September 16).

But the project clearly contains a new version of the file on git lfs

ralf.grubenmann · 21 September 2020 10:32

My guess is that the file is actually already updated, but this isn’t clear from the way renku shows dataset files, and the update doesn’t achieve what you want it to either.

I’ve created an issue https://github.com/SwissDataScienceCenter/renku-python/issues/1531 to improve UX around this and properly support your use case.

In the meantime, doing a renku dataset unlink followed by a renku dataset add is probably the best workaround.
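For reference, the unlink-then-add workaround could be wrapped in a small shell function, roughly as sketched here. The function name refresh_dataset_file and the RENKU override variable are made up for illustration; the --include option of renku dataset unlink is assumed to be available in your renku version, and renku may ask for confirmation before unlinking.

```shell
# Sketch: refresh one file in a renku dataset by unlinking and re-adding it.
# refresh_dataset_file and RENKU are hypothetical names, not part of renku.
# RENKU lets you swap the executable, e.g. RENKU=echo for a dry run.
refresh_dataset_file() {
    dataset="$1"   # dataset name, e.g. ds1
    path="$2"      # file inside the project, e.g. data/file1
    renku_cmd="${RENKU:-renku}"

    # Remove the stale file entry from the dataset; stop if this fails.
    "$renku_cmd" dataset unlink "$dataset" --include "$path" || return 1

    # Add the current version of the file to the dataset again.
    "$renku_cmd" dataset add "$dataset" "$path"
}
```

With the names from this thread, the call would be refresh_dataset_file ds1 data/file1; prefixing it with RENKU=echo only prints the two commands instead of running them.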

bopekno · 24 September 2020 11:58

Just confirming that renku dataset unlink followed by renku dataset add worked well. However, I was using that dataset as an import in another renku project. In that project, renku dataset update did not do anything; the only way to get access to the new file was to run renku dataset rm and reimport the dataset.
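In case it helps someone else, the remove-and-reimport step in the importing project can be sketched as below. The function name reimport_dataset and the RENKU override variable are made up for illustration, and the URI argument is a placeholder for whatever address the dataset was originally imported from; renku dataset import may prompt for confirmation.

```shell
# Sketch: drop an imported dataset and import it again to pick up new files.
# reimport_dataset and RENKU are hypothetical names, not part of renku.
# RENKU lets you swap the executable, e.g. RENKU=echo for a dry run.
reimport_dataset() {
    dataset="$1"   # name of the imported dataset, e.g. ds1
    uri="$2"       # placeholder: the address the dataset was imported from
    renku_cmd="${RENKU:-renku}"

    # Remove the outdated imported dataset; stop if this fails.
    "$renku_cmd" dataset rm "$dataset" || return 1

    # Import the dataset again from its source.
    "$renku_cmd" dataset import "$uri"
}
```

Running it with RENKU=echo first shows the two commands that would be executed without touching the project.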
