Retroactively add existing data to a dataset

Hi renkuers,

Background:
There is a main repository that has been used extensively and contains all the data. I am now creating a new repo with only a small part of the code, and I would like to use the data from the main repo.

In the main repo, the data is not yet (part of) a Renku dataset. How can I create a Renku dataset from it without making any changes to the data itself?

It would also help if you could point me to the part of the documentation that covers importing a dataset from another repo.

Thanks in advance,
Lili

Hi Lili,

You can add existing data to a dataset inside a renku project. Let’s say that your data resides in a my-data directory in the root of the project. You can run
renku dataset add --create my-dataset my-data
to add all the files inside the my-data directory to a dataset named my-dataset.
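
As a minimal sketch, the command above can be wrapped in a small shell function that first checks that the directory really exists in the project and then pushes the result. The function name `add_existing_data` is hypothetical; only `renku dataset add --create` comes from this answer, and `git push` is the ordinary Git step for publishing the commit that Renku records for the dataset change.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Renku): turn an existing directory
# of a Renku project into a dataset, then push the project.
# Usage: add_existing_data <dataset-name> <data-dir>
add_existing_data() {
    local name="${1:-}" dir="${2:-}"

    # Refuse to run without both arguments.
    if [ -z "$name" ] || [ -z "$dir" ]; then
        echo "usage: add_existing_data <dataset-name> <data-dir>" >&2
        return 2
    fi

    # The data must already be inside the project so Renku can reference it in place.
    if [ ! -d "$dir" ]; then
        echo "error: directory '$dir' not found in $(pwd)" >&2
        return 1
    fi

    # Create the dataset and register every file under $dir in it.
    renku dataset add --create "$name" "$dir" || return 1

    # Publish the change so other projects can import the dataset.
    git push
}

# Example, run from the root of the main project:
#   add_existing_data my-dataset my-data
```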

Normally, when you add files to a dataset, they are copied into a data/my-dataset directory in your project. However, if the data already exists in the project, Renku doesn't copy it; it only adds the files' metadata to the dataset. So in your case, nothing will be copied.

Once you've created the dataset and pushed the project, you can import the dataset into the other project by running
renku dataset import https://renkulab.io/datasets/<dataset-id>
or
renku dataset import https://renkulab.io/projects/<username>/<project>/datasets/<dataset-name>
You can get the dataset-id by going to the project's page in the UI and opening its datasets tab.
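
The two URL forms above differ only in how the dataset is identified. As an illustration, a hypothetical helper `dataset_import_url` can build either form from its arguments; the base URL is the renkulab.io instance used in this thread, and the username `lili` and project `main-repo` in the example are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Renku): print one of the two import URL forms.
#   dataset_import_url <dataset-id>
#   dataset_import_url <username> <project> <dataset-name>
dataset_import_url() {
    local base="https://renkulab.io"
    case "$#" in
        1) printf '%s/datasets/%s\n' "$base" "$1" ;;
        3) printf '%s/projects/%s/%s/datasets/%s\n' "$base" "$1" "$2" "$3" ;;
        *) echo "usage: dataset_import_url <id> | <user> <project> <name>" >&2
           return 2 ;;
    esac
}

# Example, run inside the new project (names are hypothetical):
#   renku dataset import "$(dataset_import_url lili main-repo my-dataset)"
```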

Note that if your source project/dataset is private, you need to log in from the CLI so that it can access the knowledge graph (KG), which the import requires:
renku login renkulab.io
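
Putting the login and import steps together, a minimal sketch could look like the function below. The name `import_dataset` and its `--private` switch are hypothetical conveniences; the only Renku commands used are `renku login renkulab.io` and `renku dataset import` from this answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Renku): import a dataset into the current
# project, logging in first when the source project is private.
# Usage: import_dataset <import-url> [--private]
import_dataset() {
    local url="${1:-}" visibility="${2:-}"

    if [ -z "$url" ]; then
        echo "usage: import_dataset <import-url> [--private]" >&2
        return 2
    fi

    # Private projects need an authenticated CLI to query the knowledge graph.
    if [ "$visibility" = "--private" ]; then
        renku login renkulab.io || return 1
    fi

    renku dataset import "$url"
}

# Example:
#   import_dataset https://renkulab.io/datasets/<dataset-id> --private
```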

You can find the documentation on dataset import on the "Renku Command Line" page of the Renku documentation, in the "Importing data from other Renku projects" section.

Cheers,
Mohammad


Hi Mohammad,

thanks a lot for your very quick and precise answer. Everything worked smoothly when I followed your suggestions!

Cheers,
Lili
