Retroactively add data to dataset

LiliGasser · 23 June 2022 09:07

Hi renkuers,

Background:
There is a main repository which has been used extensively and contains all the data. Now, I am creating a new repo with just a small sub-part of the code. I would like to use the data from the main repo.

In the main repo, the data is not yet part of a renku dataset. How can I create a renku dataset out of it without making any changes to the data itself?

It would also be helpful if you could point me to the part of the documentation that covers importing a dataset from another repo.

Thanks in advance,
Lili

mohammad-sdsc · 23 June 2022 09:41

Hi Lili,

You can add existing data to a dataset inside a renku project. Let’s say that your data resides in a my-data directory in the root of the project. You can run
renku dataset add --create my-dataset my-data
to add all the files inside the my-data directory to a dataset named my-dataset.

Normally, when you add files to a dataset, they are copied into a data/my-dataset directory in your project. However, if the data already exists in the project, Renku won't copy it; it only adds the metadata to the dataset. So, in your case, no copy will be made.
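Put together, the steps in the main repo might look like the sketch below. It assumes the directory is called my-data and the dataset my-dataset, as in the example above. The DRY_RUN switch and the run helper are not part of Renku; they are only there so the commands can be previewed before anything is executed.

```shell
#!/bin/sh
# Sketch: register an existing directory as a Renku dataset without copying it.
# DRY_RUN and run() are illustrative helpers, not part of Renku.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"    # only print the command
    else
        "$@"           # actually execute it
    fi
}

add_existing_data() {
    dataset="$1"
    data_dir="$2"
    # Create the dataset and add the files that already live in the project.
    run renku dataset add --create "$dataset" "$data_dir"
    # List the datasets to confirm that the new one exists.
    run renku dataset ls
    # Push so that the dataset becomes visible to other projects.
    run git push
}

add_existing_data my-dataset my-data
```

Running it with DRY_RUN=0 executes the commands instead of printing them.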

Once you've created the dataset and pushed the project, you can import the dataset into the other project with
renku dataset import https://renkulab.io/datasets/<dataset-id>
or
renku dataset import https://renkulab.io/projects/<username>/<project>/datasets/<dataset-name>
You can get the dataset-id by going to the project's page in the UI and opening its datasets tab.

Note that if your source project or dataset is private, you need to log in from the CLI to be able to access the Knowledge Graph (KG), which is required for the import:
renku login renkulab.io
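As a rough sketch, the import side in the new repo could then look like this. The username, project, and dataset name (jane, main-repo, my-dataset) are placeholders, and the dataset_url and run helpers as well as the DRY_RUN switch are made up for illustration; only the renku commands themselves come from the steps above.

```shell
#!/bin/sh
# Sketch: import a dataset from another RenkuLab project.
# jane, main-repo and my-dataset are placeholder names.
# DRY_RUN, run() and dataset_url() are illustrative helpers, not part of Renku.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"    # only print the command
    else
        "$@"           # actually execute it
    fi
}

dataset_url() {
    # Build the project-scoped dataset URL: <username> <project> <dataset-name>
    printf 'https://renkulab.io/projects/%s/%s/datasets/%s\n' "$1" "$2" "$3"
}

# Only needed when the source project or dataset is private.
run renku login renkulab.io

# Import the dataset into the current project.
run renku dataset import "$(dataset_url jane main-repo my-dataset)"
```

With the default DRY_RUN=1 the script only prints the two commands, which is a cheap way to check the URL before importing.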

You can find the documentation on dataset import on the "Renku Command Line" page of the Renku documentation, in the "Importing data from other Renku projects" section.

Cheers,
Mohammad

LiliGasser · 23 June 2022 14:34

Hi Mohammad,

thanks a lot for your very quick and precise answer. All worked smoothly when following your suggestions!

Cheers,
Lili
