Hi All,
I’m a bit confused by the different dataset names and where and how they are used. My confusion could also be an artefact of using renku dataset import.
Context: I wanted to list the datasets and the files they contain, to check that everything is correct.
Renku version: 0.8.2 on JupyterLab
renku dataset
gives a list of the datasets within the repo, which is very handy. In the example here, I created three different datasets:
renku dataset create meteorology-raw-wind-summary
renku dataset import <link-to-zenodo-dataset>
renku dataset import --name test-import-dataset <link-to-zenodo-dataset>
$ renku dataset
ID                                    DISPLAY_NAME                 VERSION    CREATED              CREATORS
------------------------------------  ---------------------------  ---------  -------------------  ------------------
4a3e1ca1-e8b2-48dd-b65d-ef30c5005d3f  meteorologyrawwindsum                   2020-03-19 14:04:56  J.Thomas
f6cca5d3-9d1d-495d-ad09-17cd92371fe0  summary_raw_wind_data_fr_11  1.1        2020-04-27 15:32:50  P.E.Carles,T.Jenny
18d733fd-958e-4f2f-b18a-f1ed06c0872c  summary_raw_wind_data_fr_11  1.1        2020-04-27 16:06:12  P.E.Carles,T.Jenny
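To make the clash in the listing easier to see, here is a rough Python sketch that groups the IDs by display name and reports any display name shared by more than one dataset. It only works on the rows pasted from the listing above and does not call renku; the function names ids_by_display_name and ambiguous_names are my own and are not part of the renku CLI or API.

```python
from collections import defaultdict

# First two columns (ID, DISPLAY_NAME) copied from the `renku dataset` listing.
LISTING = """\
4a3e1ca1-e8b2-48dd-b65d-ef30c5005d3f meteorologyrawwindsum
f6cca5d3-9d1d-495d-ad09-17cd92371fe0 summary_raw_wind_data_fr_11
18d733fd-958e-4f2f-b18a-f1ed06c0872c summary_raw_wind_data_fr_11
"""


def ids_by_display_name(listing):
    """Map each display name to the list of dataset IDs that carry it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        dataset_id, display_name = parts[0], parts[1]
        groups[display_name].append(dataset_id)
    return dict(groups)


def ambiguous_names(listing):
    """Keep only the display names that belong to more than one dataset ID."""
    return {
        name: ids
        for name, ids in ids_by_display_name(listing).items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    for name, ids in ambiguous_names(LISTING).items():
        print(f"{name} is shared by {len(ids)} datasets: {', '.join(ids)}")
```

For the listing above, this reports that summary_raw_wind_data_fr_11 is shared by the two imported datasets, so the display name alone cannot tell them apart.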
Problem: The commands I used to check the files, and then to change the datasets, did not work as expected, which I think is due to the name I was using. Which dataset name should be used in renku dataset commands?
Example listing dataset files (ls-files):
The following output was as I expected. All files that I had added to the dataset were listed.
renku dataset ls-files meteorology-raw-wind-summary
ADDED CREATORS DATASET PATH
------------------- ---------- ---------------------------- -------------------------------------------------------------------------------------------------------
2020-03-19 14:05:22 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/metdata_wind_20161220_20170118.csv
2020-03-19 14:05:22 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/metdata_wind_20170122_20170223.csv
2020-03-19 14:05:22 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/metdata_wind_20170226_20170319.csv
2020-03-19 14:05:22 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/metdata_wind_20170322_20170411.csv
2020-03-20 11:55:19 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/metdata_wind_20161119_20161216.csv
2020-03-20 11:55:26 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/data_file_header.txt
2020-03-20 11:55:31 Jen Thomas meteorology-raw-wind-summary /work/meteorology-raw-wind-legs0-4/data/meteorology-raw-wind-summary/README.txt
To list the files, I used renku dataset ls-files <name-given-on-create>, where the name given on create is also the one displayed in the UI on renkulab.
However, if I try the same for the other datasets within the repo, it looks as though the datasets do not have any files; the following output was not expected. Files and datasets are listed, though, using git lfs ls-files. Using the name that is displayed in the UI does not work, nor does the <DISPLAY_NAME>.
renku dataset ls-files summary_raw_wind_data_fr_11
ADDED CREATORS DATASET PATH
------- ---------- --------- ------
and
renku dataset ls-files test-dataset-import
ADDED CREATORS DATASET PATH
------- ---------- --------- ------
I wasn’t sure which name to use to refer to the dataset here, but I don’t get the expected result using either the <DISPLAY_NAME> or the name used with renku dataset import. The files are tracked with git lfs and appear in the renkulab.io files and datasets sections, so they seem to have been added to the dataset correctly.
Example unlinking dataset files:
renku dataset unlink --include metdata* test-import-dataset
Warning: You are about to remove following from "dataset" dataset.
/work/meteorology-raw-wind-legs0-4/data/summary_raw_wind_data_fr_11/metdata_wind_20161117_20161216.csv
/work/meteorology-raw-wind-legs0-4/data/summary_raw_wind_data_fr_11/metdata_wind_20161220_20170118.csv
/work/meteorology-raw-wind-legs0-4/data/summary_raw_wind_data_fr_11/metdata_wind_20170122_20170223.csv
/work/meteorology-raw-wind-legs0-4/data/summary_raw_wind_data_fr_11/metdata_wind_20170226_20170319.csv
/work/meteorology-raw-wind-legs0-4/data/summary_raw_wind_data_fr_11/metdata_wind_20170322_20170411.csv
/work/meteorology-raw-wind-legs0-4/data/test-import-dataset/metdata_wind_20161117_20161216.csv
/work/meteorology-raw-wind-legs0-4/data/test-import-dataset/metdata_wind_20161220_20170118.csv
/work/meteorology-raw-wind-legs0-4/data/test-import-dataset/metdata_wind_20170122_20170223.csv
/work/meteorology-raw-wind-legs0-4/data/test-import-dataset/metdata_wind_20170226_20170319.csv
/work/meteorology-raw-wind-legs0-4/data/test-import-dataset/metdata_wind_20170322_20170411.csv
Do you wish to continue? [y/N]: N
Aborted!
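Here I used the dataset name that was given on import, but the warning lists files under two different dataset directories (summary_raw_wind_data_fr_11 and test-import-dataset), which suggests that files from two different datasets could be removed.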
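To double-check that reading of the warning, here is a small Python sketch that counts the listed paths per dataset directory. It assumes, based only on the paths shown in the warning, that each dataset's files sit under data/<dataset-directory>/; the helper names dataset_dir and files_per_dataset_dir are hypothetical and have nothing to do with renku itself.

```python
from collections import Counter
from pathlib import PurePosixPath

# Paths copied from the `renku dataset unlink` warning.
ROOT = "/work/meteorology-raw-wind-legs0-4/data"
WARNING_PATHS = [
    f"{ROOT}/summary_raw_wind_data_fr_11/metdata_wind_20161117_20161216.csv",
    f"{ROOT}/summary_raw_wind_data_fr_11/metdata_wind_20161220_20170118.csv",
    f"{ROOT}/summary_raw_wind_data_fr_11/metdata_wind_20170122_20170223.csv",
    f"{ROOT}/summary_raw_wind_data_fr_11/metdata_wind_20170226_20170319.csv",
    f"{ROOT}/summary_raw_wind_data_fr_11/metdata_wind_20170322_20170411.csv",
    f"{ROOT}/test-import-dataset/metdata_wind_20161117_20161216.csv",
    f"{ROOT}/test-import-dataset/metdata_wind_20161220_20170118.csv",
    f"{ROOT}/test-import-dataset/metdata_wind_20170122_20170223.csv",
    f"{ROOT}/test-import-dataset/metdata_wind_20170226_20170319.csv",
    f"{ROOT}/test-import-dataset/metdata_wind_20170322_20170411.csv",
]


def dataset_dir(path, data_dir="data"):
    """Return the directory directly below `data_dir` (assumed layout)."""
    parts = PurePosixPath(path).parts
    return parts[parts.index(data_dir) + 1]


def files_per_dataset_dir(paths):
    """Count how many of the listed files fall under each dataset directory."""
    return Counter(dataset_dir(p) for p in paths)


if __name__ == "__main__":
    for name, count in sorted(files_per_dataset_dir(WARNING_PATHS).items()):
        print(f"{name}: {count} file(s) would be unlinked")
```

Run on the warning above, this shows five files under each of the two directories, even though only test-import-dataset was named in the command.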
Questions:
- Which dataset names should I use with the renku dataset commands, particularly when using a dataset that has been created using renku dataset import?
- Is there a way to show all the names associated with a dataset?