Using a dataset

Sorry in advance if I have missed the obvious documentation page answering my question, as it seems pretty basic.

I have added a dataset that I would like to use in one of my environments. The file is visible where I expect to see it, i.e. data/ms3/MS3TMT10_01022016_32917-33481.mzML, but it only contains a reference to the actual data:

$ cat data/ms3/MS3TMT10_01022016_32917-33481.mzML
version https://git-lfs.github.com/spec/v1
oid sha256:d9d65d06e7b118b0aa68511f90bb11d3d1c4f697c92047eca0f11b70a411e526
size 36091582

The Renku CLI mentioned importing data, which I have already done:

$ renku dataset import https://renkulab.io/datasets/fd64bfb7-7446-4c2f-ae89-c73d9c7a022d
CHECKSUM                                  NAME                                  SIZE (MB)  TYPE
----------------------------------------  ----------------------------------  -----------  ------
fa51bb1bccb29f60409e6c79bee3b48d58a3574c  MS3TMT10_01022016_32917-33481.mzML        36.00  mzML
Warning: Do you wish to download this version? [y/N]: y
Error: Dataset exists: "ms3".

My question is - what now? How can I materialise the actual data file and make use of it?

That output is a Git LFS (Large File Storage) pointer: the file is tracked in Git LFS and its contents have not been downloaded yet. You can either enable the checkbox to “Automatically fetch LFS data” when creating the environment, or use the commands documented in "Data in Renku" (Renku documentation) to manually pull the contents of the files from Git LFS.
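
For the manual route, here is a minimal sketch, assuming Git LFS is installed in the environment and the commands are run from the project root. The helper names `is_lfs_pointer` and `materialise` are made up for illustration; only `git lfs pull --include` is a real Git LFS command, and the Renku documentation may recommend a Renku-specific equivalent instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file is still a Git LFS pointer,
# i.e. its first line is the LFS spec header rather than real data.
is_lfs_pointer() {
    head -n 1 "$1" 2>/dev/null | grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Hypothetical helper: download the real contents for one path,
# but only if the file is currently a pointer.
materialise() {
    if is_lfs_pointer "$1"; then
        # Real Git LFS command; fetches and checks out only the given path.
        git lfs pull --include "$1"
    else
        echo "$1 already contains real data"
    fi
}

# Usage (from the project root):
#   materialise data/ms3/MS3TMT10_01022016_32917-33481.mzML
```

Afterwards, `cat` or `ls -l` on the file should show the full 36 MB of data rather than the three-line pointer.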

It was trivial indeed - thank you!