I successfully created a dataset with files that are >10MB. However, in Renku (both in the RenkuLab JupyterLab session and after renku clone) the files are only 133 bytes.
Another dataset I created this way worked well.
Is there anything I am doing wrong or is this unexpected? Thank you for your help.
It looks like the file in the first one is stored in Git LFS, meaning you only see the small pointer file until you pull the actual data.
In the second repo, the .h5 file is not stored in Git LFS, so the whole file lives in git itself and you do not need to pull it explicitly.
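If you want to check this yourself, here is a minimal sketch for telling a pointer from real data. It relies on the fact that a Git LFS pointer is a small text file (under 1024 bytes) whose first line is the LFS spec version string. The function name is_lfs_pointer is just an illustrative helper, not part of Renku or Git LFS.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer.

    Hypothetical helper for illustration; not a Renku or Git LFS API.
    """
    p = Path(path)
    # Pointer files are always smaller than 1024 bytes, so anything
    # larger must be real content.
    if p.stat().st_size >= 1024:
        return False
    with p.open("rb") as fh:
        return fh.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE
```

A file of roughly 130 bytes for which this returns True is a pointer whose data has not been pulled yet.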
To simplify pulling the data, you can use renku storage pull, or tick the box “Automatically fetch LFS data” when you start your session (you need to start it from the drop-down menu).
You can also make this the default for the project in the project settings.
If you use this option, make sure that you launch a session with enough disk space for the data, otherwise your session will crash on launch. It’s always safer to pull data selectively inside the session. If you want to remove LFS files from the local cache (because you need to recover disk space), you can use the renku storage clean command.
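To judge whether a session has enough disk space before pulling everything, one option is to add up the sizes recorded in the pointer files, since each pointer contains a "size" line with the real file size in bytes. The sketch below does that for a checked-out project; lfs_pointer_size and pending_lfs_bytes are hypothetical helper names, and it assumes unpulled files are still present as pointers in the working tree.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer spec.
LFS_SIGNATURE = "version https://git-lfs.github.com/spec/v1"


def lfs_pointer_size(path):
    """Return the real size in bytes recorded in an LFS pointer, or None.

    Hypothetical helper for illustration; not a Renku or Git LFS API.
    """
    p = Path(path)
    # Pointer files are always smaller than 1024 bytes.
    if p.stat().st_size >= 1024:
        return None
    try:
        lines = p.read_text(encoding="utf-8").splitlines()
    except UnicodeDecodeError:
        return None  # binary content, so not a pointer
    if not lines or lines[0].strip() != LFS_SIGNATURE:
        return None
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key == "size" and value.isdigit():
            return int(value)
    return None


def pending_lfs_bytes(root):
    """Sum the sizes recorded in all LFS pointers under `root`."""
    root = Path(root)
    total = 0
    for p in root.rglob("*"):
        # Skip git's own metadata and anything that is not a regular file.
        if ".git" in p.relative_to(root).parts or not p.is_file():
            continue
        size = lfs_pointer_size(p)
        if size is not None:
            total += size
    return total
```

You could compare pending_lfs_bytes(".") with shutil.disk_usage(".").free from the standard library to see whether a full pull would fit.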