How to make a large calibration dataset available

Hi @carloferrigno, I’ve put together a project that uses rclone to copy the data from your SWITCHdrive folder when a session starts. You can find it here: https://renkulab.io/projects/rok.roskar/rclone-sync-demo

The downside is that session start-up is delayed by about a minute because of the copy. As you can see in the post-init.sh script, it’s just a single command, so you could also instruct users to run it when they need the data instead of running it on start-up.
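If you go the on-demand route, a small wrapper along these lines might be handy. It is only a sketch: the rclone command is the one from post-init.sh, while the `fetch_ccf` function name, the `DATA_DIR` variable and the "skip if the directory is not empty" check are my own assumptions, not something the demo project contains.

```shell
#!/bin/bash
# Sketch: fetch the calibration files on demand instead of at session start.
# DATA_DIR and fetch_ccf are hypothetical names; the rclone call mirrors post-init.sh.

DATA_DIR="${DATA_DIR:-/home/jovyan/work/data/ccf}"

fetch_ccf() {
    # Skip the copy when the target directory already contains something.
    if [ -d "$DATA_DIR" ] && [ -n "$(ls -A "$DATA_DIR" 2>/dev/null)" ]; then
        echo "data already present in $DATA_DIR, skipping copy"
        return 0
    fi
    mkdir -p "$DATA_DIR"
    # Same command as in post-init.sh: 8 parallel transfers, with progress output.
    rclone --config ./rclone.conf copy data: "$DATA_DIR" -P --transfers 8
}
```

Users would source the file from the project root (so that ./rclone.conf resolves) and call `fetch_ccf`. Since rclone copy already skips files that are identical on both sides, the check mainly saves the remote listing; drop it if you would rather always pick up new files.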

I made /opt/ccf a symbolic link to /home/jovyan/work/data/ccf because the /opt directory is provided by the overlay filesystem and should not be used for data. Just keep that in mind: some scripts might complain about symbolic links.
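If a script does trip over the link, one possible workaround is to hand it the resolved path instead. The helper below is a sketch under that assumption; `resolve_link` is a made-up name and it relies on the GNU `readlink -f` that ships with the Linux-based session images.

```shell
#!/bin/bash
# Sketch: print the real directory behind a symbolic link.
# resolve_link is a hypothetical helper, not part of the demo project.

resolve_link() {
    # Refuse anything that is not a symbolic link.
    if [ ! -L "$1" ]; then
        echo "$1 is not a symbolic link" >&2
        return 1
    fi
    # Follow the link (and any nested links) to the final path.
    readlink -f "$1"
}
```

In a session, `resolve_link /opt/ccf` should print the data directory under /home/jovyan/work, which you can then pass to the script that dislikes links.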

I used rclone with a webdav config because it can parallelize the file transfer. If you want something simpler, you could also just use wget, but it won’t be as fast.

Hope this helps!

This is the diff:

diff --git a/.renku/renku.ini b/.renku/renku.ini
index 1659a57..0bce25e 100644
--- a/.renku/renku.ini
+++ b/.renku/renku.ini
@@ -1,2 +1,9 @@
 [interactive]
 default_url = /lab
+disk_request = 50G
+
+[renku]
+autocommit_lfs = false
+lfs_threshold = 100kb
+check_datadir_files = true
+
diff --git a/Dockerfile b/Dockerfile
index a485fcb..6117695 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -41,6 +41,10 @@ FROM renku/renkulab-py:3.10-0.18.1
 #    vim
 # USER ${NB_USER}
 
+USER root
+RUN ln -s ${HOME}/work/data/ccf /opt/ccf && chown -R 1000:100 /opt/ccf
+USER ${NB_USER}
+
 # install the python dependencies
 COPY requirements.txt environment.yml /tmp/
 RUN mamba env update -q -f /tmp/environment.yml && \
diff --git a/post-init.sh b/post-init.sh
new file mode 100644
index 0000000..8fad2a8
--- /dev/null
+++ b/post-init.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+# copy over the data
+rclone --config ./rclone.conf copy data: /home/jovyan/work/data/ccf -P --transfers 8
diff --git a/rclone.conf b/rclone.conf
new file mode 100644
index 0000000..db7c6fe
--- /dev/null
+++ b/rclone.conf
@@ -0,0 +1,6 @@
+[data]
+type = webdav
+url = https://drive.switch.ch/public.php/webdav
+vendor = owncloud
+user = O8CE613GjhKMhtU
+