Renku dataset import is extremely slow

When I import a dataset from one repository for use in a session of another repository, it takes an extremely long time:

In repository A, I created a dataset using these commands:

wget http://images.cocodataset.org/zips/val2017.zip
unzip val2017.zip
renku dataset add --create --move coco val2017
rm val2017.zip
renku save -m "added coco dataset"

This step alone already takes quite a while: about 1 hour for 5000 files (77 MB).

In a session for repository B, I try importing this dataset:

renku dataset import -y https://limited.renku.ch/datasets/c0700196bc954037994d1c201e5b34c3

This took about 10 hours. The session doesn’t seem to be resource-limited; it was started with 2 cores and 16 GB of RAM.
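To put these timings in per-file terms, here is a quick back-of-the-envelope calculation. It uses only the numbers reported above and assumes the import processed the same 5000 files as the original `renku dataset add`:

```python
# Timings reported above; assumes both steps handled the same 5000 files.
files = 5000
add_seconds = 1 * 3600      # ~1 hour for `renku dataset add`
import_seconds = 10 * 3600  # ~10 hours for `renku dataset import`

add_per_file = add_seconds / files
import_per_file = import_seconds / files
print(f"add: {add_per_file:.2f} s/file, import: {import_per_file:.2f} s/file")
# add: 0.72 s/file, import: 7.20 s/file
```

At roughly 77 MB in total, the per-file overhead, not the data volume, dominates these timings.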

I repeated the process for a 500 MB repository, and that import has now been running for 60 hours.

These are repositories of standard image datasets that aren’t all that big. What am I doing wrong?

Hello @mattstark,

Thanks for reporting this issue to us!

You are doing everything right. Unfortunately, the Renku CLI is slow when adding a large number of files, regardless of their size. We’ve created a story to address this issue (it should normally be done within 2-3 weeks). I’ll keep you posted once it’s done.

Kind regards,
Mohammad