I have a compressed tar file containing a large number of PDF files. I'm trying to preprocess the data for a model in the following way:
- extract tar
- render each page of each PDF, downsample it, and store it as a .jpg file
- create a .txt label file for each of the .jpg files created in step 2 (I'm training an existing model that requires this input structure)
- compress all the newly created files (.jpg and .txt) into a new tar
- track the new tar in LFS
The issue: whenever I finish steps 2 and 3, the environment basically freezes (I cannot open, close, or access the terminal), and I have to kill and restart the environment, thereby losing the newly created files, since they haven't been added to LFS yet.
I'm pretty sure it's related to how git handles large quantities of files (I could see git processes at 100% CPU from time to time, even though I hadn't committed anything yet), but I don't know how to fix it. Note that the number of new files is on the order of 10k.
Any help appreciated.