Environment dies before starting

I can no longer start an environment for a particular project on limited.
Recently, I added a large Git LFS file (52 GB), and then I heard that a collaborator pushed a file of around 229 MB to the repo without LFS.
I don’t know which of the two is causing this issue, but this is the current situation:

For any commit (including commits that predate both files), I try to start an environment with “Automatically fetch LFS data” unchecked. After waiting for a while, the running environments tab becomes empty without “connect” ever becoming available. So it crashes at some point, but I don’t know how to check the logs.

I also tried to clone this repo to a local computer and got this error:

fatal: Out of memory, realloc failed
warning: Clone succeeded, but checkout failed.

A little googling suggests “git process on remote host seem to be hitting a memory limit”, which I don’t know how to increase for the GUI environment.

Thanks!
Firat


EDIT:
I can in fact clone the repo to a local computer, just not with git clone:

git lfs clone

works without issues, while

git clone

gives the same fatal: Out of memory, realloc failed error. Regardless, this seems to be a local machine issue.

The environment in the GUI also launches now, thanks!

Am I understanding correctly that you aren’t able to check it out locally either?

@rrrrrok that’s right,

GIT_TRACE=1 git checkout -f HEAD

returns

09:45:10.094340 git.c:344 trace: built-in: git 'checkout' '-f' 'HEAD'
09:45:10.098440 run-command.c:334 trace: run_command: 'git-lfs smudge --skip -- '\''A_GIT_LFS_OBJECT'\'''
09:45:10.099275 run-command.c:193 trace: exec: '/bin/sh' '-c' 'git-lfs smudge --skip -- '\''A_GIT_LFS_OBJECT'\''' 'git-lfs smudge --skip -- '\''A_GIT_LFS_OBJECT'\'''
09:45:10.111031 trace git-lfs: exec: git 'version'
09:45:10.115933 trace git-lfs: exec: git '-c' 'filter.lfs.smudge=cat' '-c' 'filter.lfs.clean=cat' '-c' 'filter.lfs.process=' '-c' 'filter.lfs.required=false' 'rev-parse' '--git-dir' '--show-toplevel'
09:45:10.119376 trace git-lfs: exec: git 'config' '-l'
09:45:10.122675 trace git-lfs: exec: git 'rev-parse' '--is-bare-repository'
09:45:10.124996 trace git-lfs: exec: git 'config' '-l' '--blob' ':.lfsconfig'
09:45:10.127369 trace git-lfs: exec: git 'config' '-l' '--blob' 'HEAD:.lfsconfig'
09:45:10.130003 trace git-lfs: Install hook: pre-push, force=false, path=/local/git/repo/path/.git/hooks/pre-push, upgrading…
09:45:10.130543 trace git-lfs: Install hook: post-checkout, force=false, path=/local/git/repo/path/.git/hooks/post-checkout, upgrading…
09:45:10.130685 trace git-lfs: Install hook: post-commit, force=false, path=/local/git/repo/path/.git/hooks/post-commit, upgrading…
09:45:10.130802 trace git-lfs: Install hook: post-merge, force=false, path=/local/git/repo/path/.git/hooks/post-merge, upgrading…
09:45:10.131129 trace git-lfs: exec: git '-c' 'filter.lfs.smudge=cat' '-c' 'filter.lfs.clean=cat' '-c' 'filter.lfs.process=' '-c' 'filter.lfs.required=false' 'rev-parse' 'HEAD' '--symbolic-full-name' 'HEAD'
fatal: Out of memory, realloc failed

Firat

That looks like it might be an issue on the local machine; I can’t reproduce the problem when cloning over SSH.

A comment on starting environments on limited: yesterday, a temporary infrastructure issue prevented environments from this project from starting properly. This has been fixed, and environments from different branches of this project now start correctly.

Once again, it seems impossible to start an environment for this project on limited.

Curiously, when it was still possible, the environment always started on a branch (Big Bang) that none of the project’s collaborators recognize.

hi @malvin,

indeed, there was an issue with the node your environment was trying to start on, which has just been fixed. Thanks for letting us know.
Could you please try again and let us know if the wrong branch is still appearing?

-Pamela

Hi @pameladelgado,

thank you for the quick fix.

Starting an environment is possible again; unfortunately, the wrong branch (Big Bang) still appears when starting an environment on any of the project’s branches.

hi @malvin

would it be ok if I add myself and a colleague as members of your project so that we can debug this easily?

From what I can see, git clone failed to clone the repo. Is there by any chance some large data in the repository that is committed directly to git rather than to LFS?

-Pamela
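One way to answer this question with plain git is sketched below. It relies on the fact that a file tracked by Git LFS is stored in git only as a small pointer file, so any blob that `git ls-tree -r -l` reports as large at a given commit was committed directly to git. The function names and the default 100 MB threshold are hypothetical; this is not a Renku command.

```shell
# Hypothetical helpers: list files at a commit whose git blob exceeds a size
# threshold. LFS-tracked files appear in git only as small pointer files,
# so anything large here was committed directly to git, outside LFS.

# Reads `git ls-tree -r -l` output on stdin, e.g.
#   100644 blob <sha>  240123456<TAB>data/model.ckpt
# and prints "<size><TAB><path>" for blobs larger than $1 bytes.
filter_large_tree_entries() {
  awk -v min="$1" '
    BEGIN { FS = "\t" }
    {
      split($1, meta, " ")   # meta[1]=mode, meta[2]=type, meta[3]=object, meta[4]=size
      if (meta[2] == "blob" && meta[4] + 0 > min + 0) print meta[4] "\t" $2
    }'
}

# Lists non-LFS files above the threshold (default 100 MB) at a commit (default HEAD).
large_files_at_commit() {
  git ls-tree -r -l "${2:-HEAD}" | filter_large_tree_entries "${1:-100000000}"
}

# Example: large_files_at_commit 100000000 HEAD | sort -rn
```

Paths with unusual characters are printed quoted by git (see `core.quotePath`), so treat the output as a report rather than as input for further commands.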

hi @pameladelgado,

for me this would be okay, yes.

I think there was one larger file (~229 MB) that was accidentally committed outside LFS, but I’m not sure whether this is linked to the weird branch behavior.

Thank you for the support!

I think the problem comes from large files that should have been in Git LFS but were checked into regular git instead. Fortunately, the renku CLI can help here.

There are two commands you will need:

  1. This checks the whole repository history for large files that should be in LFS but are not:
~ renku storage check --all
Warning: Git history contains large files
	*.ckpt	2.0 GB
  2. The renku CLI can then move all of these files to LFS throughout the repository’s history:
~ renku storage migrate --all
The following files will be moved to Git LFS:
	*.ckpt	2.0 GB [y/N]: y

I recommend trying this; I think it will resolve things. The problem is that even moderately sized files (e.g. 200 MB) can sneak into the regular git history, outside LFS. Every branch and commit that contains such a file keeps it in its history, so properly fixing this means cleaning up the whole git history, not just the last place you saw the file. Luckily, the renku CLI can do this for you.
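For readers curious what a whole-history check involves, here is a plain-git sketch. It is not how `renku storage check --all` is implemented; it only assumes that LFS-tracked files are stored in git as tiny pointer files, so any large blob reachable from any ref is a file committed outside LFS. The function names and the default threshold are hypothetical.

```shell
# Hypothetical helpers: find large blobs anywhere in the repository history.

# Reads `git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`
# output on stdin and prints "<size><TAB><path>" for blobs larger than $1 bytes.
filter_large_blobs() {
  awk -v min="$1" '
    $1 == "blob" && $3 + 0 > min + 0 {
      size = $3
      sub(/^[^ ]+ [^ ]+ [^ ]+ ?/, "")   # strip type, hash and size; keep the path
      print size "\t" $0
    }'
}

# Walks every object reachable from any ref (all branches, tags and remote-tracking refs).
large_blobs_in_history() {
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
    | filter_large_blobs "${1:-100000000}"
}

# Example: large_blobs_in_history 100000000 | sort -rn
```

Each blob is listed once, so several entries for the same path usually mean several versions of that file. All of them stay in the history until it is rewritten, which is why the migration has to cover the whole history.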

I got access to your project (I also work on Renku with Pamela and Rok) and could not find an alternative. I did not want to commit anything to your repo, so I did not actually run the second (migration) step.

Also, this should probably be done by everyone who uses the repo, because every local clone probably contains the large file mentioned above. If person A cleans things up and person B does not, then as soon as person B pushes a new or existing branch, they will overwrite the LFS version of the file with the non-LFS one.

Lastly, since you cannot launch a session on renkulab, you can do this locally by cloning the repo, installing the latest version of the renku CLI and running the commands. See here for how to install the renku CLI locally.

P.S. An alternative to every user running renku storage migrate --all is to have one user run it and then push the changes to the remote. Before that, however, everyone else should commit and push any work they wish to keep, including any branches, because after that one person runs the migrate command, everyone else should essentially abandon their current repositories (i.e. delete them) and clone the cleaned-up version.

However, I think having everyone run the migrate command is less risky and less error-prone.
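If the team goes with the single-person alternative, each collaborator then has to replace their old clone. The sketch below assumes the old clone has nothing left to save: it refuses to continue when `git status --porcelain` reports changes and, as a design choice, keeps the old clone as a backup instead of deleting it. It cannot detect committed but unpushed work, which the thread says should be pushed before the migration. The function name and the `.pre-migration` suffix are hypothetical.

```shell
# Hypothetical helper: replace a pre-migration clone with a fresh clone of the
# cleaned-up remote, keeping the old directory as a backup (never push from it).
reclone_after_migration() {
  local dir="$1" url="$2" backup status_out
  backup="$dir.pre-migration"
  # Fail if $dir is not a git repository or still has uncommitted/untracked changes.
  status_out=$(git -C "$dir" status --porcelain) || return 1
  if [ -n "$status_out" ]; then
    echo "reclone: $dir has uncommitted changes; save them first" >&2
    return 1
  fi
  # Refuse to overwrite an existing backup.
  if [ -e "$backup" ]; then
    echo "reclone: $backup already exists" >&2
    return 1
  fi
  mv "$dir" "$backup" || return 1
  git clone "$url" "$dir"
}

# Example: reclone_after_migration ~/projects/myproject <repository URL>
```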

hi @tolevski,

thank you for the support.

If I follow your steps by running the commands above in a terminal in an environment (on the Big Bang branch), I get a warning that I am not in a renku repository. Is this expected behavior?

Can I only resolve this by cloning the repository locally? (So far I have worked exclusively on renkulab.)

Yes, what I proposed above can only be done by cloning the repository locally and running the commands there. After you do the migration and push your changes, you can go back to working solely on renkulab.

hi @tolevski

a collaborator and I tried to follow your steps and do the migration, but were not successful.

Would it be possible for you to run the migration and push the changes to the remote? I checked with my collaborator, and he has no unpushed work that he wants to keep.

Thank you.

Sure, I can run the migration and push the code.

@malvin, can you fork the project before I push the fix, just in case the migration breaks things even more? It should not, but it is better to be careful.

@tolevski sure, I forked the repository. It should be fine to push the fix now.

OK, I had to run the command on every branch of your repo to get rid of everything that was not in LFS, but things work now. Give it a shot.

If anyone has an old local copy of the repository, they should not push it back to the remote: that would overwrite the fix with files that are not in LFS.
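Running one command on every branch can be scripted roughly as follows. This is a sketch, not the procedure actually used in this thread; it assumes the branches already exist locally (in a fresh clone, remote branches first have to be checked out as local branches), and the function name is hypothetical. The command to run is passed as arguments, so it could be the renku migrate command quoted earlier.

```shell
# Hypothetical helper: check out each local branch in turn and run "$@" on it.
# Stops at the first failure; on success, checks out the original branch again.
for_each_branch() {
  local start branches branch
  start=$(git symbolic-ref --quiet --short HEAD) || return 1
  # Git ref names cannot contain spaces or glob characters, so word-splitting is safe.
  branches=$(git for-each-ref --format='%(refname:short)' refs/heads/) || return 1
  for branch in $branches; do
    git checkout --quiet "$branch" || return 1
    # A for loop (rather than `while read`) keeps stdin free for interactive prompts.
    "$@" || return 1
  done
  git checkout --quiet "$start"
}

# Example: for_each_branch renku storage migrate --all
```

Avoiding a `while read` loop matters here because the migrate command asks for a [y/N] confirmation on stdin, which a piped loop would swallow.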

Great, thank you for the support, much appreciated.