Renku migrate hangs

I am trying to migrate a repo to a new version of Renku, but it hangs at the second step:

Applying migration m_0005__1_pyld2…
Applying migration m_0005__2_cwl…

This happens for this repo:

The repo contains quite some data, and I think there is something wrong with the metadata, because it can’t create the knowledge graph as well.

Everyone who could help with this is on vacation until the new year. Sorry about that, but I will make a reminder to have someone get back to you.

We completely changed the metadata format that we use to store workflows, both to let us start improving the way we use workflows and to improve performance. But since the old format and way of doing things is rather slow, that migration can take quite some time. If you check the number of .cwl files in the .renku/workflows folder, the time the migration takes is on the order of that number of files times 3 seconds per file (so 200 workflows takes around 10 minutes), though it doesn’t scale linearly and depends on how those workflows are related/connected to each other, how many files they use as inputs/outputs and how complicated the git history of the project is.
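As a rough illustration of that estimate, the sketch below counts the .cwl files and multiplies by 3 seconds per file. The function name estimate_migration_minutes is made up for this example, and the result is only a lower-bound guess, since the real duration does not scale linearly.

```shell
# Hypothetical helper (not part of renku): estimate the migration time from
# the number of .cwl files, using the "about 3 seconds per file" rule of thumb.
estimate_migration_minutes() {
    workflows_dir=${1:-.renku/workflows}
    # The arithmetic expansion strips any padding that wc adds on some systems.
    count=$(( $(find "$workflows_dir" -type f -name '*.cwl' | wc -l) ))
    minutes=$(( count * 3 / 60 ))
    echo "$count .cwl files, roughly $minutes minutes"
}
```

Run it from the project root as estimate_migration_minutes, or pass the workflows folder explicitly as the first argument.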

If you check out the repo locally and run renku migrate manually, you should be able to see it making commits for the workflows while it’s migrating (in a second terminal, for instance).

Just to make sure it’s actually hanging and not just taking a long time.

Okay, thanks! That actually sounds plausible… I have 9735 files there… so it should take at least 8 hours (I gave up after 4 so far). Will try again. I did run it locally, though, and there were no new commits after I stopped the migration. Should I be able to see them afterwards? Or how exactly can I check whether it is actually committing during the migration?

You should see the commits during the migration. Because of the way things work, it has to make a new commit for each workflow file as it is processed, so as not to break things.

On a side note, we are working on a way to store all metadata in a single file, so that walking the commit history is no longer necessary, which speeds things up a lot. This is already available as an experimental feature and will hopefully be fully released in the next 3 months.

Okay, but I just saw this:

and no new commits afterwards…

Oh I think we misunderstood each other. I meant, run the migration in one terminal and then go to the project folder in a second terminal and do a git log, there you should see it creating commits.
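A minimal sketch of that check, assuming the migration commits land on the currently checked-out branch: instead of reading the full git log, it polls the commit count, which should keep growing while the migration is working. The function names commit_count and watch_commits are invented for this example.

```shell
# Print the number of commits reachable from HEAD.
commit_count() {
    git rev-list --count HEAD
}

# Print the commit count a fixed number of times, pausing between checks.
# Usage: watch_commits <rounds> <interval-in-seconds>
watch_commits() {
    rounds=$1
    interval=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        printf '%s commits\n' "$(commit_count)"
        i=$((i + 1))
        if [ "$i" -lt "$rounds" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, watch_commits 10 60 in the project folder prints the count once a minute for ten minutes; if the number never changes, the migration is probably not committing.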

Yes, but unfortunately I see no new commits when I do that.

That’s weird. I’ll take a look at the project after my vacation and let you know if I find anything.

I didn’t have problems before, but when I try to run a renku command now I get an error, I guess because of the interrupted migration:

Error: Project version is outdated and a migration is required.
Run renku migrate command to fix the issue.

Hi Remko,

This probably happens because of a renku update in your environment or on your local machine. Newer versions of Renku won’t work with a repository that was created/modified with an older version, due to changes in the metadata. That’s why a migration is needed to bring the metadata up to date.

You either have to migrate the project or install a version of renku that would work with your repo. From the above messages I believe that your project’s metadata version is 4 which would work with Renku v0.10.4. You can install this version locally by running pip install renku==v0.10.4.

Note that if the migration was interrupted and there are migration commits (which does not seem to be the case) then you must either finish the migration or git reset --hard to the last commit before migration.

Yes, thanks! I would prefer to use a new version of Renku, but the migration still hangs at the second step.

I ran renku migrate now for a long time, strace showed it was actually doing something. Eventually, I received this error:

Applying migration m_0005__1_pyld2...
Applying migration m_0005__2_cwl...
You are using renku version 0.12.2, however version 0.12.3 is available.
You should consider upgrading ...
Error: Couldn't execute migration

Traceback (most recent call last):
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/management/migrate.py", line 128, in migrate
    module.migrate(client)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/management/migrations/m_0005__2_cwl.py", line 48, in migrate
    _migrate_old_workflows(client)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/management/migrations/m_0005__2_cwl.py", line 87, in _migrate_old_workflows
    path = _migrate_cwl(client, cwl_file, commit)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/management/migrations/m_0005__2_cwl.py", line 107, in _migrate_cwl
    _, path = _migrate_single_step(client, workflow, path, commit=commit, persist=True)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/management/migrations/m_0005__2_cwl.py", line 206, in _migrate_single_step
    matched_input = next(i for i in inputs if i.id == name)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/cli/exception_handler.py", line 87, in main
    return super().main(*args, **kwargs)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/cli/migrate.py", line 61, in migrate
    skip_template_update=True, skip_docker_update=True, progress_callback=click.secho,
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/commands/client.py", line 90, in new_func
    raise result.error
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/incubation/command.py", line 128, in execute
    output = context["click_context"].invoke(self._operation, context["client"], *args, **kwargs)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/commands/migrate.py", line 78, in migrate_project
    progress_callback=progress_callback,
  File "/home/remko/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/core/management/migrate.py", line 130, in migrate
    raise MigrationError("Couldn't execute migration") from e
renku.core.errors.MigrationError: Couldn't execute migration

We’ve made a new Renku CLI release, v0.13.0, that fixes the migration errors you had in this project. It also shows the migration’s progress. You need to hard reset the project to the last commit before the migration and remove any extra files that are there (be careful with these commands as they cannot be undone):

git reset --hard <commit-sha>
git clean -dfx

Please upgrade to this version and try migrating again.
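Since these commands cannot be undone, one cautious option is to look before deleting. The sketch below is only a suggestion: it keeps a branch pointing at the current commit, so the committed state stays reachable after the reset, and then runs git clean with -n, which lists the files that would be removed without removing them. The function name preview_cleanup and the branch name backup-before-reset are made up for this example.

```shell
# Hypothetical helper: preserve the current commit under a branch name and
# show what `git clean -dfx` would delete, without deleting anything.
preview_cleanup() {
    backup_branch=${1:-backup-before-reset}
    # A branch only preserves committed work, not untracked or ignored files.
    git branch "$backup_branch" &&
        # -n is the dry-run flag: print "Would remove ..." lines only.
        git clean -dfxn
}
```

If the list contains anything you still need, copy it elsewhere before running the real git clean -dfx.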

Thanks! It finally worked, but it took a really long time… It had to run for 11 days (!) before it finished. Now I have 9736 new commits, and when I try to push, it says:
Connection to renkulab.io closed by remote host.

Is that just my internet connection, or do I have too many commits or so?

@rcnijzink this looks like an issue with having too many commits and a connection timeout.
May I ask if you are using ssh or https for the push? (one way to check this is with git remote show origin)
If the origin is an https link, could you try using ssh instead?

Yes, I think it is something like that. I am using ssh here, is there a way I can change the timeout time?

We could try modifying your ssh settings in the ~/.ssh/config file to keep the connection alive, as described here.
The following will send a keep-alive message to the server every 60 seconds in which no data has been received (ServerAliveInterval), and will only give up after 30 such messages in a row go unanswered (ServerAliveCountMax), so the client tolerates about 30 minutes of silence before disconnecting. If that’s not enough you can increase ServerAliveCountMax.

Host *
  ServerAliveInterval 60
  ServerAliveCountMax 30

You could also prepend this GIT_SSH_COMMAND="ssh -vvv" to your git push command to get a more verbose output.
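If you would rather not edit ~/.ssh/config, the same two options can be passed for a single push through GIT_SSH_COMMAND. This is a sketch with the same values as above; the wrapper name push_with_keepalive is made up for this example.

```shell
# Hypothetical wrapper: run `git push` with the keep-alive options set only
# for this one invocation, leaving ~/.ssh/config untouched.
push_with_keepalive() {
    GIT_SSH_COMMAND="ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=30" \
        git push "$@"
}
```

Call it like a normal push, for example push_with_keepalive origin master. Note that these settings only affect the client side, so they cannot help if the server itself closes the connection.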

Thanks! Unfortunately it still doesn’t push, and it also looks as if the connection never stays alive for longer than 60 seconds. This is the last part of the output of GIT_SSH_COMMAND="ssh -vvv" git push:

debug3: channel 0: status: The following connections are open:
  #0 client-session (t4 r0 i0/0 o0/0 fd 5/6 cc -1)

debug1: fd 0 clearing O_NONBLOCK
debug1: fd 1 clearing O_NONBLOCK
Connection to renkulab.io closed by remote host.
Transferred: sent 4020, received 3564 bytes, in 60.2 seconds
Bytes per second: sent 66.7, received 59.2
debug1: Exit status -1
Locking support detected on remote "origin". Consider enabling it with:
  $ git config lfs.https://renkulab.io/remko.nijzink/vomcases.git/info/lfs.locksverify true

I managed, but maybe not in the most elegant way…

git log -9736 --pretty=oneline > ../log

Then I removed all the commit messages, so it was just a file with one commit hash on each line.

Then I reversed the order, so the oldest commit comes first:

tac ../log > ../log_reverted

And had a simple script to push each commit:

for commit in $(cat ../log_reverted)
do
    git push origin "$commit:master"
done
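For anyone hitting the same timeout, here is a sketch of the same idea without the intermediate files, pushing every Nth commit instead of every single one. It assumes the remote-tracking ref (for example origin/master) already exists and that the remote branch can be fast-forwarded; the function name push_in_batches and the batch size are made up for this example.

```shell
# Hypothetical helper: push unpushed commits oldest-first in batches.
# Usage: push_in_batches <remote> <branch> <batch-size>
push_in_batches() {
    remote=$1
    branch=$2
    batch_size=$3
    # --reverse lists the oldest commit first; --first-parent keeps the list
    # a single chain, so every pushed commit is a descendant of the previous.
    for commit in $(git rev-list --reverse --first-parent "$remote/$branch..HEAD" |
        awk -v n="$batch_size" 'NR % n == 0'); do
        git push "$remote" "$commit:refs/heads/$branch" || return 1
    done
    # Push whatever is left after the last full batch.
    git push "$remote" "HEAD:refs/heads/$branch"
}
```

With push_in_batches origin master 500, the 9736 commits would go up in about 20 pushes; a smaller batch size means shorter individual pushes if the connection still drops.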