Failed image uploads 25 Nov. 2020

Hi All
Today around noon we made a lecture template project available to our students and subsequently experienced hours-long delays when the Docker images for the forked projects were built. Building the images was fine, but uploading them didn’t work and led to a timeout after 1 hour.

Sample job that timed out is 127367.

Stopping and restarting jobs might not have been the best idea as far as debugging on your side is concerned, though. I guess it’s possible that, once again, everyone tried to fork our template project at the same time and caused a bit of a holdup. But the problem did persist from roughly 12:15 until at least 14:30.

cheers

Hello a-user, and thanks for reaching out.
We are aware of the current fragility of our registry, and we are working on improving it. Running 25-50 jobs at the same time shouldn’t be a disruptive event.

To prevent cases like this and speed things up at the same time, we are working on a solution to pin Docker images.
Ideally, we want to build images only when necessary, and the system should be smart enough to understand when that is the case.

In the first iteration, a few manual steps will be required to specify a reference image to be used for every commit. The easiest solution will be to set up the project, then pin the image from an interactive environment so that all the following commits (forks included) will use that image and won’t trigger further builds.
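
As a rough illustration of the pinning idea only (this is not actual Renku code, and the `Project` class, the `resolve_image` function, and the `registry.example.com` address are all hypothetical names made up for this sketch), the decision could look something like this: if a project carries a pinned image, every commit reuses it and no build is triggered; otherwise a per-commit image has to be built.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Project:
    # Hypothetical project record; the field names are illustrative only.
    name: str
    pinned_image: Optional[str] = None  # full image reference, if one was pinned


def resolve_image(
    project: Project,
    commit_sha: str,
    registry: str = "registry.example.com",  # placeholder registry address
) -> Tuple[str, bool]:
    """Return (image_reference, build_needed) for a given commit."""
    if project.pinned_image:
        # A pinned image is reused as-is, so no build or upload happens.
        return project.pinned_image, False
    # Without a pin, assume one image per commit, tagged with the short SHA.
    return f"{registry}/{project.name}:{commit_sha[:7]}", True


# A fork that inherits the pin from the template never triggers a build.
template = Project("lecture-template", "registry.example.com/lecture-template:abc1234")
fork = Project("lecture-template-fork", template.pinned_image)
print(resolve_image(fork, "0123456789abcdef"))
```

In this sketch, 50 students forking the pinned template would result in zero new builds and zero registry uploads, which is the load reduction described above.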

This should be possible in the next (or next+1) release, hopefully in a few weeks. Appropriate documentation will be provided.
As a reference, this is the GitHub issue tracking the progress: https://github.com/SwissDataScienceCenter/renku/issues/1611

Hi Lorenzo

Thanks for the update, that looks nice and I’m looking forward to seeing it in action.

cheers