[how-to] build on installation change only and "auto pin" image

Hi all

I was halfway through writing this up as a request when I got the idea that this should be possible even without changes to renku:

Wouldn’t it be nice to only build a new image if the changes actually require it? The situation with pinned images is already a lot better than last year, when every change by everyone caused a new build. But it’s still annoying to first build an image, then go back to pin it and disable the build again.

A bit of trickery with the CI file actually makes this possible: build only on changes to install files, add an additional “latest” tag to the resulting image, and pin that same latest tag:

.gitlab-ci.yml:

image_build:
  stage: build
  image: docker:stable
  # skip renku autosave branches (see Caveat 3)
  except:
      - /^renku/autosave.*$/
  # only build when one of the install files changed
  only:
    changes:
      - Dockerfile
      - install.R
      - environment.yml
      - requirements.txt
      - .gitlab-ci.yml
  before_script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN http://$CI_REGISTRY
  script: |
    # tag with the first 7 characters of the commit hash, plus a moving "latest" tag
    CI_COMMIT_SHA_7=$(echo $CI_COMMIT_SHA | cut -c1-7)
    docker build --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7 --tag $CI_REGISTRY_IMAGE:latest .
    docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA_7
    docker push $CI_REGISTRY_IMAGE:latest

.renku/renku.ini (you can find the image path in your gitlab container registry; replace the 7-character commit hash tag with “latest”):

image = registry.renkulab.io/your/project:latest

And there you have it: automatic builds on relevant changes only, and automatic uptake/pinning in new sessions.
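To preview locally whether a commit would trigger a build under the “only: changes:” rule, one can compare its changed files against the same list. This is only a rough shell sketch: the function name `needs_image_build` is made up for illustration, and it merely approximates GitLab's own matching (for example, GitLab treats a new branch as changed regardless of its files).

```shell
#!/bin/sh
# Hypothetical helper (not part of renku or GitLab): reads changed file paths
# on stdin, one per line, and succeeds if any of them is one of the install
# files listed under only: changes: in .gitlab-ci.yml.
needs_image_build() {
    # -x: the whole line must match, so src/Dockerfile does not count
    grep -q -x -E 'Dockerfile|install\.R|environment\.yml|requirements\.txt|\.gitlab-ci\.yml'
}

# Example: check the files touched by the most recent commit.
# git diff --name-only HEAD~1 HEAD | needs_image_build && echo "image would be rebuilt"
```

If you change the file list in the CI file, the pattern here has to be kept in sync by hand.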

Disclaimer: Yes, I’m aware that this goes slightly against the idea of having everything explicit for reproducibility, though it is possible to add explicit version tags as well.

Caveat 1: The renku web interface currently doesn’t support overwriting the pinned image, and I’m not sure how renku maintains the git commit to image relation when going back to older commits. As such, you’ll have problems if you remove things from the image and then try to go back to a commit that actually requires the removed stuff. (Remember: the latest tag pinned in renku.ini will always point to the latest image in the registry, not the one that was current back then.)

Caveat 1.5: If a project gets forked, the pinned image will point to the latest image in the parent repository, which might change without notice for the forked projects. The flip side of that coin is that it also allows pushing image fixes to the forked projects (e.g. the latest issue with the renku release and project paths in older base images).

Caveat 2: Of course, if your project requires compilation at image build time, you don’t want to use that “only: changes:” structure, and in that case the latest tag becomes a bit superfluous, as renku already picks the latest corresponding image.

Caveat 3: I’m not exactly sure how this will interact with the autosaving feature ¯\_(ツ)_/¯
Edit: I just tried it. New branches count as changed files, so a new image is built and tagged latest. I’d say that is not recommended, so I added an except clause to exclude builds on autosave branches…

Caveat 1 has implications for reproducibility (it’s slightly harder to record which image was used to produce a given result), but it can also make life a lot easier during development phases with a lot of changes, and when supporting things like exercises for a university course.

cheers

Thanks for this suggestion @a_user! We’ve thought about using this feature of CI before; as you rightly point out, it makes a lot of sense. Ideally we would keep the tagging as it is (without resorting to latest) and figure out from the git history which commit was the last one to change the relevant files (imho relying on latest will eventually lead to lots of confusion at best). But every time we started going in that direction, we ran into more cases where it would be really difficult for the regular user to understand which image to use at which point in the history of their project. So we decided that the tradeoffs weren’t worth it (for the time being).

I think if you’re willing to live with the caveats (i.e. you might not be able to easily recover an image that you need for a particular point in the history of your project) then your approach makes sense.

Our main concern was usually that users wanting to spin up a version of their project locally would not have the necessary knowledge to figure out the correct tags from the git history. But maybe if we provided a CLI command that would tell you the correct image tag, this would be solved rather easily. The same logic would be needed by the service that launches the hosted sessions - to get the git history needed to determine the correct image tag, you would need to do a clone of the project on the server, which could take a while depending on the project size.
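As a rough idea of what such a command could compute, here is a shell sketch using plain git. It assumes images are tagged with the first 7 characters of the commit hash, as in the CI script in the first post, and that the same list of install files decides whether a build happens; `image_tag_for` is a made-up name, not an existing renku command.

```shell
#!/bin/sh
# Hypothetical helper (not a renku CLI command): print the image tag for a
# commit, i.e. the 7-character hash of the most recent commit at or before
# it that touched one of the install files. Prints nothing if no commit did.
image_tag_for() {
    commit="${1:-HEAD}"
    git log -1 --format=%H "$commit" -- \
        Dockerfile install.R environment.yml requirements.txt .gitlab-ci.yml \
        | cut -c1-7
}

# Example (run inside a clone of the project):
# docker pull "registry.renkulab.io/your/project:$(image_tag_for HEAD)"
```

This needs the full git history, which is exactly the cost mentioned above for the hosted service; it also ignores images built for other reasons, such as new branches.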

We’ve had an issue open on this topic for a very long time - please feel free to contribute ideas there as well: “Build Docker image only when needed” (Issue #646 in SwissDataScienceCenter/renku on GitHub) - I would be really happy if we came to a solution that would avoid unnecessary image builds!

Yeah, I agree on this becoming annoyingly complex fast. Currently we are interested in exactly one point in time when we release the repo to our students and don’t care about anything before (or after, going by how the migration from last year to this year went :wink: )… well, maybe the short time after the release if we have to update the base install.

An option might be to use git notes? It doesn’t look like gitlab wants to support git-notes for annotating commits. But they do offer the api for accessing commit comments. Not very elegant, but it might be an option to build a renkulab bot that comments the image to be used for every commit. Though admittedly I’m not sure if the build environment has access to that info. Retrieval of the image can be done per commit with the api.
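A minimal sketch of what the bot's write step might look like with the GitLab commit comments endpoint. The variable names `API_URL`, `PROJECT_ID` and `BOT_TOKEN` are assumptions for illustration (in a CI job, `$CI_API_V4_URL` and `$CI_PROJECT_ID` could supply the first two), and whether a suitable token is available in the build environment is exactly the open question.

```shell
#!/bin/sh
# Hypothetical bot step: record the image for a commit as a GitLab commit comment.
# Assumes API_URL, PROJECT_ID and BOT_TOKEN are set in the environment;
# BOT_TOKEN is a made-up name for a token that is allowed to use the API.
# CURL can be overridden (e.g. CURL=echo) for a dry run.
comment_image_on_commit() {
    sha="$1"
    image="$2"
    "${CURL:-curl}" --silent --fail --request POST \
        --header "PRIVATE-TOKEN: $BOT_TOKEN" \
        --data-urlencode "note=image: $image" \
        "$API_URL/projects/$PROJECT_ID/repository/commits/$sha/comments"
}

# Example:
# comment_image_on_commit "$CI_COMMIT_SHA" "$CI_REGISTRY_IMAGE:latest"
```

Reading the comments back would be a GET request on the same URL.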

It does get tricky very quickly! I hadn’t thought of using git notes for this - we discussed them briefly very early on but I think we decided that it was a feature without wide adoption so it’s best to not go there (as also evidenced by lack of support in the gitlab api for it).

I don’t think the commit comments get added to the repo in any way; they only live in gitlab. So that’s also not a great solution (ideally we want projects to be transferable).

The easiest solution that wouldn’t cause an enormous user-facing change might be what was suggested in the issue I linked - to apply tags directly to the registry. That is essentially what we are doing now, but we are lazy and just use the build to do it. If we could recognize directly that no changes need to be made and just apply a new tag I think it would be fine. We should probably give that line of reasoning a second look. Of course, if it’s something you’re excited about and want to give it a shot, we’d be thrilled to review a PR that implemented an improvement to the current state, even if it’s just a draft or a proof-of-concept!
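For illustration, re-tagging an existing image instead of rebuilding it could look roughly like this when done through the Docker client. This is only a sketch: it still pulls the image, which tagging directly in the registry would avoid, and the function name `retag_image` and the `DOCKER` override are made up for the example.

```shell
#!/bin/sh
# Hypothetical CI helper: give an already built image an additional tag
# without rebuilding it. DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
retag_image() {
    image="$1"     # e.g. $CI_REGISTRY_IMAGE
    src_tag="$2"   # tag of the last image that was actually built
    new_tag="$3"   # e.g. the 7-character hash of the current commit
    docker="${DOCKER:-docker}"
    "$docker" pull "$image:$src_tag" &&
    "$docker" tag "$image:$src_tag" "$image:$new_tag" &&
    "$docker" push "$image:$new_tag"
}

# Example, run when no install file changed:
# retag_image "$CI_REGISTRY_IMAGE" "$LAST_BUILT_TAG" "$CI_COMMIT_SHA_7"
```

`LAST_BUILT_TAG` in the example is a placeholder; determining it is the part that needs the git history.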