I would like to set up a custom Dockerfile with a specified version of Python (e.g. 3.8) and run it locally on my machine for testing or offline use. @tolevski kindly pointed me to a standard image I could use here: Trouble with migrating to renku 2.0 and external gitlab - #14 by tolevski, but I realised that I don’t know how to create a new Dockerfile to use this image, how to modify it to use a specified version of Python, or how to run it on my local machine. Are there any instructions for this in Renku 2.0?
Hi @schymans, if you have not worked with Docker or Dockerfiles before, it will take some time to get used to this. One of the things we try to do with Renku is to make sure our users do not have to worry about this. Because of this, we don’t really have an introduction to Docker on our documentation page. But if you still want to do this, then I recommend watching or reading some tutorials on Docker before jumping in.
What I can also recommend is to use the new “build from code” feature to build your image. Here is the documentation about it: Create an environment with custom packages installed | Renku
So what you need to do is define your python environment with poetry, conda or pip. In the case of poetry and conda you can fix your python version. So I recommend using one of those two options. Then Renku will build and publish the image for you. Once that is done you will be able to see the name of the built image and use it. And you can use it locally.
Once you have an image and you want to use it locally, you still have quite a bit of work to do if you want to replicate the Renku experience. This is what Renku does for you that you would have to replicate:
- clones git repos
- sets up credentials for the repos
- mounts data connectors
- injects any saved secrets
Here is what you can do to get close:
- Repos: clone locally outside of the image and mount them inside your docker container
- Repo credentials: you could mount your ssh keys inside your docker container too
- Data connectors: this is complicated to set up. You could mount the data connectors locally, but you would have to set up your Docker daemon to propagate the mounts. This may not work out of the box. For now I recommend you skip this.
- Saved secrets: This is pretty much not possible - because the saved secrets are only accessible from inside the Renku cluster. This is on purpose for security reasons.
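The first two points can be sketched as a small shell helper. This is only an illustration under assumptions: `run_local_session`, `DRY_RUN`, the container path `/workspace/repo` and the container home directory are names chosen for this example, not something Renku defines. Check the actual home directory inside your image before mounting SSH keys there.

```shell
#!/bin/sh
# Sketch: start a Renku-built image with a local clone and SSH keys mounted.
# Usage: run_local_session IMAGE REPO_DIR CONTAINER_HOME
#   IMAGE           image name shown in the Renku launcher details
#   REPO_DIR        absolute path of the repository cloned on the host
#   CONTAINER_HOME  home directory of the user inside the container
#                   (an assumption: verify it in your own image)
run_local_session() {
    image="$1"
    repo_dir="$2"
    container_home="$3"
    # /workspace/repo is a hypothetical mount point picked for this example.
    # The SSH directory is mounted read-only (:ro) so the container
    # cannot modify the keys on the host.
    set -- docker run -ti -p 8000:8000 \
        -v "$repo_dir:/workspace/repo" \
        -v "$HOME/.ssh:$container_home/.ssh:ro" \
        "$image"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"   # only print the command
    else
        "$@"                 # actually start the container
    fi
}
```

Calling it with `DRY_RUN=1` set prints the assembled `docker run` command without starting anything, which is a cheap way to check the paths first. Depending on file ownership, the mounted keys may still be unreadable for the user inside the container.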
Another thing to note: if you use the “build from code” functionality, the default entrypoint will run vscode or jupyterlab, similar to what happens on the website. If you just want to get a shell in the container, you need to run /cnb/lifecycle/launcher as the entrypoint, followed by, for example, bash or whatever you wish to execute. We build our images with Cloud Native Buildpacks, and that launcher sets up all the required paths and environment variables for things to work. Without it you will not be able to find your Python environment, for example.
There are also other things to note:
- Port forwarding when running the container: this is needed if you want to access the same jupyterlab or vscode environment in your browser. You don’t need it if you just want a shell.
- Managing the state - if you delete the container then things you have done in the container may not be saved. Here is where mounting folders from your local machine into the container becomes useful.
- The image built by the “build from code” functionality in Renku is not kept forever. If the image cannot be found, you may have to go to the Renku website and trigger a rebuild.
- The “build from code” function will not automatically rebuild when you change your python packages. You have to manually trigger rebuilds when things change and also restart your container with a new image.
Hope this helps. Let me know if you have any questions.
Here are some example commands to get you started:
This will run the default entrypoint and forward port 8000 from your container to port 8000 on the host machine. You need this if you want to access the container from your browser:
```
docker run -ti -p 8000:8000 <image_name>
```
Here is how you can get a shell into the container:
```
docker run -ti --entrypoint /cnb/lifecycle/launcher <image_name> bash
```
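The same launcher can also run a single command instead of an interactive shell, for example to check which Python version the image provides. Here is a minimal sketch. `launcher_cmd` and `DRY_RUN` are names made up for this example, and it assumes the image was built with “build from code”, so that /cnb/lifecycle/launcher exists.

```shell
#!/bin/sh
# Sketch: run one command inside the image through the buildpack launcher,
# so that paths and environment variables are set up before it runs.
# Usage: launcher_cmd IMAGE COMMAND [ARGS...]
launcher_cmd() {
    image="$1"
    shift
    # --rm removes the container again once the command has finished.
    set -- docker run --rm --entrypoint /cnb/lifecycle/launcher "$image" "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"   # only print the command
    else
        "$@"                 # actually run it in a container
    fi
}

# Example call (not executed here):
#   launcher_cmd <image_name> python --version
```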
Thanks a lot, @tolevski! I hadn’t thought of the data connectors and saved secrets. I haven’t actually explored yet how the data connectors work, I am still keeping copies of the data in git (using git-lfs, added as “datasets” in renku-python). Most of my repos also have the old Dockerfile, an environment.yml and a requirements.txt file, however the Dockerfile was created by the old renku-python during `renku init` and if I run `docker build -t mycontainer .` I get an error.
I was thinking of a workflow as described in Data science with JupyterLab | Docker Docs
```
docker run --rm -p 8889:8888 -v jupyter-data:/home/jovyan/work quay.io/jupyter/base-notebook start-notebook.py --NotebookApp.token='my-token'
```
If I then open the browser at localhost:8889/lab?token=my-token, I can work with my jupyter notebooks and the edits are saved in the local folder on my machine. I can then commit changes using git. I wondered if instead of Quay, I could use a notebook from renkulab.
@schymans can you post an example of what one of your sets of environment.yml + requirements.txt looks like? I can set up an example Renku project to build an environment from these with Jupyterlab in it. Then you can use that locally, or in a Renku project.
And if you connect your repos to the same project where I will add the image building steps then you will have a working version of your project from Renku v1. What do you think about this? And this circles back to what I originally proposed above - you can take the image built by Renku and use it locally.
We have pre-set images with jupyterlab. They just may not have all the Python packages you need. With some luck, one of our pre-set environments already contains everything you need, but there is no guarantee.
You can try with ghcr.io/swissdatasciencecenter/renku/py-datascience-jupyterlab:2.15.0.
Awesome, thanks! If I wanted to use Package renku/py-datascience-jupyterlab · GitHub , would I need something like the start-notebook.py in the quay.io example?
My requirements.txt is usually quite simple, e.g.
```
matplotlib
pandas
numpy
sympy==1.3
essm==0.4.3
scipy
```
I didn’t add anything to the environment.yml, but I guess this would be where I could define the python version? Essm currently only works with sympy 1.3 and therefore it all only works up to python 3.8. This will hopefully be updated in the near future.
If you could set up an example Renku project to build an environment that could be used to make former Renku v1 projects work in Renku 2.0 and offline on the local computer, that would be awesome! But would it not suffice to create a Dockerfile that could be used to replace the original Dockerfile in Renku v1 projects and then build a custom environment based on the existing requirements.txt?
So see this example project I just made: Reproducible Data Science · Open Research · Renku
I used the following environment.yml:
```yaml
name: base
channels:
  - nodefaults
  - conda-forge
dependencies:
  - python=3.8
  - pandas
  - numpy
  - scipy
  - matplotlib
  - pip
  - pip:
      - sympy==1.3
      - essm==0.4.3
```
This is where you can find the file: renku-test-environments/stan-example at main · olevski/renku-test-environments · GitHub
If you want to run that session locally you just need to do:
```
docker run -ti -p 8000:8000 harbor.renkulab.io/renku-build/renku-build:renku-01kkfdqkxd0ht0g0mdypxjm07c
```
Then just go to http://localhost:8000/lab
You can see the name of the image by clicking on the launcher and looking in the details in the sidebar.
If you use this please copy the setup/files/config into your own repository and make your own Renku project. I usually purge all my Renku projects once in a while because I accumulate too many for examples and troubleshooting.
Replacing the Dockerfile yourself would also work. But it would take you more time, especially if you don’t know much about Docker and have to learn it.
My example above is much more sustainable and quicker to implement.
Thanks, I am able to run and access the environment as shown, but I don’t know how to make existing notebooks accessible inside the container. All I see is the environment.yml in /source/stan-example. Could you give me a tip how I could access an existing folder on my system? Sorry about my very limited understanding of how to run docker containers…
Hi @schymans, yes. The -v flag mounts a path from your own machine into the container and makes it accessible there, so you can do something like this:
```
docker run -ti -p 8000:8000 -v /absolute/path/on/host:/path/inside/container harbor.renkulab.io/renku-build/renku-build:renku-01kkfdqkxd0ht0g0mdypxjm07c
```
Note that the path on the host has to be absolute.
Ah, from the docs I linked above I see this:
As of Docker Engine version 23, you can use relative paths on the host.
So if you are on that version of Docker or newer, you can use relative paths too.
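If you are not sure which Docker version you have, one way to stay on the safe side is to let the shell expand the current directory into an absolute path. A small sketch: `mount_here`, `DRY_RUN` and the mount point `/workspace/host` are choices made for this example only.

```shell
#!/bin/sh
# Sketch: mount the current directory using an absolute path, which works
# whether or not the installed Docker Engine accepts relative host paths.
# Usage: cd into the folder you want to mount, then: mount_here IMAGE
mount_here() {
    image="$1"
    host_dir="$(pwd)"   # pwd always prints an absolute path
    set -- docker run -ti -p 8000:8000 -v "$host_dir:/workspace/host" "$image"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"   # only print the command
    else
        "$@"                 # actually start the container
    fi
}
```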
This is great, thank you! So I ran this command:
```
docker run -ti -p 8000:8000 -v ./:/workspace/host/ harbor.renkulab.io/renku-build/renku-build:renku-01kkfdqkxd0ht0g0mdypxjm07c
```
Then I was able to access a JupyterLab instance at http://127.0.0.1:8000/lab, where I see two folders, `source` and `host`. The `source` folder contains `stan-example` with the environment.yml, and the `host` folder is the local folder from which I executed the docker command. Changes to any files in the `host` folder are saved in the local folder. This is exactly what I needed; now I just need to be able to add packages (see my next comment).
Could you explain a bit more how I could create a docker image myself? I will need to adjust the dependencies for each project, and for one particular project, I will also need to install R-packages. Incidentally, in the README.md I created for that project, I wrote instructions on how to work with the project locally using the dockerfile, following Running RenkuLab Interactive Sessions on Your Own Machine — Renku documentation . Unfortunately, this does not work any more.
Hi @schymans, maybe this example helps. It shows how to create a Docker image that installs R dependencies, which you can then use in your Renku project: GitHub - bethcg/R_bioconductor_NCCR_microbiomes: Docker for NCCR Microbiomes project with R environment including bioconductor · GitHub Does this help?
@schymans in the example that Tasko gave you the image is built by Renku, satisfying the dependencies in the project. Is there a reason why this doesn’t work for you? Building and maintaining your own docker images is difficult, which is why we offer this functionality that greatly simplifies the process! You can have Renku build the image for you and then use it locally if you need.
Thanks, @tolevski, @elisabetc and @rrrrrok! I still have renku v1 Dockerfiles in most of my repos, so the environments are not built successfully. Your example, @tolevski, has an environment.yml file and no Dockerfile, whereas yours, @elisabetc, has a Dockerfile but no yaml file. How should I proceed: remove the Dockerfile and just use a yaml file along the lines of what you created, @tolevski? If I use a Dockerfile to install R packages, how would I need to modify it to read the yaml file? And lastly, when I try to build an environment from code, I get: “No publicly accessible code repositories found in this project. RenkuLab can only build session environments from public code repositories.” Therefore, I am currently exploring the possibility of building an image locally, pushing it to Docker and then pulling it into Renku from there.
Hi @schymans - the easiest is to not use the Dockerfile and just keep either environment.yml or requirements.txt (but not both).
In the next release, we will support similar image builds for R projects using Renv.
We are also working on allowing image builds from private repos - this should be available in a few weeks. Until then, you can make a separate public project with just the environment spec inside.
Hope that helps!
Best,
Rok
Thanks @rrrrrok ! This means that I could create the environment in a public project, build it, and then pull it into any other project on renku? That would be cool! In the current project I am trying to get up and running again, I use an r-package inside a jupyter notebook running a Python 3.8 kernel. Is there a way, or will there be a way to have both Python and R packages installed?
Sorry, I meant just make a public repo with the software requirements, and add that repo to your Renku project. Then it will show up in the options to build an environment from code.
If you want R and Python, could you simply install both with conda? My understanding is that this is relatively well supported. As far as I know, our automatic R builds will only support R initially, but there may be a way to satisfy both.
Thanks, @rrrrrok! Getting there…
I created an image, and ran it locally while mounting my local folder as:
```
docker run -ti -p 8000:8000 -v ./:/workspace/host/ harbor.renkulab.io/renku-build/renku-build:renku-01kmx43sm0fjx64xsqhkbvqkd9
```
Unfortunately, whenever I execute any git commands inside the container, I get:
```
fatal: detected dubious ownership in repository at '/workspace/host'
To add an exception for this directory, call:
        git config --global --add safe.directory /workspace/host
```
The workaround I found is to include the above-mentioned git command in the Dockerfile, but since I cannot access the Dockerfile, I can’t do that. How does Renku build the images? Could this be reproduced locally as well?