Problem with CUDA initialization

jules · 3 November 2020 13:55

Hi,

I’m using Renku on my university's server, which has GPU access. When I start a RenkuLab session with one GPU, running:

    import torch
    torch.cuda.is_available()

returns False and emits this warning:

    /opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
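When `torch.cuda.is_available()` returns False, it can help to print what PyTorch was built with and what the session exposes. The sketch below is a hypothetical helper (`cuda_report` is not part of PyTorch); it only reads the public attributes `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`, plus the `CUDA_VISIBLE_DEVICES` variable named in the warning.

```python
import importlib.util
import os


def cuda_report() -> dict:
    """Collect basic facts about PyTorch's view of CUDA (hypothetical helper)."""
    report = {
        # The warning above mentions this variable; None means it is unset.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version the wheel was built against (None for CPU-only builds).
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A CPU-only build (`torch_built_with_cuda` is None) can never use the GPU; a CUDA build that still reports False usually points at the container or driver setup rather than at PyTorch itself.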

I think this may be related to having to install the libgl1-mesa-glx package in the Dockerfile; without it, importing cv2 (OpenCV) fails with this error:

    ImportError                               Traceback (most recent call last)
    <ipython-input-1-4257a2495eec> in <module>
          4 import json
          5 import numpy as np
    ----> 6 import cv2
          7 import torch
          8 import torch.nn.functional as F

    /opt/conda/lib/python3.7/site-packages/cv2/__init__.py in <module>
          3 import sys
          4
    ----> 5 from .cv2 import *
          6 from .data import *
          7

    ImportError: libGL.so.1: cannot open shared object file: No such file or directory
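This `ImportError` means the dynamic loader could not find `libGL.so.1`, which the cv2 extension module depends on. A quick way to check whether an image provides it, without importing cv2, is to ask the loader directly. `can_load` is a hypothetical helper built on the standard library's `ctypes.CDLL`, which calls `dlopen` on Linux; it is a sketch and does not see any library search path baked into the cv2 module itself.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libGL.so.1 is the library named in the ImportError above.
    print("libGL.so.1 loadable:", can_load("libGL.so.1"))
```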

I found this workaround here: https://github.com/conda-forge/pygridgen-feedstock/issues/10

Do you have any idea why this happens?

Dockerfile:

    ARG RENKU_BASE_IMAGE=renku/renkulab-py:3.7-0.7.1
    FROM ${RENKU_BASE_IMAGE}

    ARG RENKU_VERSION=0.11.6

    # Uncomment and adapt if code is to be included in the image
    # COPY src /code/src

    # Uncomment and adapt if your R or python packages require extra linux (ubuntu) software
    # e.g. the following installs apt-utils and vim; each pkg on its own line, all lines
    # except for the last end with backslash '\' to continue the RUN line
    #
    # apt-get needs root, so switch to root here and back to the notebook user afterwards
    USER root
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
        apt-utils \
        vim \
        libgl1-mesa-glx \
        nvidia-cuda-toolkit
    USER ${NB_USER}

    # install the python dependencies
    COPY requirements.txt environment.yml /tmp/
    RUN conda env update -q -f /tmp/environment.yml && \
        /opt/conda/bin/pip install -r /tmp/requirements.txt && \
        conda clean -y --all && \
        conda env export -n "root"

    ########################################################
    # Do not edit this section and do not add anything below
    RUN pipx install --force renku==${RENKU_VERSION}
    ########################################################

requirements.txt:

    numpy
    scikit-learn
    torch
    opencv-python
    matplotlib

Thank you,
Jules Gottraux

rrrrrok · 3 November 2020 14:04

Hi @jules, thanks for posting the question here! We build a separate image with CUDA installed. It’s a bit old at this point, and we should have a newer one for CUDA 10.2 soon, but could you try using this one as the base image in your Dockerfile?

    renku/renkulab-cuda10.0-tf1.14:0.7.2
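To see what differs between the two base images from inside a running session, one option is to dump the GPU-related environment variables. The selection below is an assumption: `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` are read by the NVIDIA container runtime, and `LD_LIBRARY_PATH` affects where CUDA libraries are looked up, but this thread does not say which of them the CUDA image sets. `gpu_env` is a hypothetical helper.

```python
import os

# Hypothetical selection of variables worth comparing between images.
GPU_ENV_VARS = (
    "NVIDIA_VISIBLE_DEVICES",
    "NVIDIA_DRIVER_CAPABILITIES",
    "CUDA_VISIBLE_DEVICES",
    "LD_LIBRARY_PATH",
)


def gpu_env(names=GPU_ENV_VARS) -> dict:
    """Map each variable name to its value, or None when it is unset."""
    return {name: os.environ.get(name) for name in names}


if __name__ == "__main__":
    for name, value in gpu_env().items():
        print(f"{name}={value}")
```

Running it in a session from each image and comparing the output is a cheap first step before digging into driver or runtime configuration.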

amirrezaie1415 · 5 November 2020 04:18

Hi @rrrrrok. I am having almost the same problem. I do not get any error, but

    torch.cuda.is_available()

returns False.
I quickly searched for Renku projects that use PyTorch with a GPU, but could not find any. It would be great if SDSC could provide an example Dockerfile for using PyTorch with a GPU, as I imagine many people use PyTorch to build their deep learning models. Thanks!

rrrrrok · 5 November 2020 13:02

Hi @amirrezaie1415, I just tried with the renku/renkulab-cuda10.0-tf1.14:0.7.2 image and it works fine. Here’s the full Dockerfile I used:

    # For finding latest versions of the base image see
    # https://github.com/SwissDataScienceCenter/renkulab-docker
    ARG RENKU_BASE_IMAGE=renku/renkulab-cuda10.0-tf1.14:0.7.2
    FROM ${RENKU_BASE_IMAGE}

    # Uncomment and adapt if code is to be included in the image
    # COPY src /code/src

    # Uncomment and adapt if your R or python packages require extra linux (ubuntu) software
    # e.g. the following installs apt-utils and vim; each pkg on its own line, all lines
    # except for the last end with backslash '\' to continue the RUN line
    #
    # USER root
    # RUN apt-get update && \
    #    apt-get install -y --no-install-recommends \
    #    apt-utils \
    #    vim
    # USER ${NB_USER}

    # install the python dependencies
    COPY requirements.txt environment.yml /tmp/
    RUN conda env update -q -f /tmp/environment.yml && \
        /opt/conda/bin/pip install -r /tmp/requirements.txt && \
        conda clean -y --all && \
        conda env export -n "root"

    # RENKU_VERSION determines the version of the renku CLI
    # that will be used in this image. To find the latest version,
    # visit https://pypi.org/project/renku/#history.
    ARG RENKU_VERSION=0.11.6

    ########################################################
    # Do not edit this section and do not add anything below
    RUN pipx install --force renku==${RENKU_VERSION}
    ########################################################
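Once a session is running from this image, a short smoke test can confirm that PyTorch actually allocates memory and computes on the GPU, not just that it reports one. This is a sketch: `gpu_smoke_test` is a hypothetical name, and it falls back to the CPU (or returns None if PyTorch is not installed) so it can run anywhere.

```python
import importlib.util


def gpu_smoke_test(size: int = 256):
    """Multiply two random matrices on the GPU if PyTorch can see one.

    Returns "cuda" or "cpu" for the device that was used, or None when
    PyTorch is not installed (hypothetical helper, not part of PyTorch).
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    c = a @ b
    if device.type == "cuda":
        # Kernels run asynchronously; wait so any CUDA error surfaces here.
        torch.cuda.synchronize()
    if c.shape != (size, size):
        raise RuntimeError(f"unexpected result shape {tuple(c.shape)}")
    return device.type


if __name__ == "__main__":
    used = gpu_smoke_test()
    if used == "cuda":
        import torch
        print("GPU OK:", torch.cuda.get_device_name(0))
    else:
        print("GPU not used; device:", used)
```

If this prints "GPU not used; device: cpu" in a GPU session, PyTorch still cannot see the device and the checks from earlier in the thread apply.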

Let me know if that works for you!

amirrezaie1415 · 5 November 2020 13:36

@rrrrrok Perfect, many thanks. It worked for me.

jules · 5 November 2020 14:36

Hi,

Thanks for the responses. It works now.

rrrrrok · 5 November 2020 14:37

Great, glad it was that easy!
