Machine learning script very slow


I am trying to run a machine learning script on Renku. However, it runs about 10 times slower than on a virtual machine, even though I selected the highest settings (4 CPUs, 32 GB memory, 2 GPUs). I also selected the base image renku/renkulab-cuda10.0-tf1.14:renku0.10.4-0.6.3, which should allow the GPU to be used.
So I am wondering: can Renku actually be used for machine learning? And if so, any idea what I am doing wrong?
Thank you.


Hi @Emmabout:

Well, you don’t provide many details about what you are trying, so I cannot help with the specifics. But I can answer your first question: yes, Renku works well for machine learning. It helps me a lot with all kinds of DL simulations, as the GPUs available there perform remarkably well and have allowed me to run many trainings in a reasonable amount of time.

Here I am just guessing, but the slowdown you are seeing might be caused by:

  • A package that is not correctly detecting the available resources and/or is not moving the tensors from the CPU to the GPU.
  • A DL model that is small, i.e. has a reduced number of parameters. In that case the GPU just introduces overhead that makes performance worse than on the CPU; for such small models you should not use the GPU at all.
  • A model that is simply not parallelized, i.e. not optimized for GPUs. In that case, the lower clock frequency of the GPU will give worse performance than running the model on the CPU.

So, could you please check the points above and give us more hints about the model you are running? We are glad to help :slight_smile:
