Machine learning script very slow

Emmabout · 30 September 2020 07:07

Hi,

I am trying to run a machine learning script on Renku. However, it runs 10 times slower than on a virtual machine, even though I selected the highest settings (4 CPUs, 32G memory, 2 GPUs). I also selected this base image: renku/renkulab-cuda10.0-tf1.14:renku0.10.4-0.6.3, which should allow the GPU to be used.
So I am wondering whether Renku can actually be used for machine learning. If so, any idea what I am doing wrong?
Thank you.

lusamino · 30 September 2020 11:26

Hi @Emmabout:

Well, you don’t give many details about what you are trying to do, so I cannot help you with that yet. For the moment I can answer your first question: yes, Renku limited is helping me a lot with all kinds of DL simulations, as the GPU(s) we can find there perform remarkably well and have allowed me to run many trainings in a reasonable amount of time.

So, and here I am just guessing, the problems you are encountering might be caused by one of the following:

A package that is not correctly identifying the resources and/or is not moving the tensors from the CPU to the GPU.
A DL model that is small, i.e. has a reduced number of parameters. In this case the GPU just introduces overhead that makes the performance worse than when using the CPU. In any case, for such small models you should not be using the GPU.
An ML model that is simply not parallelized, i.e. not optimized for GPUs. In that case, the lower clock frequency of the GPU will give worse performance than running the model on the CPU.
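
For the first point, a quick check is to ask TensorFlow which devices it can see. The snippet below is only a minimal sketch, assuming the TensorFlow 1.x build that ships with the image mentioned above; the helper names `list_tf_devices` and `gpu_devices` are made up for illustration and are not part of Renku or TensorFlow.

```python
# Sketch: check whether TensorFlow can see a GPU in this session.
# Assumption: TensorFlow 1.x is installed (as in the tf1.14 base image).
# The helper names below are illustrative, not part of any library.


def list_tf_devices():
    """Return (name, device_type) pairs seen by TensorFlow, or None if TF is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


def gpu_devices(devices):
    """Keep only the names of entries whose device type is exactly 'GPU'."""
    return [name for name, kind in devices if kind == "GPU"]


if __name__ == "__main__":
    devices = list_tf_devices()
    if devices is None:
        print("TensorFlow is not importable in this environment.")
    else:
        gpus = gpu_devices(devices)
        print("All devices:", devices)
        print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

If no GPU shows up, the script is running on the CPU only, which would be consistent with the slowdown. If GPUs are listed, creating the TensorFlow 1.x session with `tf.ConfigProto(log_device_placement=True)` should log which device each operation is placed on, and running `nvidia-smi` in a terminal during training shows whether the GPUs are actually busy.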

Could you please check the points above and give us more details about the model you are running? We are glad to help.

Cheers
Luis
