TensorFlow 2.9.0

Hi
I was wondering whether it would be possible to create a TensorFlow 2.9.0 image. I ran into some bugs when using mixed precision with TF 2.7.0, and they have been fixed in TF 2.9.0.

Thanks for any feedback.
Raphaela

Hi @rwagner, I just submitted a PR to add this image: "chore: add tensorflow 2.9 image" by olevski, Pull Request #248 in SwissDataScienceCenter/renkulab-docker on GitHub.

The image built from this PR, which includes TensorFlow 2.9, is renku/renkulab-cuda-tf:11.2-tf-2.9-3c93878. Can you try this image and comment directly in the PR on whether it works?

If it is easier, you can also report here, rather than in the PR, whether the new image works.

Hi there

Thanks a lot. I updated my Dockerfile with the new image and pushed it. The pipeline passed, so it’s looking good. However, when I try to start a new instance from my latest commit (which specifies the new Dockerfile), I get an error. See below.

The logs are not available yet, so I need to wait.
Not sure whether this is related to the TF 2.9 image, though…

@rwagner, there are not enough GPUs on the deployment you are using, so your session cannot be scheduled.

So you can either launch a session without requesting a GPU, or shut down an existing session and then launch a new one with the GPU that it frees up.

And remember to save and push your work if you shut down your running session.

@tolevski Yes, you were right. Spinning up an instance with 1 CPU and 0 GPUs worked.
Thanks.

@rwagner if you end up running the image with a GPU, let me know here whether it works. Sometimes the session starts and everything seems fine, but as soon as you try to use a GPU from within TensorFlow you get errors. This can happen if CUDA or the underlying libraries are misconfigured or have version conflicts. I don't think this is likely here, but it would be nice to be sure.
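For anyone who wants to run that check, here is a minimal sketch of one way to do it from a notebook or terminal inside the session. The function name `check_tensorflow_gpu` and the report keys are made up for illustration. It assumes that a small matrix multiplication pinned to the first GPU is enough to surface a CUDA or library misconfiguration, which may not catch every problem.

```python
def check_tensorflow_gpu():
    """Return a small report on whether TensorFlow can actually use a GPU."""
    report = {"tf_version": None, "gpus": [], "gpu_matmul_ok": False, "error": None}

    # Import lazily so the function still returns a report if TensorFlow is missing.
    try:
        import tensorflow as tf
    except ImportError as exc:
        report["error"] = f"TensorFlow is not installed: {exc}"
        return report

    report["tf_version"] = tf.__version__

    # Step 1: does TensorFlow see any GPU at all?
    gpus = tf.config.list_physical_devices("GPU")
    report["gpus"] = [gpu.name for gpu in gpus]
    if not gpus:
        return report

    # Step 2: run a real op on the first GPU. Listing a device is not proof
    # that the CUDA libraries load correctly, so force an actual computation.
    try:
        with tf.device("/GPU:0"):
            a = tf.random.normal((256, 256))
            product = tf.matmul(a, a)
        product.numpy()  # copy the result back to the host, which waits for the op
        report["gpu_matmul_ok"] = True
    except Exception as exc:  # broad on purpose: report any failure as text
        report["error"] = str(exc)

    return report


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

An empty `gpus` list in a session that requested a GPU, or a non-empty list with `gpu_matmul_ok` set to `False`, would be worth reporting together with the `error` text.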

Hey, I’m not sure whether there’s an issue with the TF 2.9 image in combination with Debugger V2.

I ran Debugger V2 as explained in the following guide: Debugging Numerical Issues in TensorFlow Programs Using TensorBoard Debugger V2
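For readers who have not used it, enabling the Debugger V2 dump roughly looks like the sketch below. The wrapper `enable_debugger_v2_dump` and the example directory are hypothetical, and the `FULL_HEALTH` mode and unlimited buffer are assumed settings rather than the ones used in this post. The only TensorFlow call is `tf.debugging.experimental.enable_dump_debug_info`.

```python
def enable_debugger_v2_dump(logdir):
    """Turn on Debugger V2 dumping into `logdir`; return False if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return False

    # Must be called before the model is built and run, so that the
    # instrumented ops write their debug events into `logdir`.
    tf.debugging.experimental.enable_dump_debug_info(
        logdir,
        tensor_debug_mode="FULL_HEALTH",  # assumed mode: per-tensor NaN/Inf counts
        circular_buffer_size=-1,          # -1 keeps all events, no circular buffer
    )
    return True


if __name__ == "__main__":
    # Hypothetical directory, used only for illustration.
    print(enable_debugger_v2_dump("/tmp/tfdbg2_logdir"))
```

After running the training code, TensorBoard is pointed at the same directory with `tensorboard --logdir /tmp/tfdbg2_logdir`, and the Debugger V2 entry is picked from the plugin menu.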

The files are created successfully inside the specified directory. See screenshot.

However, if I start TensorBoard and choose Debugger V2, it says that there’s no data to display, and the little arrow at the top right keeps spinning forever.
If I downgrade TF and TB to 2.7.0, the data is displayed.
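Since the behaviour changes with the installed TF and TB versions, it may help to record exactly which versions are in the image when reporting this. The sketch below reads them from the package metadata and compares the major.minor parts. The helper names are made up for illustration, and a version mismatch is only one possible cause, not a confirmed one.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Reduce a version such as '2.9.0' to its ('2', '9') prefix."""
    if version is None:
        return None
    return tuple(version.split(".")[:2])


def versions_match(a, b):
    """True only if both versions are known and share the same major.minor."""
    return a is not None and b is not None and major_minor(a) == major_minor(b)


if __name__ == "__main__":
    tf_version = installed_version("tensorflow")
    tb_version = installed_version("tensorboard")
    print("tensorflow:", tf_version)
    print("tensorboard:", tb_version)
    print("same major.minor:", versions_match(tf_version, tb_version))
```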

Just FYI