If you have ever had a long-running training process die because of a Renku timeout, this guide is for you!
While Renku sessions stay active as long as they are using the CPU, the front-end application (JupyterLab/VSCode) may still stop long-running processes. To prevent this, we recommend using tmux, which is installed by default in all Renku global environments and code-based environments.
Workflow
- Open your session terminal in Renku.
- Start a new tmux session:
  tmux
- Run your script, e.g.:
  python train.py
You can now safely close your browser window. Your model will keep training in the background until completion.
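If you want to find the run again after reopening the session, a named tmux session with a log file can help. The following is a minimal sketch, not part of the official Renku instructions: the session name `train`, the log file `train.log`, and the three helper function names are illustrative assumptions. Only the standard tmux commands `new-session`, `has-session`, and `attach` are used.

```shell
# Hypothetical names: the session "train" and the log file "train.log"
# are illustrative choices, not Renku defaults.
SESSION="train"
LOGFILE="train.log"

# Start a detached tmux session that runs the given command and
# copies its output (stdout and stderr) to the log file.
start_training() {
    tmux new-session -d -s "$SESSION" "$1 2>&1 | tee $LOGFILE"
}

# Exit status 0 while the session still exists, non-zero otherwise.
training_running() {
    tmux has-session -t "$SESSION" 2>/dev/null
}

# Reattach to watch progress; detach again with Ctrl-b followed by d.
attach_training() {
    tmux attach -t "$SESSION"
}

# Example usage:
#   start_training "python train.py"
#   training_running && echo "still training"
#   attach_training
```

Because a tmux session started this way closes as soon as its command exits, the log file is what preserves the output of a finished run.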
Full documentation: Handle long training runs