Hi @droqueiro, apologies for the late reply.
Typically a Slurm cluster and a Kubernetes cluster serve two very different purposes, though in some cases there can be overlap. A Kubernetes cluster would typically be used to orchestrate long-running server processes, like your Django web app together with other services, e.g. an authentication service, a certificate service, etc. I’m not aware of a configuration where a Slurm cluster and a Kubernetes cluster share the same resources, so these would typically need to be kept separate. In other words, if you wanted a stand-alone Renku deployment alongside your current setup, you would need to provision your own Kubernetes cluster next to the Slurm cluster.
This is a super interesting use-case, though it deviates a bit from the “vanilla” way a Renku workflow might be used. If I’m understanding correctly, you would like to do something like this:
- A (parametrized) workflow is developed in a Renku project using the renku CLI. It consists of several steps, some of which are embarrassingly parallel when executed in batch mode.
- A user submits data via the web-app in the specified format such that it can be consumed by the workflow
- The workflow is shipped to the Slurm cluster and executed; several things need to happen here:
a. The data is copied to the HPC cluster.
b. The “gold-standard” repository containing the workflow is cloned into a temporary space dedicated to this specific workflow execution.
c. The user-provided data is linked into the repository so that it can be tracked self-consistently with the workflow.
d. The pre-compiled workflow is re-executed on the user-provided data using a Slurm-aware workflow engine that can analyze the workflow DAG and execute some steps in parallel.
e. Once the execution finishes, the results are collected by a process outside of Renku’s control and sent back to the web-app (this could be done via a script triggered by a post-execution hook in Slurm).
f. The temporary clone of the repository where the execution ran is removed.
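To make steps a–f a bit more concrete, here is a rough sketch of what a Slurm batch script driving one such execution could look like. Everything in it is an assumption for illustration: the paths, the repository URL, and the exact renku invocation (here `renku update`) would depend on your setup, your Renku CLI version, and the workflow engine you end up using.

```shell
#!/bin/bash
#SBATCH --job-name=renku-batch-run
#SBATCH --output=renku-batch-%j.log

# --- all names below are hypothetical placeholders ---
RUN_ID="${SLURM_JOB_ID}"
WORKDIR="/scratch/renku-runs/${RUN_ID}"      # temporary space for this execution
REPO_URL="https://renkulab.io/gitlab/your-group/gold-standard-workflow.git"
USER_DATA="/staging/uploads/${RUN_ID}"       # data already copied to the cluster (step a)

# b. clone the gold-standard repository into a dedicated temporary space
git clone "${REPO_URL}" "${WORKDIR}"
cd "${WORKDIR}"

# c. link the user-provided data into the repository so it is tracked
#    alongside the workflow (a plain copy shown here; depending on how the
#    workflow expects its inputs, `renku dataset add` could be used instead)
cp -r "${USER_DATA}"/. data/input/

# d. re-execute the pre-compiled workflow on the new inputs; the exact
#    command and its parallelization depend on the chosen engine
renku update --all

# e. collect the results and hand them to a process outside of Renku's
#    control (e.g. a script that notifies the web-app)
mkdir -p "/staging/results/${RUN_ID}"
cp -r data/output/. "/staging/results/${RUN_ID}/"

# f. remove the temporary clone
cd / && rm -rf "${WORKDIR}"
```

The same sequence could equally be split across a Slurm prolog/epilog pair, with step e handled by the epilog, but the overall flow would stay the same.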
This is one possibility - I’m assuming here that you (and the user) care only about the final result, meaning that the repository where the workflow was executed does not need to be preserved. In this case, the server side of Renku is only used to store the repository with the gold-standard workflow, and the majority of Renku-related functionality is handled by the command-line client. In my view, a stand-alone deployment of Renku would not really be necessary here; you could simply use renkulab.io to keep the reference repository (or repositories).
A potential downside of this approach is that the knowledge-graph information about these batch executions on user-provided data will be lost, though you could manually extract it before removing the temporarily cloned repository. A benefit is that you could make the “gold-standard” repo publicly accessible, so anyone could clone it and execute the workflows on their own infrastructure if they wanted to.
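On manually extracting the knowledge-graph information before cleanup, a minimal sketch could look like the following - assuming a renku CLI version where `renku graph export` is available (the command name and flags have changed across CLI versions, so please check `renku graph export --help` against your installation; all paths are hypothetical):

```shell
# hypothetical temporary clone from one batch execution
cd /scratch/renku-runs/1234/gold-standard-workflow

# export the provenance graph to an archive location before deleting the
# clone; the output format options depend on your Renku CLI version
renku graph export --format json-ld > /archive/provenance/run-1234.jsonld

# only now remove the temporary clone
cd / && rm -rf /scratch/renku-runs/1234
```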
Let me know what you think! To fully support this use-case we would likely need to develop some additional pieces in the Renku command-line client, so it’s good to keep the conversation going to clarify exactly what is needed.