Hi @rrrrrok, I can install the conda environment locally with mamba, and quite fast. Nevertheless, the two repos I see are very different. Also, the CI/CD stopped working despite no changes to the environment, the Dockerfile, or requirements.txt, so it must be some update on the side of the runners.
I am wondering whether the RAM, disk, or other specs of the CI/CD runners got downgraded, or whether something cached is causing the issue?
See where it always gets stuck: it does not fail, it just times out gracefully when the maximum time is reached.
Hi @rrrrrok, I am experiencing the same. The CI/CD fails because it takes too long (the limit is now 1h, but much less is normally needed), and it gets stuck at the mamba env update. Same as for @firat.
Any update on this? Locally on a VM everything works (with conda; mamba is not installed on it).
Sorry about that, we’re looking into it. As a workaround in the meantime: if you can build the image locally, you could push that to the registry and it will be picked up when you try to launch a session.
Thanks for the workaround. Could you point us to anything in the Renku docs (e.g., an example) showing where we should push the images so that they get recognized?
At the moment, I have switched to merging pull requests without waiting for the build to succeed.
The information here is a little bit outdated wrt launching the sessions locally, but all the bits about the registry, the docker login and the image naming should be correct.
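For anyone trying the workaround above, here is a minimal sketch of the login, build, and push sequence, written in Python so the image name is composed in one place. Everything specific in it is an assumption to check against the docs: the registry host, namespace, and project are placeholders, `image_name` and `push_commands` are hypothetical helpers, and tagging the image with the first 7 characters of the commit SHA is my assumption about the naming scheme. It only prints the commands; it does not run them.

```python
import shlex
from typing import List


def image_name(registry: str, namespace: str, project: str, commit_sha: str) -> str:
    """Compose an image reference for a project at a given commit.

    ASSUMPTION: the image is tagged with the first 7 characters of the commit SHA.
    The repository path is lowercased because Docker repository names must be lowercase.
    """
    path = f"{namespace}/{project}".lower()
    return f"{registry}/{path}:{commit_sha[:7]}"


def push_commands(image: str, context: str = ".") -> List[List[str]]:
    """Return the docker commands for logging in, building, and pushing the image."""
    registry = image.split("/", 1)[0]  # host part of the image reference
    return [
        ["docker", "login", registry],
        ["docker", "build", "-t", image, context],
        ["docker", "push", image],
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own registry, namespace, project, and commit.
    image = image_name(
        "registry.example.org",
        "some-namespace",
        "some-project",
        "0123456789abcdef0123456789abcdef01234567",
    )
    for command in push_commands(image):
        print(shlex.join(command))
```

Run it from the project root, review the printed commands, and execute them by hand once the image name matches what the registry expects.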
Hey @firat @lin - I’ve been looking into this for an hour or so; it seems there is some issue with the conda environments generally which is causing the conda env update to get stuck. I’ve tried a couple of basic things to no avail as yet - I’ll look into it a bit more tomorrow…
Oh wow, you are right, the first project seems to be outdated. But the second one (a private project) is actually using mamba.
Let me try replacing the Dockerfile with a newer Renku Dockerfile setup.
@rrrrrok @seanrmurphy Just some more details, in case they help: I am using mamba, and the project was building in a few minutes 4 days earlier, with no issues at all. The conda environment was consistent, with no conflicts, as it was rebuilt from scratch after important packages (such as torch) had to be updated. No change was made to the environment.yaml or the Dockerfile; the build just started getting stuck at the mamba update on a commit that pushed only documentation changes. Not sure if this helps, but it seems that this is not caused by the project itself (or at least not directly or in any obvious way), and the log from the image build job does not show anything suspicious. Let me know if you want to look at the project, I can add you! Cheers and thanks for looking into this!
@rrrrrok I have the same issue with mamba. My situation is pretty similar to @mitch's: one package added to environment.yaml (definitely no conflicts, and the local build is fine), and no change to the Dockerfile. All previous CI/CD runs worked fine.
Hi folks - I have been looking into this a little this morning - I can confirm that I can build @mitch's project (mzb-workflow) on a VM and almost certainly on my local machine. My current thinking is that there could be issues with our access to anaconda from inside the GitLab runners (similar issues were reported last week) - I could of course be wrong on this.
The VM build did take time (20 min), but it was a build from scratch - this is still significantly less than the 1h time limit we have as a default on the job in the pipeline, so I don’t think we are simply hitting the timeout; also, I see @firat raised the timeout to 10h and the problem still occurred.
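One way to test the anaconda-access hypothesis would be to time a small request to a conda channel from inside a runner job and compare it with the same request from a VM. The sketch below is only an illustration under assumptions: `probe` is a hypothetical helper, the URL is just an example channel file, and a HEAD request may not reflect what mamba actually does during `env update`.

```python
import time
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send a HEAD request and report whether it succeeded and how long it took.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with opener(request, timeout=timeout) as response:
            detail = f"HTTP {response.status}"
            ok = True
    except OSError as exc:  # covers URLError, HTTPError, and socket timeouts
        detail = repr(exc)
        ok = False
    return {"url": url, "ok": ok, "seconds": time.monotonic() - start, "detail": detail}


if __name__ == "__main__":
    # Example URL only (an assumption); substitute the channels from your environment file.
    result = probe("https://conda.anaconda.org/conda-forge/noarch/repodata.json")
    print(result)
```

If the request times out or is much slower in the runner than on the VM, that would point at connectivity rather than at the project's environment.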