Renku update and module not found

I am trying to run renku update, but renku no longer finds my modules. Everything works fine on my local machine and the packages are installed, but it looks like renku update searches somewhere else when importing the modules:

renku update data/img_tech_note/3_uncertainty.png
Resolved '../../../../../../tmp/tmpe70w1gj8' to 'file:///tmp/tmpe70w1gj8'
[job a2f82274-6ce7-4f1f-8b63-006fe842f398] /tmp/azdu2o97$ python3 \
    /tmp/azdu2o97/src_py/plot_uncertainty.py \
    -i \
    /tmp/azdu2o97/data/CAMELS/basin_timeseries_v1p2_modelOutput_daymet.zip \
    -ia \
    /tmp/azdu2o97/data/CAMELS/camels_attributes_v2.0.zip \
    --figsize \
    17 \
    14 \
    --yloc_figlab \
    1.1 \
    --xloc_figlab \
    -00.1 \
    --dist2env \
    --ylim \
    0 \
    1 \
    -o \
    data/img_tech_note/3_uncertainty.png
Traceback (most recent call last):
  File "/tmp/azdu2o97/src_py/plot_uncertainty.py", line 3, in <module>
    from sympy import (diff, Eq, exp, init_printing, integrate, log, solve, nsolve, sqrt, Symbol, 
ModuleNotFoundError: No module named 'sympy'
[job a2f82274-6ce7-4f1f-8b63-006fe842f398] Job error:
("Error collecting output for parameter '_plans_9f8452d6465844348f1a3123b5254d5c_outputs_13': ../../../../../../tmp/tmpe70w1gj8:100:5: Did not find output file with glob pattern: '['data/img_tech_note/3_uncertainty.png']'.", {})
[job a2f82274-6ce7-4f1f-8b63-006fe842f398] completed permanentFail
Error: Unable to finish executing workflow

How can I make sure that renku update finds the modules as well?

It might be that renku is installed in a different Python environment than the one you use for your project. Do you use virtual environments of some kind?

Since you say everything works fine on your machine, does this happen in an interactive session?

I am just working locally and do not use any virtual environments, which is why I don’t really understand where this is coming from. When I do renku run with a script that uses the same module, there is no problem; it only happens with renku update. Could it be that CWL is looking in a different path?

That’s weird, but difficult to diagnose without seeing your setup.

renku update uses CWL to execute the workflow in a temporary directory (/tmp/azdu2o97$ python3 in your log output). It should inherit the environment you’re currently running in and pick the same python3 as when you execute a command manually in the project folder. But maybe there’s something weird happening with PATH or PYTHONPATH environment variables.
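
To compare what the script sees under renku update with what you see in the project folder, you could temporarily put a few diagnostic lines at the top of the script and run it both ways. A minimal sketch using only the standard library; describe_environment is a hypothetical helper name, and sympy is simply the module from your traceback:

```python
import importlib.util
import os
import sys


def describe_environment(module_name):
    """Collect facts about the interpreter that is running this code."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "pythonpath": os.environ.get("PYTHONPATH", ""),
        "sys_path": list(sys.path),
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment("sympy").items():
        print(f"{key}: {value}")
```

If executable or sys_path differ between the two runs, that points to an environment problem rather than to the script itself.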

Did you install renku using pipx? pipx creates its own virtual environment that it installs renku into and maybe something there goes wrong. But CWL shouldn’t pick that up.

You could also try using the Toil backend instead of CWL as a workaround, maybe that works. Just install renku as renku[toil] (e.g. pip install renku[toil]) and then do renku update --provider toil data/img_tech_note/3_uncertainty.png.

This can also happen if you’ve installed renku or other packages for your user only (by passing --user to pip install). In that case, you can either install globally or set PYTHONPATH to point to the locally installed packages. This works for me on macOS (replace 3.7 with your Python version):

PYTHONPATH=/Users/"$USER"/Library/Python/3.7/lib/python/site-packages:$PYTHONPATH
export PYTHONPATH
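
If you’re not sure where pip install --user put the packages on your system, Python’s site module can tell you. A small sketch (standard library only), to be run with the same python3 that you use for the project; the printed export line is just a suggestion to copy into your shell, not something renku requires:

```python
import site
import sys

# Directory that "pip install --user" installs into for this interpreter.
user_site = site.getusersitepackages()

print("User site-packages:", user_site)
print("Already on sys.path:", user_site in sys.path)
print(f'export PYTHONPATH="{user_site}:$PYTHONPATH"')
```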

Okay, thanks! Setting the PYTHONPATH worked in the end.

However, now I run into another issue:

renku update data/img_tech_note/3_uncertainty.png
Resolved '../../../../../../tmp/tmpg01qe8dg' to 'file:///tmp/tmpg01qe8dg'
[job 7ce55e4e-dcdc-41d5-8376-7a331b6fb65f] /tmp/vb040_qa$ python3 \
    /tmp/vb040_qa/src_py/plot_uncertainty.py \
    -i \
    /tmp/vb040_qa/data/CAMELS/basin_timeseries_v1p2_modelOutput_daymet.zip \
    -ia \
    /tmp/vb040_qa/data/CAMELS/camels_attributes_v2.0.zip \
    --figsize \
    17 \
    14 \
    --yloc_figlab \
    1.1 \
    --xloc_figlab \
    -00.1 \
    --dist2env \
    --ylim \
    0 \
    1 \
    -o \
    data/img_tech_note/3_uncertainty.png
Matplotlib is building the font cache; this may take a moment.
Traceback (most recent call last):
  File "/tmp/vb040_qa/src_py/plot_uncertainty.py", line 359, in <module>
    main()
  File "/tmp/vb040_qa/src_py/plot_uncertainty.py", line 349, in main
    plt.savefig(args.outputfile, bbox_inches = "tight")
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/pyplot.py", line 859, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/figure.py", line 2311, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/backends/backend_qt5agg.py", line 81, in print_figure
    super().print_figure(*args, **kwargs)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2210, in print_figure
    result = print_method(
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 1639, in wrapper
    return func(*args, **kwargs)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/backends/backend_agg.py", line 510, in print_png
    mpl.image.imsave(
  File "/home/rnijzink/.local/lib/python3.8/site-packages/matplotlib/image.py", line 1611, in imsave
    image.save(fname, **pil_kwargs)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/PIL/Image.py", line 2161, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'data/img_tech_note/3_uncertainty.png'
[job 7ce55e4e-dcdc-41d5-8376-7a331b6fb65f] Max memory used: 72MiB
[job 7ce55e4e-dcdc-41d5-8376-7a331b6fb65f] Job error:
("Error collecting output for parameter '_plans_2d3d2401383e4279ba7f56e92614ac14_outputs_13': ../../../../../../tmp/tmpg01qe8dg:100:5: Did not find output file with glob pattern: '['data/img_tech_note/3_uncertainty.png']'.", {})
[job 7ce55e4e-dcdc-41d5-8376-7a331b6fb65f] completed permanentFail
Error: Unable to finish executing workflow

Traceback (most recent call last):
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/management/workflow/providers/cwltool.py", line 112, in workflow_execute
    outputs = process()
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/cwltool/factory.py", line 34, in __call__
    raise WorkflowStatus(out, status)
cwltool.factory.WorkflowStatus: Completed permanentFail

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/cli/exception_handler.py", line 92, in main
    return super().main(*args, **kwargs)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/cli/update.py", line 163, in update
    update_command()
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/management/command_builder/command.py", line 256, in execute
    hook(self, context, result, *args, **kwargs)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/management/command_builder/command.py", line 195, in _post_hook
    raise result.error
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/management/command_builder/command.py", line 242, in execute
    output = context["click_context"].invoke(self._operation, *args, **kwargs)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/inject/__init__.py", line 342, in injection_wrapper
    return sync_func(*args, **kwargs)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/commands/update.py", line 76, in _update
    execute_workflow(dag=graph.workflow_graph, command_name="update", provider=provider, config=config)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/inject/__init__.py", line 342, in injection_wrapper
    return sync_func(*args, **kwargs)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/commands/workflow.py", line 483, in execute_workflow
    execute(dag=dag, basedir=client.path, provider=provider, config=config)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/plugins/provider.py", line 85, in execute
    return executor(dag=dag, basedir=basedir, config=config)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/pluggy/_hooks.py", line 265, in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/pluggy/_manager.py", line 80, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/rnijzink/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 60, in _multicall
    return outcome.get_result()
  File "/home/rnijzink/.local/lib/python3.8/site-packages/pluggy/_result.py", line 60, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/home/rnijzink/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 39, in _multicall
    res = hook_impl.function(*args)
  File "/home/rnijzink/.local/pipx/venvs/renku/lib/python3.8/site-packages/renku/core/management/workflow/providers/cwltool.py", line 114, in workflow_execute
    raise WorkflowExecuteError() from e
renku.core.errors.WorkflowExecuteError: Unable to finish executing workflow

Is it using too much memory or so?

I think your script needs to create the parent folder data/img_tech_note/ (it tries to write to it but can’t find the folder).

Hm okay, but the folder exists already, and the old figure (that needs to be updated) is already there.

CWL runs in a new temporary directory and needs to copy over all relevant files. Renku should detect that it needs to copy/create the folder for the output to the temporary directory to then copy back the created file, but maybe something went wrong with that detection. Hard to tell from a distance.

In the log it says builtins.open(filename, "w+b") failed with the FileNotFoundError: [Errno 2] No such file or directory: 'data/img_tech_note/3_uncertainty.png' error and that usually means that the parent folder doesn’t exist.

Renku tries to follow the logic of “Did the directory exist before the renku run? Then it needs to be created in CWL before execution. Did the directory not exist before the renku run? Then the script probably creates it and it doesn’t need to be created by CWL”.
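
As a rough illustration of that rule (a hypothetical sketch, not Renku’s actual code; should_create_folder and its arguments are made up for this example, and it checks the folder at call time rather than at the time of the original renku run):

```python
import tempfile
from pathlib import Path


def should_create_folder(output_path, project_root):
    """Hypothetical stand-in for the rule described above.

    True  -> the parent folder exists in the project, so the runner should
             create it in the temporary working directory.
    False -> the folder does not exist, so the script presumably creates it.
    """
    return (Path(project_root) / output_path).parent.is_dir()


# Demo in a throwaway "project" directory.
with tempfile.TemporaryDirectory() as project:
    (Path(project) / "data" / "img_tech_note").mkdir(parents=True)
    print(should_create_folder("data/img_tech_note/3_uncertainty.png", project))  # True
    print(should_create_folder("results/plot.png", project))  # False
```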

The above with creating the directory in your script would be a workaround. But if there’s a way I could take a look at the project that might help understand why it’s not working automatically.

I just gave you access to the project.

What I did in this case was:

  • renku dataset create img_tech_note
  • Then I manually ran mkdir data/img_tech_note, as it didn’t exist yet.
  • And then I did the renku run command

Could that be a reason?

Manually doing mkdir shouldn’t be an issue. We expect users to work that way.

I can see in the metadata

        "create_folder": false,
        "default_value": "data/img_tech_note/3_uncertainty.png",

and create_folder should be true for it to work (that’s the flag that tells CWL whether the script creates the parent folder or whether CWL should handle it).

I think this is a bug. A quick look at the relevant source code shows one code path where the create_folder flag isn’t set (and it defaults to false), which would explain it.

I’ve created an issue for this, “CommandOutputs converted from CommandInputs don't set the `create_folder` flag correctly” (Issue #2777 in SwissDataScienceCenter/renku-python on GitHub), and moved it into the current sprint.

Okay, super, thank you for the help!

You’re welcome, thank you for reporting this!

And until it’s fixed, doing something like

from pathlib import Path
Path("data/img_tech_note/3_uncertainty.png").parent.mkdir(parents=True, exist_ok=True)

in your script should work as a workaround and also be safe once we fix this issue.
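
If you’d rather not hard-code the path, the same idea can be wrapped in a small helper that takes whatever output path the script receives. A minimal sketch; ensure_parent_dir is a hypothetical name, and the demo writes into a temporary directory so it doesn’t touch your project:

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(output_file):
    """Create the parent folder of output_file if it is missing; return the path."""
    path = Path(output_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / "data" / "img_tech_note" / "3_uncertainty.png")
    print(target.parent.is_dir())  # True
```

In your script, that would mean something like plt.savefig(ensure_parent_dir(args.outputfile), bbox_inches="tight") instead of passing args.outputfile directly.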