I am currently working in a repository where I generate a .py file containing equations from a dataset, and tracking works up to that point.
I then import the equations from this .py file in another notebook, where I use them together with another dataset to create a graphical output. When I look at the workflow, the dataset input is there but the .py file is not.
Is there a way to track importing internal .py files like this?
Hi @Kriegelw,
You can define those files as explicit inputs to your command and renku will mark them as dependencies: renku run --input path/to/equation-file.py .... See https://renku-python.readthedocs.io/en/stable/commands.html#detecting-input-paths for more details.
It’s also possible to define those explicit inputs in your script using the renku API: https://renku-python.readthedocs.io/en/stable/api.html. If you are not sure which approach to choose, use the command-line argument for now, since it does not require modifying your scripts.
Kind regards,
Mohammad
Hi @kriegelw,
You could use papermill, e.g.:
renku run papermill notebooks/notebook.ipynb \
notebooks/notebook.ran.ipynb -p importfile definition.py
where notebook.ipynb uses the following code to import definition.py:
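import importlib
import warnings

# importfile is the papermill parameter (-p importfile definition.py).
# basepath is not shown in this post; it is assumed to be a dotted package prefix (or "").
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Strip the ".py" suffix to get the module name.
    mod = importlib.import_module(basepath + importfile[:-3])
# Equivalent of "from <module> import *": use __all__ if the module defines it.
names = getattr(mod, '__all__', [n for n in dir(mod) if not n.startswith('_')])
g = globals()
for name in names:
    g[name] = getattr(mod, name)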
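For reuse across notebooks, the same idea can be wrapped in two small helpers. This is only a sketch: the function names import_public_names and module_name_from_path are made up for illustration, and it assumes the file passed via -p importfile is given as a forward-slash path relative to a directory on sys.path (for example the notebook's working directory).

```python
import importlib
from pathlib import PurePosixPath


def module_name_from_path(path: str) -> str:
    """Turn a relative file path such as 'pkg/definition.py' into 'pkg.definition'.

    Hypothetical helper: assumes forward slashes and a path that is
    relative to a directory on sys.path.
    """
    return ".".join(PurePosixPath(path).with_suffix("").parts)


def import_public_names(module_name: str, namespace: dict) -> list:
    """Import module_name and copy its public names into namespace.

    Mimics 'from module_name import *': uses __all__ when the module
    defines it, otherwise every name without a leading underscore.
    Returns the list of names that were copied.
    """
    mod = importlib.import_module(module_name)
    names = getattr(mod, "__all__", None)
    if names is None:
        names = [n for n in dir(mod) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(mod, name)
    return list(names)
```

In the notebook this would be called as import_public_names(module_name_from_path(importfile), globals()), with importfile being the papermill parameter from the command above.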