Tracking .py file created in repository

I am currently working in a repository where I create a .py file with equations from a dataset, and tracking it to that point works.
I then import the equations from this .py file in another notebook, where I use them in combination with another dataset to create a graphical output. When looking at the workflow, the dataset input is there but the .py file is not.
Is there a way to track importing internal .py files like this?

Hi @Kriegelw,

You can define those files as explicit inputs to your command and renku will mark them as dependencies: renku run --input path/to/equation-file.py .... See https://renku-python.readthedocs.io/en/stable/commands.html#detecting-input-paths for more details.

It’s also possible to define those explicit inputs in your script using the renku API: https://renku-python.readthedocs.io/en/stable/api.html. If you are not sure which approach to choose, use the command-line argument for the moment, since it does not require modifying your scripts.

Kind regards,
Mohammad

Hi @kriegelw,
You could use papermill, e.g.:

renku run papermill notebooks/notebook.ipynb \
notebooks/notebook.ran.ipynb -p importfile definition.py

where notebook.ipynb uses the following code to import definition.py:

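import importlib
import warnings

# importfile is the papermill parameter passed with -p (here "definition.py");
# basepath is assumed to be defined earlier in the notebook as a dotted package prefix.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # Strip the ".py" suffix to get the module name.
    mod = importlib.import_module(basepath + importfile[:-3])
# Copy the module's public names into the notebook's global namespace.
names = getattr(mod, '__all__', [n for n in dir(mod) if not n.startswith('_')])
g = globals()
for name in names:
    g[name] = getattr(mod, name)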
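If you would rather pass a file path than build a dotted module name, the same idea can be sketched with importlib.util. This is only a minimal sketch and not something renku or papermill provides: the helper name load_public_names is hypothetical, and it assumes the path points to a regular .py file.

```python
import importlib.util
from pathlib import Path


def load_public_names(path, namespace):
    """Load the module file at `path` and copy its public names into `namespace`.

    Hypothetical helper for illustration; returns the list of copied names.
    """
    path = Path(path)
    # Build an import spec directly from the file location; the module name
    # is taken from the file name without its suffix.
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Respect __all__ when the module defines it; otherwise take every
    # name that does not start with an underscore.
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in dir(module) if not n.startswith("_")]
    for name in names:
        namespace[name] = getattr(module, name)
    return list(names)
```

In the notebook you would then call load_public_names(importfile, globals()), with importfile being the papermill parameter from the command above.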