Passing a list of input files through papermill

Hello group!

In my Jupyter notebook, I specify a list of input files to be processed like this:
files = ['msCam1.avi', 'msCam2.avi']
data_folder = '/work/imaging-analysis/data/raw_movies'

When running the notebook from the command line through papermill, the list can also be specified and everything runs fine:

papermill --kernel Python3 \
notebooks/01_cnmfE_MotionCorrect.ipynb \
notebooks/01_cnmfE_MotionCorrect.ran.ipynb \
-p data_folder /work/imaging-analysis/data/raw_movies \
-y '{files:[msCam1.avi, msCam2.avi]}'

But if I run the exact same command through renku run, I get an error (after the notebook has completed):

Input Notebook:  notebooks/01_cnmfE_MotionCorrect.ipynb
Output Notebook: notebooks/01_cnmfE_MotionCorrect.ran.ipynb
Executing: 100%|██████████████████████████████████████████████████████████████████████████████████████| 17/17 [01:01<00:00,  3.64s/cell]
Error: The output directory "data/raw_movies" is not empty.

Delete existing files before running the command:
  (use "git rm <file>..." to remove them first)

        data/raw_movies/msCam1.avi
        data/raw_movies/msCam2.avi

Once you have removed files that should be used as outputs,
you can safely rerun the previous command.

Why does it complain about the input files in this case?

Hi Henry,

I’m trying to understand this case. Are you writing a file to the data/raw_movies folder in your notebook?

Hi Sekhar,

Thanks for looking into this.

Yes, the notebook is writing an output file to this folder. If it helps, the notebook is located here:
https://limited.renku.ch/gitlab/hluetcke/caiman-calcium-imaging-analysis/-/blob/master/notebooks/01_cnmfE_MotionCorrect.ipynb

The saving happens in the motion correction cell (3rd last) with the cm.save_memmap function.

Best, Henry

I can reproduce this problem. I would have thought that explicitly specifying the inputs/outputs using one of the mechanisms detailed here https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.run would fix it, but it does not.

There are two possible workarounds:

  • write the output to a different directory than where you read the input
  • omit the data_folder parameter and explicitly specify the input files, e.g., -y '{files: ["data/raw_movies/msCam1.avi", "data/raw_movies/msCam2.avi"]}'
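
Inside the notebook, both workarounds could look roughly like the sketch below. This is only an illustration under assumptions: the helper resolve_paths, the output_folder parameter and the default folder name data/motion_corrected are hypothetical and do not come from the original notebook. The idea is simply to build the input paths either from data_folder plus file names or from full paths, and to refuse an output folder that is also an input folder.

```python
from pathlib import Path


def resolve_paths(files, data_folder=None, output_folder="data/motion_corrected"):
    """Return (input_paths, output_dir), keeping outputs apart from inputs.

    `output_folder` is a hypothetical extra notebook parameter, not part of
    the original notebook.
    """
    # Workaround 2: if data_folder is omitted, `files` already holds full paths.
    base = Path(data_folder) if data_folder else Path()
    input_paths = [base / name for name in files]

    # Workaround 1: write results somewhere other than the input folder.
    output_dir = Path(output_folder)
    input_dirs = {p.parent.resolve() for p in input_paths}
    if output_dir.resolve() in input_dirs:
        raise ValueError(
            f"Output folder {output_dir} also contains input files; "
            "choose a different folder."
        )
    return input_paths, output_dir


if __name__ == "__main__":
    inputs, out_dir = resolve_paths(
        ["data/raw_movies/msCam1.avi", "data/raw_movies/msCam2.avi"]
    )
    print(inputs, out_dir)
```

The notebook would then pass the returned output folder to whatever cell writes the result, so that nothing new lands in data/raw_movies.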

I am checking with the renku-python developers to see if there is another solution and will file a bug to the issue tracker if necessary.

I have filed a bug report: https://github.com/SwissDataScienceCenter/renku-python/issues/1321

Future developments regarding this question will be visible there.

Thank you. I will check out the workarounds in the meantime.

BTW, this project contains details about the problem and examples of implementing the workarounds.

https://dev.renku.ch/projects/cramakri/output-detection-test