I am planning to use some large MODIS remote sensing datasets, which are all stored online:
What is a good way of dealing with such datasets? Would it make sense to have a script that first downloads the data to a temporary location, extracts just what is needed, writes a relatively small output file, and then deletes the downloaded files? That way I could still run it with Renku, but the input data itself would not be stored in the repository. Or would you suggest adding the data to the Renku repository first? In that case, the repository might become quite big.
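
To make the first option concrete, here is a minimal sketch of what I have in mind. It assumes the files can be fetched over plain HTTP(S) without authentication (which may not hold for the actual MODIS archive), and the names `fetch_to`, `process_remote_file`, and `extract` are hypothetical placeholders, not part of Renku or any MODIS tooling.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_to(url, dest):
    """Download url to dest. Assumes plain HTTP(S) with no login step."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)


def process_remote_file(url, extract, out_path, fetch=fetch_to):
    """Download one file to a temporary directory and keep only the extract.

    extract is a hypothetical user-supplied callable that takes the path of
    the downloaded file and returns the small result as bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "download.bin"
        fetch(url, raw)
        result = extract(raw)
    # The temporary directory and the raw download are gone at this point;
    # only the small result is written to the output location.
    out_path = Path(out_path)
    out_path.write_bytes(result)
    return out_path
```

The `fetch` argument is there so the download step can be swapped out, for example for an authenticated client or for a stub when testing. Only the file at `out_path` would end up in the repository.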