Best practice for handling external data?

The technical documentation for renku dataset (Renku Command Line — Renku 0.14.1.dev7 documentation) is really helpful; thanks for your efforts.
However, I wondered whether there is a place where we could go through some usage scenarios. For example, a common problem is that external data cannot be accessed directly through a URL: the user first has to create an account and/or tick a box confirming they have read the license conditions before the download button appears. In such a case, it seems the only option is to download the file locally and then add it to the dataset in a second step. However, it is not clear how to do this, as the only example for local files I can find in the documentation uses the --external option, which does not add the file to the repository. Also, how could this be done if someone does not have renku installed locally?

Some more questions:

  • Is it good practice to store the license conditions in the dataset? If so, how?
  • Can we manually add a readme to each dataset in the folder of the dataset, i.e. manually create the file, and then just add it using git add or rather renku dataset add?

I think there may be enough questions about external data exchange to fill a separate category.