I had to download a file (`HowardSprings_2020_L3.nc`) manually because I could not get a direct link to it. I placed it in the folder of the respective dataset and then tried to add it with `renku dataset add HowardSprings temp/HowardSprings_2020_L3.nc`. This failed with a warning that I should provide a URL for proper tracking. So I contacted the data supplier, who gave me a direct link to a complete file, `HowardSprings_L3.nc`. I ran `renku dataset add HowardSprings URL`, forgetting that I still had the untracked `HowardSprings_2020_L3.nc` in the folder. What happened is that renku apparently added and committed that file first, with the commit message `renku dataset: committing 1 newly added files`, and then downloaded the second file, `HowardSprings_L3.nc`, and committed it with the commit message `renku dataset add HowardSprings http://dap.ozflux.org.au/thredds/fileServer/ozflux/sites/HowardSp...`. Wouldn’t it be better to warn the user that the repo is dirty before taking executive decisions?
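In the meantime, a manual check along these lines would have caught my mistake. This is only a sketch: `require_clean_repo` is a hypothetical helper name of my own, not a renku command, and it relies only on plain `git status`.

```shell
# Hypothetical helper (not part of renku): refuse to continue when the
# working tree contains uncommitted changes or untracked files.
require_clean_repo() {
    # `git status --porcelain` prints one line per modified or untracked
    # path and prints nothing when the working tree is clean.
    if [ -n "$(git status --porcelain)" ]; then
        echo "Working tree is not clean:" >&2
        git status --short >&2
        return 1
    fi
}

# Usage (URL stands for the direct link from the data supplier):
#   require_clean_repo && renku dataset add HowardSprings URL
```

With the stray `HowardSprings_2020_L3.nc` still in the folder, the helper would list it and return a non-zero status, so the `renku dataset add` after `&&` would not run.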
This is all in the repo https://renkulab.io/projects/ba_math_gen-16/et-data. When I browse the datasets, I find both files, but there is no lineage information for either of them. When I look at `HowardSprings_L3.nc` in the GitLab interface, it points to the commit `renku dataset: committing 1 newly added files` instead of the `renku dataset add` commit. But then, the latter does not contain the path anyway, so I am totally lost as to how to recover the origin of the files in the dataset. Could anyone help me out here? I was trusting that the lineage gets recorded when running `renku dataset add`, since this was implied by the error message of my first attempt. If this is not the case, I will have to document it for each file in a README. I am not sure how `renku dataset update` should work if the link is truncated, though.
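For anyone who wants to reproduce what I see outside the web interface, the commits that touched a file can be listed with plain git. This is a sketch under my own assumptions: `commits_touching_file` is a hypothetical helper name, and the path in the usage comment is a placeholder, since I am not sure where the file ended up inside the dataset folder.

```shell
# Hypothetical helper: list every commit that touched a given file,
# printing the full hash followed by the complete commit message.
commits_touching_file() {
    # --follow keeps tracking the file across renames;
    # %H is the full commit hash, %B is the raw commit message.
    git log --follow --format='%H%n%B' -- "$1"
}

# Usage (replace the placeholder with the real path inside the repo):
#   commits_touching_file path/to/HowardSprings_L3.nc
```

This prints each commit message in full, which is where I would look for an untruncated URL, if one was recorded at all.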