How do I track the origins of files in datasets?

I had to download a file (HowardSprings_2020_L3.nc) manually because I could not get a direct link to it. I placed it in the folder of the respective dataset and tried to add it with renku dataset add HowardSprings temp/HowardSprings_2020_L3.nc. This failed with a warning that I should provide a URL for proper tracking. So I contacted the data supplier, who gave me a direct link to a complete file, HowardSprings_L3.nc.

I then ran renku dataset add HowardSprings URL, forgetting that the untracked HowardSprings_2020_L3.nc was still in the folder. What happened is that renku apparently added and committed that file first, with the commit message "renku dataset: committing 1 newly added files", and only then downloaded the second file, HowardSprings_L3.nc, and committed it with the commit message "renku dataset add HowardSprings http://dap.ozflux.org.au/thredds/fileServer/ozflux/sites/HowardSp...". Wouldn't it be better to warn the user that the repo is dirty before taking executive decisions?
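For what it's worth, the kind of check I mean is cheap. Below is a minimal sketch in Python of a hypothetical helper (not part of renku) that parses the output of git status --porcelain and stops if a given folder contains untracked files. The folder name data/HowardSprings in the usage line is only my assumption about where the dataset files live.

```python
import subprocess
from pathlib import PurePosixPath


def untracked_in(porcelain: str, folder: str) -> list[str]:
    """Return untracked paths from `git status --porcelain` output that lie under `folder`.

    Paths in porcelain output are relative to the repository root.
    """
    prefix = PurePosixPath(folder)
    found = []
    for line in porcelain.splitlines():
        # Untracked entries are reported as "?? <path>"; skip everything else.
        if not line.startswith("?? "):
            continue
        path = PurePosixPath(line[3:])
        if path == prefix or prefix in path.parents:
            found.append(str(path))
    return found


def check_clean(folder: str) -> None:
    """Hypothetical pre-check: abort if `folder` contains untracked files."""
    out = subprocess.run(
        # --untracked-files=all lists individual files instead of collapsing directories.
        ["git", "status", "--porcelain", "--untracked-files=all"],
        capture_output=True, text=True, check=True,
    ).stdout
    stray = untracked_in(out, folder)
    if stray:
        raise SystemExit(f"Untracked files in {folder}: {', '.join(stray)}")
```

Calling check_clean("data/HowardSprings") from the repository root before renku dataset add would have stopped with the stray HowardSprings_2020_L3.nc listed, instead of silently committing it.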

This is all in the repo https://renkulab.io/projects/ba_math_gen-16/et-data. When I browse the datasets, I find both files, but there is no lineage information for either of them. When I look at HowardSprings_L3.nc in the GitLab interface, it points to the commit "renku dataset: committing 1 newly added files" instead of the renku dataset add commit. But the latter does not contain the path anyway, so I am at a loss as to how to recover the origin of the files in the dataset. Could anyone help me out here? I had assumed that lineage is recorded when running renku dataset add, since the error message from my first attempt implied as much. If that is not the case, I will have to record the origin of each file in a README. Also, I am not sure how renku dataset update is supposed to work if the link is truncated.