Delete datasets created in web UI

Hello team,
I created some datasets with the web UI on renkulab.io. Is it possible to delete a dataset again?
Thanks!

Hello!

Thank you for your question.
For now you can only delete datasets by using renku commands within an environment.
If you start an environment and open a terminal you can run:

$ renku dataset rm <dataset-to-delete>
$ git push

To list datasets in your project you can run renku dataset.
Alternatively, you can also clone the project locally and run the above commands from there.
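If you do this often, the two steps above can be wrapped in a small shell function. This is only a sketch: `delete_dataset` is a hypothetical helper name, not a renku command, and it assumes you run it from inside the project with a remote already configured.

```shell
# Hypothetical helper (not part of renku): remove a dataset's metadata, then push.
delete_dataset() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: delete_dataset <dataset-name>" >&2
        return 1
    fi
    # Remove renku's metadata for the dataset; stop here if that fails.
    renku dataset rm "$name" || return 1
    # Publish the resulting commit to the remote.
    git push
}
```

For example, `delete_dataset my-dataset` runs `renku dataset rm my-dataset` followed by `git push`, and skips the push if the removal fails.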

Hope it helps!
Pamela

There is also a known issue that deleted datasets still appear in the knowledge graph.

You can follow progress on this via these issues:

Hello both,

thanks for the quick reply. Yes, I deleted the datasets with the renku commands but they still show up in the web UI. So, I guess I will follow the issues.

By the way, I noticed that datasets are publicly accessible even if they are created in a private project. See for example https://renkulab.io/datasets/21e06af1-ddba-45dc-b99d-d926ddbcd00f
Is this expected?

Cheers, Henry

Hi @hluetck - yes, for the moment if a project enables the knowledge graph, then the knowledge graph metadata about the project is publicly visible. We try to warn users about this when they create a private project, but perhaps we weren’t explicit enough. Clearly this is not desirable and we’re currently working on a solution that will respect the privacy level of the projects also for KG searches.

Hi all.

This might be related to #137, mentioned above by @cramakri, but I’m not sure, so I just wanted to check.

I’m working locally and am trying to delete a dataset or some of the files I do not want within it.

I first tried to remove the dataset:
renku dataset rm <dataset-to-delete>

but the dataset still appears:
(venv) jen@jen:[master]~/projects/ace_data_management/renku-projects/meteorology-raw-summary$ ls data
meteorology-raw-summary

As it still appeared, I instead tried removing the files that I do not want, which failed (I guess because the original dataset was deleted):
(venv) jen@jen:[master]~/projects/ace_data_management/renku-projects/meteorology-raw-summary$ renku dataset unlink meteorology-raw-summary -I "metdata_wind*"
Error: Invalid parameter value - Dataset does not exist.

Both the dataset and files still show up within the repo:
(venv) jen@jen:[master]~/projects/ace_data_management/renku-projects/meteorology-raw-summary$ ls data/meteorology-raw-summary/

metdata_all_20170122_20170223.csv  metdata_wind_20170122_20170223.csv
metdata_all_20170226_20170319.csv  metdata_wind_20170226_20170319.csv
metdata_all_20170322_20170411.csv  metdata_wind_20170322_20170411.csv

In git log the commits show up:

Author: Jen Thomas <jenny_t152@yahoo.co.uk>
Date:   Thu Mar 19 11:22:38 2020 +0000

    renku dataset unlink meteorology-raw-summary -I metdata_wind*

commit 8171cce619d1f542fa949b4493aa2d6834d88a0c
Author: Jen Thomas <jenny_t152@yahoo.co.uk>
Date:   Thu Mar 19 11:20:28 2020 +0000

    renku dataset rm meteorology-raw-summary

It looks as though the metadata for all the dataset files was removed as part of the commit 8171cce619d1f542fa949b4493aa2d6834d88a0c:
(excerpt from git show 8171cce619d1f542fa949b4493aa2d6834d88a0c)

-    name: Jen Thomas
-  name: metdata_all_20170322_20170411.csv
-  path: data/meteorology-raw-summary/metdata_all_20170322_20170411.csv
-  url: file://../../data_to_archive_post_cruise/met_data/raw/wind/metdata_all_20170322_20170411.csv
-identifier: 77102d45-9414-4f08-9412-ba3f25269b61

Finally, git status

nothing to commit, working tree clean

Is this what is supposed to happen?
How should I properly proceed to get rid of the remaining files/dataset?

Thanks very much for your help :slight_smile:

Hi Jen,

Thanks for the detailed question! The commands are a bit cryptic (and we’re still working to improve them)!

renku dataset rm <dataset name>: this removes the metadata for the dataset, so that renku isn’t tracking it anymore, but doesn’t delete the files that were part of the dataset.
renku rm <dir or filepath>: this removes any metadata that still exists for the files (e.g. LFS tracking information) and also deletes the directory or files themselves.
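Put together, fully removing a dataset and its files could look roughly like the sketch below. `purge_dataset` is a hypothetical helper, not a renku command, and it assumes the files live under data/<dataset name>, as in the listing earlier in this thread; pass a second argument if they are somewhere else.

```shell
# Hypothetical helper (not part of renku): drop a dataset's metadata, then its files.
# Assumes the files live under data/<dataset-name> unless a directory is given.
purge_dataset() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: purge_dataset <dataset-name> [dir]" >&2
        return 1
    fi
    dir="${2:-data/$name}"
    # Step 1: stop tracking the dataset. This can fail if the metadata was
    # already removed by an earlier "renku dataset rm", so carry on regardless.
    renku dataset rm "$name" || echo "dataset metadata not removed (already gone?)" >&2
    # Step 2: delete the files and any metadata that still exists for them.
    if [ -e "$dir" ]; then
        renku rm "$dir"
    fi
}
```

In the situation described above, where the metadata is already gone, only the second step does any work. A `git push` afterwards publishes the change if the project has a remote.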

Let us know if that works, and if you have thoughts on how to improve this syntax!

Thanks,
Emma

Ah thanks @emmjab. That worked thanks!

Maybe it would help to display a message after renku dataset rm <dataset name> saying that the metadata has been removed, or to say explicitly that the files are not removed unless you use renku rm <dir or filepath>? Or to add an option to renku dataset rm <dataset name> to remove the metadata and/or the data files?

Hope that helps and thanks again for your reply. Glad that is sorted :slight_smile:

Sorry for reviving this topic, but I wanted to ask: is this resolved? I recently ran renku dataset rm on some datasets, but they are still listed in the Datasets section. The problem is that I now have several datasets with the same name. I just wanted to kindly ask for a small update on this :wink:

Thank you!

Thank you for your question. Let me double check on the status of this issue and get back to you.

Hi @lusamino,
Thanks for reporting the issue. Indeed, there’s a bug causing the removed datasets not to disappear from the search results. The fix is prepared and we plan to release it in a couple of weeks.