Datasets Disappearing

Hi,

We are working on a project which had 4 datasets (originally uploaded through the UI with ‘Add Dataset’). From time to time our contact person added files through the UI with the ‘modify dataset’ button.
However, lately he could no longer add any data because no datasets were visible in the Datasets tab of the UI. It showed the message ‘No dataset found for this project’.

Fortunately, this was only in the UI and all the files are still available in the Renku session.
I created a new dataset, which is ‘visible’ and he is now able to add files to this new dataset through the UI again.

Would it be possible to fix this and make the 4 datasets visible from the interface again?

There are 2 potential reasons for this (but maybe it is not due to any of these):

  • We prefer not to keep the raw data directly in the data/ folder. So every time raw files were added, we moved them to another folder in data/raw/ and deleted the automatically created folder in data/. For example, when files were added to the dataset ‘vbv_data’, a folder data/vbv_data was created in the Renku session. We moved these files to data/raw/vbv_data and deleted the now-empty folder data/vbv_data. By the way, would it be possible to add an option in Renku to choose the subfolder in which each dataset is stored, such as data/raw/ instead of data/?

  • Our default branch is develop, not master. So when there are updates on Renku, we need to merge them into develop manually. We could have missed one or done something in the wrong order.

Thank you for your help,

Pauline ML

Thank you for thoroughly describing the situation: your information is very helpful for debugging the issue.

I have some ideas about what could be the cause of this problem, but it would be helpful to look at the project to verify that. Would it be possible to give me, cramakri, access to this project and send me a link to it (by DM if you prefer)?

If that is not feasible, that is also OK – I think I can recreate the scenario that is causing problems – but if it is possible, it would be a little faster.

Thanks!

Thanks for the fixes. The datasets are now visible again.

I’m glad that could get sorted out! :tada:

Hi to all,

Would it be possible to know how this problem was solved? I have just created a new project

https://renkulab.io/projects/luis.salamanca/democrasci-parsedpdfs

Even though I created 4 datasets locally through the CLI without any problems and added data to them, those datasets do not appear in the UI. I preferred to follow up here as these errors might be related, but I have seen many dataset-related errors reported recently, so perhaps I should have posted somewhere else.

Thank you so much!

Hi Luis! Given the age of your project, which is quite new, I am pretty sure that the situation that you are seeing is different from the one that PaulineML had.

I think the problem you are having is probably the same as the one in ‘Dataset is not in KG’. That problem is still being investigated, but you should see the follow-up in the other post (and we can notify you as well).

Awesome! I will keep an eye on that thread then! In any case, I can see the datasets locally, along with all the associated metadata, so I assume things are working correctly. I know they will just pop up magically in the UI :slight_smile: In the meantime, I will stay vigilant.

Thank you so much!

@lusamino The project you created through the UI was created using renku-python 1.2.2.
But then locally you have 0.16.0 installed and you added datasets with that. But 0.16.0 stores datasets in a different way that 1.x.x cannot read.

So if you install 1.2.4 locally and use that to create the datasets, everything should work (also, delete the .renku/datasets/ folder).

You should not have been able to run 0.16.x on a project created with 1.x.x, but it looks like the code that prevents this was only added for projects created on the command line (renku init), not for projects created through the service.
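To see which renku-python version a given Python environment actually has, something like the following could help. This is only a minimal sketch: the helper names major_version and can_read_1x_datasets are made up for illustration and are not part of renku-python, and the script only sees the environment of the interpreter that runs it (a pipx install lives in its own virtual environment).

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.2.4'."""
    return int(version_string.split(".")[0])


def can_read_1x_datasets(version_string: str) -> bool:
    """Hypothetical helper: True for 1.0.0 and newer, False for 0.x releases."""
    return major_version(version_string) >= 1


if __name__ == "__main__":
    try:
        # "renku" is the package name used with pip install in this thread.
        installed = version("renku")
    except PackageNotFoundError:
        print("renku is not installed in this Python environment")
    else:
        if can_read_1x_datasets(installed):
            print(f"renku {installed}: matches projects created with 1.x.x")
        else:
            print(f"renku {installed}: too old for projects created with 1.x.x")
```

Running it once with the conda environment's Python and once with the Python inside the pipx venv should show whether the two installs differ.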

Thanks @ralf.grubenmann for the answer! Actually, this is another super confusing thing for me. Let’s hope I can explain myself.

I remember that a long time ago I installed Renku with the pipx commands provided here: GitHub - SwissDataScienceCenter/renku-python: A Python library for the Renku collaborative data science platform.

Then, if I do which renku, I can locate the path without problems, and all seems to work. However, before starting this new repo, I checked the version and it was 0.14.0. Therefore, I decided to upgrade, and the command I found was pipx upgrade renku, which took me to 0.16.0. Today I ran it again and it printed this:

$ pipx upgrade renku
renku is already at latest version 0.16.0 (location: /Users/<user>/.local/pipx/venvs/renku)

But for the new repository, I created a new Python environment (using conda env). In that new environment I cannot see renku through pip. So I decided to install it in the new env using a standard pip install renku==1.2.2. In this case, I can force the version up to 1.2.2, something that I could not do with pipx.

Therefore, now I am extremely confused. Should I use pipx for installing Renku as you suggest? But that will pin me to version 0.16.0. Or should I go with pip in each repo, which allows me to go up to 1.2.4?

Sorry for the weird questions, but honestly, this is super confusing to me.
Thank you so much!
Cheers
Luis

We have switched our images from using pipx to using pip.

pipx has internal references to wheel and setuptools, among others, which it automatically updates every 30 days, and there’s no way to pin those. So pipx updates them at different times for different users. We had issues in the past where some version of renku-python didn’t work with a new version of setuptools, and pipx would upgrade to it with no way for us to prevent that, breaking users’ environments.

Hence we now use pip in a separate Python environment in our Dockerfiles, so we can properly pin all versions of all packages. But you can continue using pipx in your local environments, if you want.

With some of that background info out of the way, the most likely reason why you can’t pipx upgrade renku is that your pipx uses your system Python version and that is Python 3.6, and we dropped support for Python 3.6 in renku-python version 0.16.1. If you upgrade to Python 3.7, pipx should happily upgrade renku-python to 1.2.2. Inside the conda environment, conda probably installed a more current version of Python, hence you could install 1.2.x.

Please note that Python 3.6 reached its end of life on 23.12.2021, so it’s highly encouraged that you upgrade to at least 3.7, which has an EOL of 27.06.2023.
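A quick way to check whether a given interpreter is new enough is sketched below. It assumes that 3.7 is the minimum, as described above, and the names MIN_PYTHON and python_is_supported are only illustrative. It reports on whichever Python runs the script, so to learn what pipx uses it would have to be run with that same interpreter.

```python
import sys

# Assumed minimum, based on the statement that Python 3.6 support was dropped.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version_info[:2]) >= minimum


# Show which interpreter was checked, since several may be installed.
print(f"Interpreter: {sys.executable}")
print(f"Version: {sys.version_info.major}.{sys.version_info.minor}")
print(f"New enough for renku-python 1.x: {python_is_supported()}")
```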

Thanks @ralf.grubenmann !! Then, that fully clarifies everything, and now I can perfectly tackle the issue. Thank you so much again!

Btw, I guess that if I just remove .renku/datasets, I can start creating the datasets again without any further issues, is that right? (I have not done any renku run yet)

Cheers
Luis

Removing that folder is more of a cleanup thing. Renku >= 1.0.0 just completely disregards it, so there’s no reason to keep it around. So it’d work even without removing it.

What I definitely would do is add a file .renku/metadata.yml with this content:

# Dummy file kept for backwards compatibility, does not contain actual version
'http://schema.org/schemaVersion': '9'

This should have been created automatically when you created the project, but there was a bug (fixed but not released yet) that meant it only worked when doing renku init to create a project, not when creating a project through the renkulab.io UI.
This is only there for backwards compatibility and what it does is make renku-python <1.0.0 refuse to work on the project (which would have prevented you running into the original problem :slight_smile: ). So just remove .renku/datasets/ and add this file in a commit and all should be golden.
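If you prefer to script those two steps, here is a minimal sketch. The function name apply_cleanup is made up for illustration, and it assumes it is pointed at the root of the project, where the .renku folder lives. It only changes files on disk; the result still has to be committed with git afterwards.

```python
import shutil
from pathlib import Path

# File content exactly as given above.
METADATA_CONTENT = (
    "# Dummy file kept for backwards compatibility, does not contain actual version\n"
    "'http://schema.org/schemaVersion': '9'\n"
)


def apply_cleanup(project_root: Path) -> Path:
    """Remove .renku/datasets/ and write .renku/metadata.yml (hypothetical helper)."""
    renku_dir = project_root / ".renku"
    if not renku_dir.is_dir():
        raise FileNotFoundError(f"No .renku folder found in {project_root}")

    # The old datasets folder is ignored by Renku >= 1.0.0, so this is cleanup only.
    shutil.rmtree(renku_dir / "datasets", ignore_errors=True)

    metadata_file = renku_dir / "metadata.yml"
    metadata_file.write_text(METADATA_CONTENT, encoding="utf-8")
    return metadata_file


# Example, run from the project root: apply_cleanup(Path("."))
```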
