Sv-renku: renku dataset import fails

Hi Renku team,

I’m in the final phase of re-running the code for a publication, and I would like to chain outputs of projects via proper renku datasets so that updates can be easily propagated when I correct something.

However, I cannot import datasets from my projects through the API, because the knowledge graph doesn’t finish building.

I cannot import datasets via the CLI either. I tried:

renku dataset import https://sv-renku.epfl.ch/datasets/72d79463-d567-4a48-87ec-4341b4b90b34

returned:

Error: Resource not found in knowledge graph: https://sv-renku.epfl.ch/datasets/72d79463-d567-4a48-87ec-4341b4b90b34

Any chance of solving this issue? It would be really nice to not have to propagate changes in datasets by reuploading files through the CLI each time. Thanks!

Cyril

Hi Cyril,

The Renku CLI uses the knowledge graph (KG) to retrieve a dataset’s info when importing it, so if KG indexing is not done, you cannot import the dataset (or you might not get the latest updates).

Do you know why KG indexing won’t finish? Is there an error message?

Also, I’m not sure what you mean by:

It would be really nice to not have to propagate changes in datasets by reuploading files through the cli each time.

If there are changes in a local dataset that is imported by other projects, you need to push the project so that these changes are indexed by the KG and can be seen by other projects. Basically, you need to run renku dataset update and git push in your local project, and then renku dataset update in the projects that imported the dataset. Do you use a different approach to update renku datasets?
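The update sequence described above could be scripted roughly as follows. This is only a sketch: the two directory arguments are hypothetical local clones, the `in_dir` helper is made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set. Depending on your renku version, `renku dataset update` may need a dataset name or extra options.

```shell
#!/bin/sh
# Sketch of the update flow: source project first, then the importing project.
# By default nothing is executed; the commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command inside a directory, or print it in dry-run mode.
in_dir() {
    dir="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# $1: local clone of the project that owns the dataset (assumed path)
# $2: local clone of a project that imported the dataset (assumed path)
propagate_dataset_update() {
    # Update and push the source project so the KG can index the change.
    in_dir "$1" renku dataset update &&
    in_dir "$1" git push &&
    # Once KG indexing has finished, pull the change into the importing project.
    in_dir "$2" renku dataset update
}

# Example call (paths are placeholders):
#   propagate_dataset_update ~/projects/source-project ~/projects/consumer-project
```

Note that the last step only picks up the change after the KG has finished indexing the pushed commit, so it may need to be re-run a bit later.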

Mohammad

There is no info on why KG indexing won’t finish, but it has never worked for me. It just stays stuck at some percentage below 100 on every project.

Dear @bopekno,
We have taken a look at your KG indexing, and we needed to increase the capacity of our triples generator in SV. Some big projects were taking up all of the resources while running a very long renku migration.
You should see the indexing bar progressing quickly now.


Hello @pameladelgado,

Thanks a lot, datasets are now in the KG. But when I try importing a dataset in another project, I get the following error:

Errors occurred while performing this operation.

Dataset import failed: Invalid parameter value for https://sv-renku.epfl.ch/datasets/72d79463-d567-4a48-87ec-4341b4b90b34: Cannot clone remote projects: git@sv-renku-git.epfl.ch:pulver/ensembl-tss-clustering-r.git https://sv-renku-git.epfl.ch/pulver/ensembl-tss-clustering-r.git

Here’s a printscreen:

The error message says that renku cannot clone the project that contains the dataset. This is most likely due to insufficient access permissions for the repo. You can check this by trying to clone the repository: git clone git@sv-renku-git.epfl.ch:pulver/ensembl-tss-clustering-r.git which should also fail.

Are you trying to import this on your local machine or in an interactive session? If you are on your local machine, then you need to set up an SSH key to access the GitLab instance associated with the renku deployment.
We don’t support importing datasets from private/internal projects in interactive sessions yet, so to import a private dataset you need to clone your project, import the dataset on your local machine, and then push the project.
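As a rough sketch of that local workaround: the dataset URL and the source repository below are the ones from this thread, while the target project URL, the working directory, and the `run`/`in_dir` helpers are placeholders made up for illustration. `git ls-remote` is used here as a lighter access check than a full clone. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: import a dataset from a private project on a local machine, then push.
# Assumes an SSH key is already registered with the GitLab of the deployment.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: same as run, but inside a given directory.
in_dir() {
    dir="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ (cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# $1: SSH URL of the repo that holds the dataset
# $2: SSH URL of the project that should receive it (placeholder)
# $3: dataset URL in the knowledge graph
# $4: local directory to clone into (placeholder)
import_private_dataset() {
    # Fails early if the SSH key does not grant access to the source repo.
    run git ls-remote "$1" &&
    run git clone "$2" "$4" &&
    in_dir "$4" renku dataset import "$3" &&
    in_dir "$4" git push
}

# Example call (second and fourth arguments are placeholders):
#   import_private_dataset \
#     git@sv-renku-git.epfl.ch:pulver/ensembl-tss-clustering-r.git \
#     git@sv-renku-git.epfl.ch:your-namespace/your-project.git \
#     https://sv-renku.epfl.ch/datasets/72d79463-d567-4a48-87ec-4341b4b90b34 \
#     ./your-project
```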
