I’ve been working these last two days on sv-renku, quite successfully (new projects, renkuized old projects). I intend to properly link the data produced by project 1 to project 2, where it will be used as an input. I’ve created a dataset with renku dataset in project 1, and added the path to the dataset by accessing the repo through https (that seems kind of circular to me, so might be ill-advised). When I do that, the dataset is not saved.
I’ve now tried a different approach: I’m creating the dataset directly in project 2, passing the path to the data in project 1 through the --source option (https access to the project 1 repo). I managed to copy the data, but still, nothing shows up in the graph in the GUI. Is this way of proceeding correct? Or should I create the dataset in project 1, and then import it into project 2 through renku dataset import?
It’s better to use renku dataset import to import a dataset from another renku project (in your case from project1). In this case you will have proper metadata copied over from the original dataset.
If you are using the CLI to add data, please post the command you used so that I can take a look.
Note that after pushing changes to the project, it takes a while (normally a few minutes) for the KG to be built and for updates (e.g. new datasets) to become visible in the GUI.
The dataset still does not appear in the KG (the KG does not build at all in fact in project 1), although I’ve verified it exists and contains the files as expected in project1 by running renku dataset ls-files dataset1
Now, I tried importing dataset1 in project2 with:
renku dataset import https://to-project1-repo.git
but the command fails: Error: Invalid parameter value - Could not process https://sv-renku-git.epfl.ch/pulver/cisbp-to-meme.git. Couldn't test provider <class 'renku.core.commands.providers.dataverse.DataverseProvider'>: Expecting value: line 1 column 1 (char 0) Reason: provider not found for https://sv-renku-git.epfl.ch/pulver/cisbp-to-meme.git Hint: Supported providers are: dataverse, renku, zenodo
I’m confused as to what renku dataset import expects as an input.
Also, could the systematic failure of the KG building process be due to specifics of the SV deployment of renku? We are running Renku version 0.6.2 (April 29th 2020).
So, the first dataset add operation succeeds since you can list the files. To see them on KG make sure that:
Changes are pushed to the remote repo
KG is activated for the project. If project1 is private then you need to click on the “Activate KG” button that shows up in the project UI page.
To import a renku dataset, you need to use dataset’s URL from the UI (and not the Git link). URLs have a format like https://sv-renku.epfl.ch/datasets/<dataset-id> or https://sv-renku.epfl.ch/projects/<user>/<project>/datasets/<dataset-id>.
KG must be activated for your project1 if you want to import from that project.
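To illustrate the difference between the two kinds of links, here is a small Python sketch that checks whether a URL has one of the two dataset URL shapes mentioned above. The helper name `looks_like_dataset_url`, the regular expression, and the placeholder dataset ID are assumptions made for this example; this is not how renku itself validates the argument.

```python
import re
from urllib.parse import urlparse

# Path shapes taken from the two URL formats quoted above:
#   /datasets/<dataset-id>
#   /projects/<user>/<project>/datasets/<dataset-id>
_DATASET_PATH = re.compile(r"^/(?:projects/[^/]+/[^/]+/)?datasets/[^/]+/?$")


def looks_like_dataset_url(url: str) -> bool:
    """Return True if `url` has the shape of a dataset page URL (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return bool(_DATASET_PATH.match(parsed.path))


if __name__ == "__main__":
    # The Git link from the failing command, and a placeholder dataset URL.
    git_link = "https://sv-renku-git.epfl.ch/pulver/cisbp-to-meme.git"
    dataset_link = "https://sv-renku.epfl.ch/datasets/some-dataset-id"
    print(looks_like_dataset_url(git_link))
    print(looks_like_dataset_url(dataset_link))
```

A link ending in `.git` does not match either shape, which is consistent with the "provider not found" error above.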
I do know about the failure of KG in the SV deployment. It’s better to ask your devops team about it.
The project seems to be private and I don’t have access to it. Can you please try the following to see if it solves the issue:
Go to project’s gitlab page (go to renkulab and click on “View in Gitlab” on the upper right)
In gitlab, go to the project’s Settings > Webhooks (in the left pane)
Scroll down the page and under Project Hooks delete the one for renku lab: https://renkulab.io/webhooks/events
Go back to project’s page in renkulab; refresh the page and the orange button to Activate KG for the project should appear again; click on it and wait for the project to be indexed.
I followed your instructions to re-enable the KG, but it did not resolve the problem.
However, I gave your user access to the two mentioned projects, if you like to take a look.
Currently, this is a testbed with no real data or code. Feel free to play around with it if needed.
It turned out that there is a bug where projects without a README.md file won’t get indexed in the KG. I’ve added a README.md file to your project (sorry about it) and re-activated KG for it. Now, the project is in KG and the import worked for me locally.
This bug is fixed in a newer version of renku and will be fixed in renkulab once we upgrade it.
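Until the upgrade is in place, a quick local check along these lines can tell whether a project checkout might be affected. This is only a minimal sketch: `has_readme` is a hypothetical helper, and it assumes the relevant file is named exactly `README.md` and sits at the top level of the project.

```python
from pathlib import Path


def has_readme(project_dir: str = ".") -> bool:
    """Return True if README.md exists at the top level of the project (hypothetical helper)."""
    return (Path(project_dir) / "README.md").is_file()


if __name__ == "__main__":
    if not has_readme():
        print("No README.md found; add one before activating the KG.")
```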
Thx a lot mohammad. This is indeed not something I would have come up with in my bug fixing attempts
I have now continued to create our test setup on renkulab. It still doesn’t quite work for me:
I’ve re-created a data project WITH a readme and added a dataset. This is the dataset I try to access: https://renkulab.io/gitlab/abiz-lab/roche/toolchain/data/-/blob/4ef4eebe257255e9437f11e081ce217b4ab4742d/.renku/datasets/3c53bc7b-5ffa-4443-a09c-b305b4a069f8/metadata.yml
Again, I try to cross-import the dataset to this project: https://renkulab.io/gitlab/abiz-lab/roche/toolchain/notebook
This is the command I use: renku dataset import https://renkulab.io/datasets/3c53bc7b-5ffa-4443-a09c-b305b4a069f8
This is the error I get: Error: Cannot access knowledge graph: https://renkulab.io/knowledge-graph/projects/toolchain/data Response code: 500
It seems not to handle the group hierarchy correctly.
Do you have some advice for me?
Thx a lot.
fyi: I feel bad to keep you that busy with this. We are currently evaluating how we’re gonna use renkulab in the future and are about to introduce it at our university. This is part of the eval, and our support requests will naturally decrease
It ignores the group and the subgroup1.
It seems as if renkulab datasets only work if the project is located in the root, and not within subgroups.
Can you verify that?
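To make the suspicion concrete: the project lives at `abiz-lab/roche/toolchain/data`, but the failing request goes to `.../knowledge-graph/projects/toolchain/data`. The Python sketch below reproduces that mismatch under the assumption that only the last two path segments are used as namespace and project name. That is a guess based on the error message, not something confirmed from the renku source, and both function names are made up for this example. Whether the KG endpoint actually accepts the full nested path is also an assumption.

```python
KG_BASE = "https://renkulab.io/knowledge-graph/projects"


def kg_url_last_two_segments(project_path: str) -> str:
    """Build the KG URL from only the last two path segments (assumed faulty behaviour)."""
    segments = project_path.strip("/").split("/")
    return KG_BASE + "/" + "/".join(segments[-2:])


def kg_url_full_path(project_path: str) -> str:
    """Build the KG URL from the full group/subgroup path (what one might expect)."""
    return KG_BASE + "/" + project_path.strip("/")


if __name__ == "__main__":
    path = "abiz-lab/roche/toolchain/data"
    print(kg_url_last_two_segments(path))  # matches the URL in the error message
    print(kg_url_full_path(path))          # keeps group and subgroup
```

For a project directly under a user or top-level group (two segments), both functions give the same URL, which would explain why the problem only shows up with subgroups.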