Add code from different project

I would like to be able to transparently import code files from other projects, similar to renku dataset add. For example, I would like to use in a different project, but be clear about where I took it from. Should I just pretend that is a dataset and use renku dataset add, or is there a more appropriate function to do this? I was also thinking of submodules, but this would require that would be in a separate git repo.


Hi! As you point out, using a submodule would be one solution. I think this is probably the best solution if you want to use the entire other project, but if you just want a file or two, it could be overkill.
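For comparison, the submodule route looks roughly like this. It is a minimal sketch, assuming the other project lives in its own git repository. A throwaway local repository stands in for it so the example runs offline, and the names other-project, my-project and vendor/other-project are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Sketch: vendor another project as a git submodule so its origin stays explicit.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the other project (hypothetical; normally you would use its URL).
git init -q other-project
echo "print('hello from the other project')" > other-project/analysis.py
git -C other-project add analysis.py
git -C other-project -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add analysis script"
OTHER_URL="$workdir/other-project"

# The project that wants to reuse the code.
git init -q my-project
cd my-project

# Newer git versions refuse local-path submodules by default,
# hence protocol.file.allow for this offline demo only.
git -c protocol.file.allow=always submodule add "$OTHER_URL" vendor/other-project

# .gitmodules records where the code came from; the gitlink pins the commit.
cat .gitmodules
git submodule status
```

With a real project, OTHER_URL would be its clone URL and the protocol.file.allow override is unnecessary. The trade-off mentioned above still applies: you get the whole repository, not just one or two files.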

The semantics of renku dataset add [file from a git repo] are pretty close to what you want here, but the main problem is that it would put the file into git-lfs by default, and it is cumbersome (though not impossible) to store the file directly in git.
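To check where a particular path would end up, you can ask git which filter attribute applies to it: git-lfs handles a file when .gitattributes assigns it filter=lfs. Below is a small sketch; the helper name is_lfs_tracked, the demo paths and the data/** rule are assumptions for illustration (the exact patterns Renku writes may differ), while git check-attr itself is standard git and works even without git-lfs installed:

```shell
#!/usr/bin/env bash
# Sketch: report whether git would hand a path to the LFS filter.
set -euo pipefail

# Hypothetical helper: succeeds if .gitattributes maps the path to filter=lfs.
# git check-attr prints "<path>: filter: <value>" (e.g. "lfs" or "unspecified").
is_lfs_tracked() {
  [[ "$(git check-attr filter -- "$1")" == *": filter: lfs" ]]
}

# Demo repo with an assumed LFS rule for everything under data/.
demo=$(mktemp -d)
cd "$demo"
git init -q
echo 'data/** filter=lfs diff=lfs merge=lfs -text' > .gitattributes

for path in data/externaldataset1/notebook.ipynb jupyter/notebook.ipynb; do
  if is_lfs_tracked "$path"; then
    echo "$path -> git-lfs"
  else
    echo "$path -> plain git"
  fi
done
```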

I will check whether anyone else here has a better idea and ask them to contribute to this thread.

At the moment, what @cramakri proposed is the best solution. We have an open issue to allow adding/sharing/tracking code similar to what can be done with data:

Thanks a lot, @cramakri and @mohammad-sdsc! I had not seen the issue. I just added a comment to it, as it seems to concern data files only, not code. As @cramakri pointed out, placing a code file in git-lfs is counter-productive. Would it not be easy to copy the functionality of renku dataset add but remove the git-lfs part? Or add an option to renku dataset add to let the user choose where the file is placed, and maybe allow a tag for code?

This has been fixed in v0.10.0: you need to pass the --no-external-storage flag to renku commands, like this:

renku --no-external-storage dataset add ...

Thanks so much for the update on this. It worked like a charm, except that I got tripped up by an error message saying that git-lfs was not installed. Could this be added to the renku installation instructions, so that users know they need to install git-lfs separately?
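Until that is in the docs, a quick pre-flight check can catch the missing dependency before renku does. A minimal sketch, assuming git is on PATH; git lfs version and git lfs install are standard git-lfs commands, while the function name check_git_lfs is only illustrative:

```shell
#!/usr/bin/env bash
# Sketch: fail early with a clear message if the git-lfs extension is missing.
set -euo pipefail

# Hypothetical helper name; returns 0 if "git lfs" responds, 1 otherwise.
check_git_lfs() {
  if git lfs version >/dev/null 2>&1; then
    echo "git-lfs found: $(git lfs version)"
    return 0
  fi
  echo "git-lfs is not installed: install it with your package manager," \
       "then run 'git lfs install' once." >&2
  return 1
}

check_git_lfs || echo "skipping renku commands until git-lfs is available"
```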


Hi @schymans,

sorry for taking a while to respond and thanks for the suggestion - I’ve made an issue to add this to the docs:

I am still struggling a little with the following workflow: I take a Jupyter notebook from another repo and then modify it to work with my new repo, but I want to record the original provenance of the file so I can give proper credit at the end. I can do this with renku --no-external-storage dataset add ... as suggested earlier, but this puts the file into data/externaldataset1/..., even if I use the --destination option, whereas I keep my Jupyter files in jupyter/.... What would be the best way of moving or copying the file to my jupyter folder transparently?
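One possible workaround, sketched under assumptions rather than as an official Renku feature: leave the dataset copy where renku put it (so the dataset metadata still points at an existing file), copy the notebook into jupyter/ as an ordinary git-tracked file, and record the origin in the commit message. The notebook name, the ORIGIN URL and the demo setup are hypothetical, and the sketch uses plain git only. If you want the copy step itself captured in Renku's provenance, wrapping the copy in renku run cp ... may be an alternative.

```shell
#!/usr/bin/env bash
# Sketch: copy an imported notebook into jupyter/ and note its origin in git history.
set -euo pipefail

# Hypothetical paths and source URL; adjust to the real dataset and project.
SRC=data/externaldataset1/analysis.ipynb
DEST=jupyter/analysis.ipynb
ORIGIN="https://example.com/other/project.git"

# --- demo setup only: a throwaway repo with the dataset file already committed ---
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name demo
git config user.email demo@example.com
mkdir -p "$(dirname "$SRC")"
echo '{"cells": []}' > "$SRC"
git add "$SRC"
git commit -q -m "add dataset file"
# --- end of demo setup ---

# Copy rather than move, so the dataset entry keeps pointing at a real file.
mkdir -p "$(dirname "$DEST")"
cp "$SRC" "$DEST"
git add "$DEST"

# Record where the notebook came from in the commit message (visible in git log).
git commit -q -m "Copy $SRC to $DEST" -m "Origin: $ORIGIN (imported into $SRC)"

git log -1 --format=%B
```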