How to see the full command again?

rcnijzink · 25 February 2021 07:27

In older renku versions, git log myfile always showed the exact command that created the file. I used this a lot; it was super useful, because you could check which script was used and with which settings in case you wanted to do something similar. The newest renku version only shows renku run: committing 1 newly added files, which is not very informative. How can I see the full command now?

tolevski · 25 February 2021 15:40

Hi @rcnijzink,

We are working on improving the way the knowledge graph stores metadata and these improvements (when ready) should help with the issue you are having.

In the meantime you can get almost the same thing you are looking for by doing the following:

work ❯ test-project-10 ▶ master ▶ $ ▶ renku run python script.py --output file2.txt
 
work ❯ test-project-10 ▶ master ▶ 2⬆ ▶ $ ▶ renku show outputs -v
PATH       COMMIT                                    GENERATION TIME      WORKFLOW
---------  ----------------------------------------  -------------------  ------------------------------------------------------------
file.txt   fc0713229e316990872ab8431452089e52bbc46d  2021-02-25 14:51:28  .renku/workflow/d230e7d2d103427db46ad93401491cc1_rerun.yaml
file2.txt  cd90aca00964bf09affaf121eee896f2caac7173  2021-02-25 15:33:22  .renku/workflow/588bc59ccad544358c91f03caf5e0300_python.yaml
 
work ❯ test-project-10 ▶ master ▶ 2⬆ ▶ $ ▶ git log -n 1 .renku/workflow/588bc59ccad544358c91f03caf5e0300_python.yaml
commit 747e13c5c55ca13f5c9480acc349a3e4c6005eb6 (HEAD -> master)
Author: Tasko Olevski <tasko.olevski@sdsc.ethz.ch>
Date:   Thu Feb 25 15:33:22 2021 +0000

    renku run python script.py --output file2.txt

tolevski · 25 February 2021 15:42

So basically running renku show outputs -v will show you all outputs from renku run commands you have done as well as the workflows associated with them.

The workflows are committed, and the commit message stores the command that was run. So you can pick the right workflow filename from renku show outputs -v and use it in git log -n 1 <workflow_filename> to get what you are looking for.
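The two manual steps above (look up the workflow file, then read its commit message) could be scripted roughly as follows. This is only a sketch: it assumes the table layout shown in the renku show outputs -v example above and paths without spaces, and the helper names parse_outputs_table and command_for are made up for illustration.

```python
import subprocess


def parse_outputs_table(table_text):
    """Map each output path to its workflow file.

    Assumes the layout printed by `renku show outputs -v` in this thread:
    a header line, a dashed separator, then one row per output with the
    columns PATH, COMMIT, GENERATION TIME (date and time), WORKFLOW.
    """
    workflows = {}
    for row in table_text.strip().splitlines()[2:]:  # skip header + separator
        fields = row.split()
        if len(fields) < 5:
            continue  # blank or malformed row
        workflows[fields[0]] = fields[-1]
    return workflows


def command_for(output_path):
    """Return the commit message of the workflow file behind output_path."""
    table = subprocess.run(
        ["renku", "show", "outputs", "-v"],
        check=True, capture_output=True, text=True,
    ).stdout
    workflow_file = parse_outputs_table(table)[output_path]
    log = subprocess.run(
        ["git", "log", "-n", "1", "--format=%B", "--", workflow_file],
        check=True, capture_output=True, text=True,
    )
    return log.stdout.strip()
```

Calling command_for("file2.txt") inside the project would then return whatever the commit message of the matching workflow file contains.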

tolevski · 25 February 2021 15:50

And you can even combine the steps above into a one-liner like this:

renku show outputs -v <output_file_name> | tail -n +3 | awk '{ print $2 }' | xargs git show --quiet

rcnijzink · 2 March 2021 17:23

Thanks! But this only works partially, as the command is cut off at the end, like this:

commit 6a2c006844a06513ee8f51574c8480fba6fc362c (HEAD -> master)
Author: Remko Nijzink <remko.nijzink@list.lu>
Date:   Mon Mar 1 11:32:44 2021 +0100

    renku run python3 src_py/plot_meanannuals_vom.py -i data/VOM_output/additional_analyses/comp2015/...

But it is precisely the more complex and longer commands that I’d like to look at. Also, just running renku show outputs -v file takes super long, and only the one-liner actually shows something.

schymans · 19 March 2021 20:51

I came across a similar problem today. When going through my commit messages to find out how I added certain datasets, I found out that the commit messages are truncated and I never see the full path. This is really annoying. Is the full command stored somewhere else?

rcnijzink · 22 March 2021 16:21

Yes, it would be great to have a fix here. I installed an older renku version in a conda environment to avoid this. It was probably a simple feature, but for me it was actually the most useful one.

ralf.grubenmann · 22 March 2021 16:35

The reason this was added is that long git commit messages are discouraged. Usually 50/72 characters (for the summary and body, respectively) are recommended, though we opted for 100, as things like renku dataset import zenodo are already 28 characters long. Long commit messages also caused issues for some users; see Overwrite default commit when adding files in bunch to renku · Issue #1633 · SwissDataScienceCenter/renku-python · GitHub. The biggest issue is when someone does e.g. renku dataset add folder/*, where * gets expanded by the shell and you get a very long command that’s probably not useful to anyone.

If you look at the implementation, it is actually flexible; it unfortunately just isn’t configurable: renku-python/scm.py at master · SwissDataScienceCenter/renku-python · GitHub

I think there is merit in limiting the length of the first line of the commit message, so I wouldn’t want to change that. But we could make the length configurable at the per-project level (via renku config set) and turn on the wrapping that’s already supported, so that no information is lost in any case.
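As a rough illustration of the limit described above, here is a minimal sketch. It is not the code in scm.py; it simply assumes the message is cut to 100 characters including a trailing ... marker, which is consistent with the truncated message quoted earlier in this thread (exactly 100 characters long). The function name and the file names are hypothetical.

```python
def shorten_summary(command, limit=100, marker="..."):
    """Cut a command down to at most `limit` characters, marker included."""
    if len(command) <= limit:
        return command
    return command[: limit - len(marker)] + marker


# What renku receives after the shell has expanded a glob such as folder/*
# (hypothetical file names).
expanded = "renku dataset add " + " ".join(
    f"folder/file{i:03d}.csv" for i in range(40)
)
summary = shorten_summary(expanded)
print(len(expanded), len(summary))
print(summary)
```

Short commands pass through unchanged; only the expanded glob is cut, and everything after the cut is lost unless it is also stored somewhere else.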

schymans · 22 March 2021 20:36

Thanks, @ralf.grubenmann, I understand now why the length of commit messages has to be limited, but I don’t understand how to ensure that no information is lost, or how renku could allow part of a command to be lost. Is it not possible to retrieve the full command if it was too long? Wouldn’t this break reproducibility?

ralf.grubenmann · 23 March 2021 07:19

I was mostly thinking that since a commit message can have multiple lines, we can wrap the command, so you have something like

renku run --python myscript.py file1 file2...
file3 file4 file5 file6...
file7 file8

So the information is still there, just not on a single long line. It might still need an upper limit to not break things, I’m not sure about that.
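The wrapping idea can be sketched with Python’s standard textwrap module. This is an illustration only, not how renku builds its commit messages; the width of 72, the function name wrap_command and the example command are assumptions, and arguments are assumed to be separated by single spaces.

```python
import textwrap


def wrap_command(command, width=72):
    """Split a long command into lines of at most `width` characters.

    Breaks only at whitespace, so options such as --output and paths
    containing hyphens stay intact. A single argument longer than `width`
    is kept whole on its own line.
    """
    return textwrap.wrap(
        command, width=width, break_long_words=False, break_on_hyphens=False
    )


# Hypothetical long command.
command = "renku run python myscript.py " + " ".join(
    f"data/input_{i}.csv" for i in range(12)
)
message = "\n".join(wrap_command(command))
print(message)

# Nothing is lost: joining the lines again gives back the original command.
assert " ".join(message.splitlines()) == command
```

Unlike truncation, this keeps every argument, at the cost of a multi-line commit message.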

But honestly, I think using the commit message to figure out what happened isn’t necessarily the right thing to do and is more of a crutch to achieve something that we don’t yet properly support.

We are currently in the process of designing a more fully featured and improved renku workflow experience, so at least on the workflow side, I think we should handle this specifically with this use case in mind. The current design docs for this are at workflow UX improvements · Issue #1875 · SwissDataScienceCenter/renku-python · GitHub, and I think that, for instance, the proposed renku workflow history command would be a much better place to retrieve the command used in an execution. Something like renku workflow history --full-command myfile would seem much cleaner than the git history, which is more of a side effect of renku operations than a proper use case. Also, with these changes, there might not even be a commit, or multiple workflow executions could end up in a single commit, so using git log for this purpose wouldn’t work anymore anyway.

All of this is still in the design phase, so subject to change. But any wishes, suggestions or criticism are very welcome!

schymans · 23 March 2021 08:24

Thanks a lot, @ralf.grubenmann! This all sounds good, but I have 2 comments and a question.

Comment1: Please don’t forget to include renku database add in these discussions, as the link is often truncated as well.

Comment2: I think it would still be good to have the full command in the commit message. Renku could keep the subject of the commit message short and put the full command in the body.

Question: How can we access the full commands executed in the current and previous versions of renku?

ralf.grubenmann · 23 March 2021 17:15

Regarding comment1: Assuming you mean renku dataset add, do you use the git log to know when which file was added? Because with renku dataset ls-files you can already see the files in a dataset. So I’m interested in hearing your use-case for seeing the full command in git log. In any case, having a truncated summary line and the full command in the body as discussed above would apply to all renku commands, so this case would be covered.

Comment2: Agreed

As to the question, I don’t think it’s easily possible at the moment. We do know what command was run through the renku metadata, and eventually you will be able to see that with e.g. the renku workflow history command mentioned above and probably also in the UI on renkulab, once those features are implemented.
But right now the information is spread across several nodes in the knowledge graph and you’d probably have to write python code importing renku classes to get it in a human-readable form.
Other than that, I think the closest you can get at the moment is by doing renku log --format Makefile <paths> which will output a makefile with the commands used to create <paths>.

rcnijzink · 24 March 2021 08:43

Yes, thanks for all the clarifications! I wondered, would looking at the workflows also be an option? I found

renku workflow set-name create output_file

but didn’t manage to see the workflow for one specific file. When I did the above, it just added a commit with that command, but I didn’t see any exported workflow file.

ralf.grubenmann · 24 March 2021 09:14

set-name just gives an identifier to a file.

I just saw that our docs mention this command for exporting workflows, but I think that part of the docs is wrong; it should read: renku workflow create output_file. This will generate a CWL file. As a side note, create isn’t the best name; it should really be called renku workflow export.

rcnijzink · 24 March 2021 09:29

Okay, thanks! But then I get an error:

Traceback (most recent call last):
  File "[...]/renku/cli/exception_handler.py", line 121, in main
    result = super().main(*args, **kwargs)
  File "[...]/renku/cli/exception_handler.py", line 87, in main
    return super().main(*args, **kwargs)
  File "[...]/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "[...]/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "[...]/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "[...]/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "[...]/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "[...]/renku/cli/workflow.py", line 172, in create
    result = create_workflow_command().build().execute(output_file=output_file, revision=revision, paths=paths)
  File "[...]/renku/core/incubation/command.py", line 131, in execute
    output = context["click_context"].invoke(self._operation, context["client"], *args, **kwargs)
  File "[...]/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "[...]/renku/core/commands/workflow.py", line 101, in _create_workflow
    workflow = graph.as_workflow(outputs=outputs,)
  File "[...]/renku/core/commands/graph.py", line 435, in as_workflow
    assert isinstance(node.activity, ProcessRun)
AssertionError

This also happens for different files. Sometimes, I get this error:

Error: Found multiple activities that produced the same entity at commit c88685865ec400ed03cdb0ea74180fb953fe938d

Any thoughts on what is going wrong?

ralf.grubenmann · 24 March 2021 09:40

The UX around that command isn’t that great, it’s a command that hasn’t gotten a lot of love, unfortunately.

While I can’t be sure exactly, that error indicates that the file you passed to the command was created in a regular git add & commit, not through a workflow. The <path> passed to the command has to be a file generated by a workflow (i.e. an output file) for it to work.

so e.g.

$ renku run cp myfile myoutputfile
$ renku workflow create myoutputfile   # this works
$ renku workflow create myfile # this gives the error you got

The semantics of the command are “Produce a CWL that generates a file as it was generated by renku run/rerun/update commands”.

But with how little known/used that command is, it might also be a proper bug.

rcnijzink · 24 March 2021 09:52

Okay, thanks! They were all created with renku run, though. I was also asking because renku log --format Makefile <paths>, which you suggested before, takes a really long time. But that probably relates to this issue: Importing dataset: resource not in KG - #2 by jachro

I am currently looking at these workflows and histories, mainly to check the lineage and see whether everything is indeed reproducible, as this repo accompanies a paper we hope to submit soon. Is there also an option to do a renku update --dry-run, for example? The repository contains quite a few long model runs, and I don’t actually want to re-run anything, but mainly check whether everything is there to reproduce the results.

ralf.grubenmann · 24 March 2021 10:17

We are working on improving how we store and process the metadata for workflows, which should significantly improve performance. This is already partly implemented, though hidden in the CLI help for now.

You can do

$ renku graph generate  # this generates the new metadata format alongside the old format
$ renku graph update --dry-run

But I think the output of that command is not as detailed as you’d need for your purposes, it just lists the names of all the steps that would be involved in the update, not the commands. You could probably manually edit the renku source to output what you want here, by editing this line renku-python/graph.py at master · SwissDataScienceCenter/renku-python · GitHub if you need a (hacky) solution right now. You’d want to output p.to_run().to_argv() + p.to_run().to_stream_repr() instead of p.

renku log currently still works by walking and processing individual commits, and the commits those commits depend on (O(n^2) upper bound), so in a project with as many commits as yours, it can take quite long. The new way we handle metadata stores all relevant metadata in the head commit in two files, so all of it is immediately available (plus the time it takes to load this into memory); it therefore performs much better and is more robust towards git rebases and things like that.
I remember there being some issues in your repository that we had to fix as part of implementing the renku graph generate command, so you might be running into those in the renku log/renku workflow create commands.
The initial generate still takes a while though, as it has to walk commits to process this metadata. This new metadata storage also prompted us to start the workflow UX changes I mentioned above, as it enables us to be much more flexible with what we do with workflows. But unfortunately most of that only exists in our heads so far.
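To make the O(n^2) bound mentioned above concrete, here is a toy model. It is not renku code: it only counts how many commits get touched if, for every commit, all the commits it depends on are walked again. The function names and the linear worst-case history are assumptions made for illustration.

```python
def visits_when_walking(depends_on):
    """Count commits touched if every commit's dependencies are walked anew.

    `depends_on` maps a commit id to the commit ids it depends on.
    """
    visits = 0
    for commit in depends_on:
        seen = set()
        stack = [commit]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            visits += 1
            stack.extend(depends_on[current])
    return visits


def linear_history(n):
    """Hypothetical worst case: each commit depends on the one before it."""
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}


for n in (10, 100, 1000):
    print(n, visits_when_walking(linear_history(n)))
```

In this worst case the count grows as n(n+1)/2, whereas metadata kept in the head commit only has to be loaded once, however long the history is.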

rcnijzink · 24 March 2021 10:53

Ok, super, that looks useful, I will try it out! Thanks a lot for your quick responses!
