Merging with Renku

I was running different model cases on renkulab and locally. Previously, I could do so, and then merge the two model results. Usually, there was just a merge conflict in .gitattributes, which I could easily solve. Now, I got more merge conflicts in renku files:

remote: Enumerating objects: 569, done.
remote: Counting objects: 100% (569/569), done.
remote: Compressing objects: 100% (349/349), done.
remote: Total 495 (delta 160), reused 309 (delta 87), pack-reused 0
Receiving objects: 100% (495/495), 97.71 KiB | 24.00 KiB/s, done.
Resolving deltas: 100% (160/160), completed with 24 local objects.
From renkulab.io:remko.nijzink/vom-sens-crv
   19fc84b..261da5a  master     -> origin/master
Filtering content: 100% (108/108), 2.83 GiB | 36.83 MiB/s, done.
warning: Cannot merge binary files: .renku/metadata/7f/f1/7ff1161310494190bbcd93ed1f450889e2a63892e81e443d9fcd303b276dcc68 (HEAD vs. 261da5abc96e147703d809c2da7d169a1fd97095)
warning: Cannot merge binary files: .renku/metadata/42/33/42339961a8844b36b9974d4d25560e93a3a7c3acff2a4c02923f31bead917bdb (HEAD vs. 261da5abc96e147703d809c2da7d169a1fd97095)
Auto-merging .renku/metadata/plans-by-name
CONFLICT (content): Merge conflict in .renku/metadata/plans-by-name
Auto-merging .renku/metadata/plans
CONFLICT (content): Merge conflict in .renku/metadata/plans
Auto-merging .renku/metadata/activities-by-usage
CONFLICT (content): Merge conflict in .renku/metadata/activities-by-usage
Auto-merging .renku/metadata/activities-by-generation
CONFLICT (content): Merge conflict in .renku/metadata/activities-by-generation
Auto-merging .renku/metadata/activities
CONFLICT (content): Merge conflict in .renku/metadata/activities
Auto-merging .renku/metadata/7f/f1/7ff1161310494190bbcd93ed1f450889e2a63892e81e443d9fcd303b276dcc68
CONFLICT (content): Merge conflict in .renku/metadata/7f/f1/7ff1161310494190bbcd93ed1f450889e2a63892e81e443d9fcd303b276dcc68
Auto-merging .renku/metadata/42/33/42339961a8844b36b9974d4d25560e93a3a7c3acff2a4c02923f31bead917bdb
CONFLICT (content): Merge conflict in .renku/metadata/42/33/42339961a8844b36b9974d4d25560e93a3a7c3acff2a4c02923f31bead917bdb
Automatic merge failed; fix conflicts and then commit the result.

What is the best way to have simultaneous tasks in one repository, without causing conflicts?

We’re currently implementing a custom git mergetool for our metadata (Issue #2846, SwissDataScienceCenter/renku-python on GitHub). It can automatically merge Renku metadata, which would mean you could continue working as usual and git merge should just work.

I’m not sure when this will be released. It should be finished in 2-3 days but might not make it into the renku-python release this Friday. Most of the functionality is there, but it still needs a lot of testing.

But I should be able to quickly write up a custom mergetool bash script that should already help, I’ll post it later. Do you use a visual merge tool when merging files or how do you usually merge them?

Okay, that would be great! I do not use a visual merge tool, but usually do it on the command line. Or, more specifically, I usually pull, and when there are merge conflicts I just have a look at the file and edit it the way I want. But I think a bash script would be really helpful!

DISCLAIMER: The following is a pretty dumb/simple version of a merge tool for renku metadata. Do not use this if you’re not comfortable with resolving merge conflicts in JSON structure, as resolving a merge conflict the wrong way can mess up your metadata and corrupt the project. The approach outlined here entails modifying the bare-bones data used by renku and is quite technical.

For this to work, you need to have zstd and jq installed.

You can create a script merge.sh somewhere with this:

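#!/bin/bash
# Simple git merge driver for Renku metadata (zstd-compressed or plain JSON).
# Git calls it as: merge.sh %O %A %B

base=$1    # %O: common ancestor version
local=$2   # %A: current branch version; the merge result is written back here
remote=$3  # %B: other branch version

# Remove the temporary files when the script exits.
function cleanup {
    echo "trap cleanup"
    rm -f "${base}.raw" "${local}.raw" "${remote}.raw" \
        "${base}.raw_pretty" "${local}.raw_pretty" "${remote}.raw_pretty"
}

trap cleanup EXIT

# Decompress each version; if a file is not zstd-compressed, copy it as-is.
echo "unzstd first"
unzstd -q "$base" -o "$base.raw" || cp "$base" "$base.raw"

echo "unzstd second"
text=false  # becomes true if the local file is plain text rather than zstd

if ! unzstd -q "$local" -o "$local.raw" ; then
    text=true
    cp "$local" "$local.raw"
fi

echo "unzstd third"
unzstd -q "$remote" -o "$remote.raw" || cp "$remote" "$remote.raw"

# Pretty-print the JSON so that git can merge it line by line.
echo "pretty formatting"
jq . "$base.raw" > "$base.raw_pretty"
jq . "$local.raw" > "$local.raw_pretty"
jq . "$remote.raw" > "$remote.raw_pretty"

echo "merging"
git merge-file "$local.raw_pretty" "$base.raw_pretty" "$remote.raw_pretty"
exit_code=$?

# On conflict, open the merged file in an editor to resolve it by hand.
if [ "$exit_code" -ne 0 ]; then
    ${EDITOR:-nano} "$local.raw_pretty"
fi

echo "compressing"
echo "$text $local"

# Write the result back in the same form (plain or compressed) as the input.
if $text ; then
    mv -f "$local.raw_pretty" "$local"
else
    zstd -q -f "$local.raw_pretty" -o "$local"
fi

echo "done"
exit "$exit_code"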

Then, in your project, modify and commit .gitattributes so that it contains the following (this tells git to use the custom mergetool for Renku metadata files):

.renku/metadata/*       merge=zstdmerge
.renku/metadata/**/*    merge=zstdmerge
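
To confirm that git picks up the attribute, git check-attr can be queried for a couple of metadata paths. This is only a sketch: the throwaway repository and the file name "example" under 7f/f1 are made up for illustration. In a real project you would run just the two git check-attr lines from the project root.

```shell
# Throwaway repository so the example does not touch a real project
repo=$(mktemp -d)
cd "$repo" && git init -q .

# Same two patterns as above
printf '%s\n' \
    '.renku/metadata/*       merge=zstdmerge' \
    '.renku/metadata/**/*    merge=zstdmerge' > .gitattributes

# Ask git which merge driver applies; the paths do not need to exist.
# "example" is a made-up file name.
top_level=$(git check-attr merge -- .renku/metadata/plans)
nested=$(git check-attr merge -- .renku/metadata/7f/f1/example)
echo "$top_level"
echo "$nested"
```

Both lines should end in "merge: zstdmerge"; if one says "unspecified" instead, the pattern did not match that path.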

Also edit .git/config in the project so that it contains the following (replace <path> with the path to merge.sh):

[merge "zstdmerge"]
        name = ZSTD merge
        driver = bash <path>/merge.sh %O %A %B
        trustExitCode = true
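
The same section can also be written with git config instead of editing the file by hand. A minimal sketch, assuming merge.sh was saved as $HOME/bin/merge.sh (a made-up location; substitute your own path) and using a throwaway repository in place of the real project:

```shell
# Hypothetical location of merge.sh; replace with the real path
merge_script="$HOME/bin/merge.sh"

# Throwaway repository standing in for the project
repo=$(mktemp -d)
cd "$repo" && git init -q .

# Each call writes one key of the [merge "zstdmerge"] section in .git/config
git config merge.zstdmerge.name "ZSTD merge"
git config merge.zstdmerge.driver "bash $merge_script %O %A %B"
git config merge.zstdmerge.trustExitCode true

# Show what was stored
git config --get-regexp '^merge\.zstdmerge\.'
```

In the real project you would skip the mktemp and git init lines and run the three git config commands from the project root.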

Renku stores its metadata as compressed JSON, using zstd to compress the files. This mergetool is pretty dumb: all it does is uncompress and pretty-print the JSON, call git merge-file on it, open the editor configured in $EDITOR (nano if none is set) if there was a conflict, and then re-compress the result.

git merge-file is the normal git merge functionality, so it’s not that smart and conflicts are likely.
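
To see why conflicts are likely, here is a small sketch that runs git merge-file on three made-up files in which both sides append a different entry after the same line, which is roughly what happens to the index files when two workflows run in parallel. The file contents are invented and much shorter than real metadata.

```shell
dir=$(mktemp -d)

# Common ancestor, plus two versions that each append one (made-up) entry
printf '%s\n' '[' '  "/plans/aaa"' ']' > "$dir/base"
printf '%s\n' '[' '  "/plans/aaa",' '  "/plans/bbb"' ']' > "$dir/local"
printf '%s\n' '[' '  "/plans/aaa",' '  "/plans/ccc"' ']' > "$dir/remote"

# -p prints the merge result instead of overwriting the first file;
# the exit status is the number of conflicts
if merged=$(git merge-file -p "$dir/local" "$dir/base" "$dir/remote"); then
    conflicts=0
else
    conflicts=$?
fi

echo "$merged"
echo "conflicts: $conflicts"
```

Git has no way of knowing that both new entries should simply be kept, so it marks the spot as a conflict and leaves the decision to you.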

The conflicts will probably look like this:

[...]
           "@renku_data_value": [
              {
                "@renku_data_type": "builtins.tuple",
                "@renku_data_value": [
<<<<<<< .merge_file_Nl9gh4.raw_pretty
                  "/plans/2298293d971343348c88726313f108f5",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "7bf93af86204a1602d8eafad579b429ee7eed6cf0660217f33f51c1a62cd8a2b",
                    "@renku_reference": true
                  },
                  "/plans/f6d1d794e98b4d74b7a98c1dbd840c8c",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "307ecbd3ddc6d6211672ec2a71c2c9cfb1e872ca3b5b55707196a7a9b07d516b",
=======
                  "/plans/a6ca0683be6048299db22b38b5aa0893",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "49f68b42d9dc81c4c5f5af6279f64d36d15811dc8a99e79851b102a37368f997",
                    "@renku_reference": true
                  },
                  "/plans/bb1da60fc13c4a3b9a5564cc37ea6531",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "d6115a28ad9a631debae609ccef2c9f897711192d4747392af050c81e4b3739f",
>>>>>>> .merge_file_rGN62b.raw_pretty
                    "@renku_reference": true
                  }
                ]
[...]

and a correct merge would look like

[...]
           "@renku_data_value": [
              {
                "@renku_data_type": "builtins.tuple",
                "@renku_data_value": [
                  "/plans/2298293d971343348c88726313f108f5",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "7bf93af86204a1602d8eafad579b429ee7eed6cf0660217f33f51c1a62cd8a2b",
                    "@renku_reference": true
                  },
                  "/plans/f6d1d794e98b4d74b7a98c1dbd840c8c",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "307ecbd3ddc6d6211672ec2a71c2c9cfb1e872ca3b5b55707196a7a9b07d516b",
                    "@renku_reference": true
                  },
                  "/plans/a6ca0683be6048299db22b38b5aa0893",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "49f68b42d9dc81c4c5f5af6279f64d36d15811dc8a99e79851b102a37368f997",
                    "@renku_reference": true
                  },
                  "/plans/bb1da60fc13c4a3b9a5564cc37ea6531",
                  {
                    "@renku_data_type": "renku.domain_model.workflow.plan.Plan",
                    "@renku_oid": "d6115a28ad9a631debae609ccef2c9f897711192d4747392af050c81e4b3739f",
                    "@renku_reference": true
                  }
                ]
[...]

Note that git does not see the

                    "@renku_reference": true
                  },

part at the end as a conflict, so I had to copy/duplicate that to the middle for the merge to be valid.
The structure is always a string followed by a dictionary, repeating. These entries are database indices pointing to the workflows; the workflows themselves shouldn’t have conflicts, as they end up in different files.
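
Before saving a hand-merged file, it can help to check that no conflict markers are left and that the result still parses as JSON. The helper below is a hypothetical sketch (the name check_merged is made up, and it is not part of renku). It only catches syntax problems such as a missing closing brace, not a merge that is valid JSON but semantically wrong.

```shell
# Hypothetical helper: returns non-zero if the file still contains
# conflict markers or is not valid JSON. Requires jq.
check_merged() {
    file=$1
    # Lines starting with git's conflict markers mean the merge is unfinished
    if grep -qE '^(<<<<<<<|=======|>>>>>>>)' "$file"; then
        echo "conflict markers left in $file" >&2
        return 1
    fi
    # "jq empty" parses the input and prints nothing; it fails on invalid JSON
    if ! jq empty "$file" 2>/dev/null; then
        echo "$file is not valid JSON" >&2
        return 1
    fi
    echo "$file looks syntactically fine"
}
```

For example, you could call check_merged on the file you are editing from a second terminal before closing the editor.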

There are probably around 5 conflicts if you just execute two workflows in parallel, more if there are more workflows on the different branches.

I would highly recommend running renku log and renku workflow visualize <some_output_file_of_a_merged_workflow> to verify that metadata is intact after the merge (and renku dataset show if a dataset was modified).

This is rather cumbersome and definitely not something I’d recommend doing long-term. The custom mergetool we are implementing at the moment is much smarter than this and understands the metadata and can deduce how to properly merge things. And it will guide the user in case of a conflict. So I hope we can release that soon.

I hope this helps. Let me know if you have any questions.

The alternative would be to install renku from the feature/2846-custom-mergetool branch of SwissDataScienceCenter/renku-python on GitHub and run renku mergetool install in your project.

That would install the work-in-progress mergetool I’m working on, which in my tests works well for workflows, datasets, project metadata and seems stable. But that’d be very experimental and I couldn’t guarantee that it doesn’t mess up at this point. So only use this if you can accept the risks and definitely verify that the merge didn’t corrupt data with renku log and renku workflow visualize as mentioned above. And right now it’s probably best if you don’t create any actual conflicts (like creating workflows with the same name in both branches).

Okay, thank you! Yes, I think this makes sense, I will try it out.