DISCLAIMER: The following is a pretty dumb/simple version of a merge tool for renku metadata. Do not use this if you’re not comfortable resolving merge conflicts in JSON structures, as resolving a conflict the wrong way can mess up your metadata and corrupt the project. The approach outlined here modifies the raw data used by renku and is quite technical.
For this to work you need to have zstd and jq installed.
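Since everything here depends on those tools (and on git itself), it can be worth checking up front that they are on your PATH. A minimal sketch; the function name missing_tools is my own invention and not part of renku or git:

```shell
#!/bin/bash
# Hypothetical helper: print the name of every given command
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Usage: missing_tools git zstd unzstd jq
```

If the call prints nothing, all of the listed commands were found.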
You can create a script merge.sh somewhere with this:
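#!/bin/bash
# Merge driver for renku metadata. git passes three file paths:
# the common ancestor (%O), the current version (%A) and the other version (%B).
base=$1
local=$2
remote=$3

function cleanup {
    echo "trap cleanup"
    rm -f "$base.raw" "$local.raw" "$remote.raw" \
        "$base.raw_pretty" "$local.raw_pretty" "$remote.raw_pretty"
}
trap cleanup EXIT

# Decompress each input; fall back to a plain copy if it is not zstd-compressed.
echo "unzstd first"
unzstd -q "$base" -o "$base.raw" || cp "$base" "$base.raw"
echo "unzstd second"
text=false
if ! unzstd -q "$local" -o "$local.raw" ; then
    # The local file was plain text, so the result must not be compressed either.
    text=true
    cp "$local" "$local.raw"
fi
echo "unzstd third"
unzstd -q "$remote" -o "$remote.raw" || cp "$remote" "$remote.raw"

# Pretty-print so that git can merge the JSON line by line.
echo "pretty formatting"
jq . "$base.raw" > "$base.raw_pretty"
jq . "$local.raw" > "$local.raw_pretty"
jq . "$remote.raw" > "$remote.raw_pretty"

echo "merging"
git merge-file "$local.raw_pretty" "$base.raw_pretty" "$remote.raw_pretty"
exit_code=$?
if [ "$exit_code" -ne 0 ]; then
    # There were conflicts: let the user resolve them by hand.
    ${EDITOR:-nano} "$local.raw_pretty"
fi

# Write the result back to the file git expects (%A).
echo "compressing"
echo "$text $local"
if $text ; then
    mv -f "$local.raw_pretty" "$local"
else
    zstd -q -f -o "$local" < "$local.raw_pretty"
fi
echo "done"
# Pass the merge-file status on to git (non-zero means there were conflicts).
exit "$exit_code"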
Then in your project you modify and commit .gitattributes so it contains the following (this tells git to use the custom mergetool for renku metadata files):
.renku/metadata/* merge=zstdmerge
.renku/metadata/**/* merge=zstdmerge
and edit .git/config in the project to contain (replace <path> with the path to merge.sh):
[merge "zstdmerge"]
name = ZSTD merge
driver = bash <path>/merge.sh %O %A %B
trustExitCode = true
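If you would rather not edit .git/config by hand, the same section can be written with git config. This is only a sketch: register_zstdmerge and merge_attr_for are hypothetical helper names, and the script path is wherever you saved merge.sh (it is assumed not to contain spaces). git check-attr then shows whether the .gitattributes patterns apply to a given file.

```shell
#!/bin/bash
# Hypothetical helper: register the merge driver in the current repository.
# $1 is the path to merge.sh (an assumption: wherever you saved it).
register_zstdmerge() {
    local script=$1
    git config merge.zstdmerge.name "ZSTD merge"
    git config merge.zstdmerge.driver "bash $script %O %A %B"
    # Kept only to mirror the config section shown above.
    git config merge.zstdmerge.trustExitCode true
}

# Hypothetical helper: print the merge attribute git resolves for a path,
# e.g. "zstdmerge", or "unspecified" if no pattern matches.
merge_attr_for() {
    git check-attr merge -- "$1" | sed 's/.*: merge: //'
}

# Usage (run inside the project):
#   register_zstdmerge /path/to/merge.sh
#   merge_attr_for .renku/metadata/root
```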
Renku stores its metadata as compressed JSON, using zstd to compress the files. This mergetool is pretty dumb: all it does is uncompress and pretty-print the JSON, call git merge-file on it, open the editor configured in $EDITOR (nano if it is unset) if there was a conflict, and then re-compress the result. git merge-file is the normal git merge functionality, so it’s not that smart and conflicts are likely.
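To see why conflicts are likely, here is a small self-contained sketch with made-up file contents (not real renku metadata). Both sides append a different entry at the same spot of a pretty-printed JSON list. git merge-file works on lines, so it reports a conflict even though both entries could simply be kept. The function name demo_line_conflict is hypothetical.

```shell
#!/bin/bash
# Demo with made-up data: both sides add a different line at the same spot.
demo_line_conflict() {
    local dir status
    dir=$(mktemp -d) || return 2
    printf '[\n  "/plans/aaa"\n]\n'                  > "$dir/base"
    printf '[\n  "/plans/aaa",\n  "/plans/bbb"\n]\n' > "$dir/local"
    printf '[\n  "/plans/aaa",\n  "/plans/ccc"\n]\n' > "$dir/remote"
    # -p prints the merge result instead of overwriting the first file.
    # The exit status is the number of conflicts (0 means a clean merge).
    git merge-file -p "$dir/local" "$dir/base" "$dir/remote"
    status=$?
    rm -rf "$dir"
    return "$status"
}

# Usage: demo_line_conflict; echo "conflicts: $?"
```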
The conflicts will probably look like this:
[...]
"@renku_data_value": [
{
"@renku_data_type": "builtins.tuple",
"@renku_data_value": [
<<<<<<< .merge_file_Nl9gh4.raw_pretty
"/plans/2298293d971343348c88726313f108f5",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "7bf93af86204a1602d8eafad579b429ee7eed6cf0660217f33f51c1a62cd8a2b",
"@renku_reference": true
},
"/plans/f6d1d794e98b4d74b7a98c1dbd840c8c",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "307ecbd3ddc6d6211672ec2a71c2c9cfb1e872ca3b5b55707196a7a9b07d516b",
=======
"/plans/a6ca0683be6048299db22b38b5aa0893",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "49f68b42d9dc81c4c5f5af6279f64d36d15811dc8a99e79851b102a37368f997",
"@renku_reference": true
},
"/plans/bb1da60fc13c4a3b9a5564cc37ea6531",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "d6115a28ad9a631debae609ccef2c9f897711192d4747392af050c81e4b3739f",
>>>>>>> .merge_file_rGN62b.raw_pretty
"@renku_reference": true
}
]
[...]
and a correct merge would look like this:
[...]
"@renku_data_value": [
{
"@renku_data_type": "builtins.tuple",
"@renku_data_value": [
"/plans/2298293d971343348c88726313f108f5",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "7bf93af86204a1602d8eafad579b429ee7eed6cf0660217f33f51c1a62cd8a2b",
"@renku_reference": true
},
"/plans/f6d1d794e98b4d74b7a98c1dbd840c8c",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "307ecbd3ddc6d6211672ec2a71c2c9cfb1e872ca3b5b55707196a7a9b07d516b",
"@renku_reference": true
},
"/plans/a6ca0683be6048299db22b38b5aa0893",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "49f68b42d9dc81c4c5f5af6279f64d36d15811dc8a99e79851b102a37368f997",
"@renku_reference": true
},
"/plans/bb1da60fc13c4a3b9a5564cc37ea6531",
{
"@renku_data_type": "renku.domain_model.workflow.plan.Plan",
"@renku_oid": "d6115a28ad9a631debae609ccef2c9f897711192d4747392af050c81e4b3739f",
"@renku_reference": true
}
]
[...]
Note that git does not see the
"@renku_reference": true
},
part at the end as a conflict, so I had to copy/duplicate that to the middle for the merge to be valid.
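Two quick checks on the resolved file can catch the most common slips: leftover conflict markers and JSON that no longer parses. A sketch with hypothetical helper names; it assumes jq is installed as mentioned at the top, and it only checks syntax, not whether the metadata still makes sense.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the file still contains
# lines starting with a git conflict marker.
has_conflict_markers() {
    grep -Eq '^(<<<<<<<|=======|>>>>>>>)' "$1"
}

# Hypothetical helper: succeeds if no markers are left and jq can parse
# the file. "jq empty" reads the input and prints nothing.
resolved_ok() {
    ! has_conflict_markers "$1" && jq empty "$1" >/dev/null 2>&1
}

# Usage: resolved_ok some_resolved_file.json && echo "looks fine"
```

A forgotten "@renku_reference": true or a missing closing brace shows up as a jq parse error here.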
The structure is always a string followed by a dictionary, repeating. These are database indices pointing to the workflows; the workflows themselves shouldn’t have conflicts as they end up in different files.
There are probably around 5 conflicts if you just execute two workflows in parallel, more if there are more workflows on the different branches.
I would highly recommend running renku log and renku workflow visualize <some_output_file_of_a_merged_workflow> to verify that metadata is intact after the merge (and renku dataset show if a dataset was modified).
This is rather cumbersome and definitely not something I’d recommend doing long-term. The custom mergetool we are implementing at the moment is much smarter than this: it understands the metadata, can deduce how to properly merge things, and will guide the user in case of a conflict. I hope we can release that soon.
I hope this helps. Let me know if you have any questions.