Rewriting Git merge history, part 2
In part 1, we discovered the problem of rewriting git history in the presence of nontrivial merges. Today, we'll discuss the workaround I chose.
As I previously mentioned, and as Julia Evans' excellent data model document explains, a git commit is just a snapshot of a tree (suitably deduplicated by means of content hashes), a commit message and a (possibly empty) set of parents. So fundamentally, we don't really need to mess with diffs; if we can make the changes we want directly to the tree (well, technically, make a new tree that looks like what we want, and a new commit using that tree), we're good. (Diffs in git are, generally, just git diff looking at two trees and trying to make sense of it. This has the unfortunate result that there is no solid way of representing a rename; there are heuristics, but if you rename a file and change it in the same commit, they may fail and stuff like git blame or git log may be broken, depending on flags. Gerrit doesn't even seem to understand a no-change copy.)
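To make the "snapshot plus message plus parents" model concrete, here is a minimal sketch that builds a throwaway repository and prints a raw commit object. The repository, file name and identity are made up for illustration; only standard git commands are used.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo hello > a.txt
git add a.txt
git commit -q -m "first"
echo world >> a.txt
git commit -q -a -m "second"

# The raw commit object: a tree ID, zero or more parent IDs,
# author/committer lines and the message. No diff is stored anywhere.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"
```

The output lists one tree line and one parent line for the second commit; what git diff shows is computed on demand from the two trees.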
In earlier related cases, I've taken this to the extreme by simply
hand-writing a commit using git commit-tree. Create exactly the state
that you want by whatever means, commit it in some dummy commit and then use
that commit's tree with some suitable commit message and parent(s); voila.
But it doesn't help us with history; while we can fix up an older commit
in exactly the way we'd like, we also need the later commits to have our
new fixed-up commit as parent.
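The hand-written-commit trick can be sketched roughly as follows, in a throwaway repository. The file contents, messages and identity are invented for the example; the real work is done by git commit-tree, which takes a tree, any number of -p parents and a message.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > file.txt
git add file.txt
git commit -q -m "base commit"
base=$(git rev-parse HEAD)

# Create exactly the state we want by whatever means,
# and record it in a dummy commit just to get a tree object.
echo "exactly what we want" > file.txt
git commit -q -a -m "dummy"
tree=$(git rev-parse 'HEAD^{tree}')

# Hand-write the real commit: the dummy's tree, our message, our parent.
new=$(git commit-tree "$tree" -p "$base" -m "The commit we actually wanted")
git reset -q --hard "$new"
```

Nothing here touches descendants of an older commit, which is exactly the limitation described above.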
Thus, enter git filter-branch. git filter-branch comes with a suitable
set of warnings about eating your repository and being deprecated (I never
really figured out its supposed replacement git filter-repo, so I won't
talk much about it), but it's useful when all else fails.
In particular, git filter-branch allows you to make arbitrary changes to the tree of a series of commits, updating the parent commit IDs as rewrites happen. So if you can express your desired changes in a way that's better than “run the editor” (or if you're happy running the editor and making the same edit manually 300 times!), you can just run that command over all commits in the entire branch (forgive me for breaking lines a bit):
git filter-branch -f --tree-filter \
'! [ -f src/cluster.cpp ] || sed -i "s/if (mi.rank != 0)/if (mi.rank != 0 \&\& mi.rank == rank())/" src/cluster.cpp' \
665155410753978998c8080c813da660fc64bbfe^..cluster-master
This is suitably terrible. Remember, if we only did this for one commit, the change wouldn't be there in the next one (git diff would show that it was immediately reverted), so filter-branch needs to do this over and over again, once for each commit (tree) in the branch. And I wanted multiple fixups, so I had a bunch of these; some of them were as simple as “copy this file from /tmp” and some were shell scripts that did things like running clang-format.
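A toy version of the same idea, as a sketch: the repository, file name and sed expression below are invented, and sed -i is assumed to be GNU sed. FILTER_BRANCH_SQUELCH_WARNING=1 only silences the deprecation notice in newer git versions.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "root"      # the file does not exist yet
printf 'if (rank != 0)\n' > cluster.cpp
git add cluster.cpp
git commit -q -m "add file"
echo '// more' >> cluster.cpp
git commit -q -a -m "extend file"

# Apply the same edit to every tree; skip commits where the file is absent.
git filter-branch -f --tree-filter \
  '! [ -f cluster.cpp ] || sed -i "s/rank != 0/rank > 0/" cluster.cpp' \
  HEAD >/dev/null
```

Because the edit is applied to every tree, the diff between the last two commits still shows only the added comment line, not a revert of the fixup.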
You can do similar things for commit messages; at some point, I figured I should write “cluster” (the official name for the branch) and not “cluster-master” (my local name) in the merge messages, so I could just do
git filter-branch \
--msg-filter 'sed s/cluster-master/cluster/g' \
665155410753978998c8080c813da660fc64bbfe^..cluster-master
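For reference, a self-contained sketch of a message rewrite on a toy repository. It uses --msg-filter, which is how the git-filter-branch man page spells the option; the commit messages and identity are made up.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "Merge branch 'master' into cluster-master"
git commit -q --allow-empty -m "Unrelated change"

# The filter gets the old message on stdin and prints the new one.
git filter-branch -f --msg-filter 'sed s/cluster-master/cluster/g' \
  HEAD >/dev/null
```

Messages that do not match the pattern pass through sed unchanged.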
I also did a bunch of them to fix up my email address (GIT_COMMITTER_EMAIL
wasn't properly set), although I cannot honestly remember whether I used
--env-filter or something else.
Perhaps that was actually with git rebase and `-r --exec 'git commit --amend
--no-edit --author …'` or similar. There are many ways to do ugly things. :-)
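If it was --env-filter, it would have looked something like the sketch below. This is not a reconstruction of what was actually run; the addresses and the toy repository are invented, and the re-export follows the advice in the git-filter-branch man page.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "right@example.com"

# One commit with a bad committer address, one with the right one.
GIT_COMMITTER_EMAIL=wrong@localhost git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

# The filter runs once per commit and may adjust the identity variables.
git filter-branch -f --env-filter '
  if [ "$GIT_COMMITTER_EMAIL" = "wrong@localhost" ]; then
    GIT_COMMITTER_EMAIL="right@example.com"
    export GIT_COMMITTER_EMAIL
  fi' HEAD >/dev/null
```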
Eventually, I had the branch mostly in a state where I thought it would be ready for review, but after uploading to GitHub, one reviewer commented that some of my merges against master were merging commits that didn't exist in master. Huh? That's… surprising.
It took a fair bit of digging to figure out what had happened: git filter-branch had rewritten some commits that it didn't actually have to; the merge sources from upstream. This is normally harmless, since git hashes are deterministic, but these commits were signed by the author! And filter-branch (or perhaps fast-export, upon which it builds?) generally assumes that it can't sign stuff with other people's keys, so it just strips the signatures, deeming that better than having invalid ones sitting around. Now, of course, these commit signatures would still be valid since we didn't change anything, but evidently, filter-branch doesn't have any special code for that.
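The "normally harmless" part is easy to check on unsigned commits. In this sketch (toy repository, made-up contents), a no-op filter recreates every commit, and because the tree, parents, identities, dates and message are all identical, the hashes come out the same. Signed commits are not covered here, since that would need a GPG key.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo one > f.txt
git add f.txt
git commit -q -m "one"
echo two >> f.txt
git commit -q -a -m "two"

before=$(git rev-parse HEAD)
# "true" changes nothing, so every recreated commit is byte-identical.
git filter-branch -f --tree-filter true HEAD >/dev/null 2>&1
after=$(git rev-parse HEAD)
```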
Removing a header like this (a “gpgsig” header, it seems) changes the
commit hash, which is where the phantom commits came from. I couldn't get
filter-branch to turn it off… but again, parents can be freely changed,
diffs don't exist anyway. So I wrote a little script that took in
parameters suitable for git commit-tree (mostly the parent list),
rewrote known-bad parents to known-good parents, gave the script to
git filter-branch --commit-filter, and that solved the problem.
(I guess --parent-filter would also have worked; I don't think I saw
it in the man page at the time.)
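The original script isn't reproduced here, but a parent-rewriting commit filter can be sketched along these lines. Everything named below (fix-parents.sh, BAD_PARENT, GOOD_PARENT, the toy repository) is hypothetical, and the "phantom" commit is simulated with a different message rather than a stripped signature. The filter is invoked with the same arguments as git commit-tree and gets the message on stdin.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)            # throwaway repository, purely illustrative
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "root"
root=$(git rev-parse HEAD)

# Known-good upstream commit, and a phantom twin with the same tree.
git checkout -q -b upstream
echo upstream > u.txt
git add u.txt
git commit -q -m "upstream change"
good=$(git rev-parse HEAD)
bad=$(git commit-tree "$good^{tree}" -p "$root" -m "upstream change (phantom)")

# Our branch has merged the phantom instead of the real commit.
git checkout -q -b work "$root"
echo mine > m.txt
git add m.txt
git commit -q -m "my change"
git merge -q --no-ff -m "Merge upstream" "$bad"

# Hypothetical commit filter: called as <tree> [-p <parent>]...
FIX_PARENTS=$repo/.git/fix-parents.sh
cat > "$FIX_PARENTS" <<'EOF'
#!/bin/sh
tree=$1; shift
parents=
while [ $# -gt 0 ]; do
  if [ "$1" = "-p" ]; then
    shift
    p=$1
    # Swap a known-bad parent for its known-good counterpart.
    [ "$p" = "$BAD_PARENT" ] && p=$GOOD_PARENT
    parents="$parents -p $p"
  fi
  shift
done
exec git commit-tree "$tree" $parents
EOF
export FIX_PARENTS BAD_PARENT="$bad" GOOD_PARENT="$good"

git filter-branch -f --commit-filter 'sh "$FIX_PARENTS" "$@"' work >/dev/null
```

After the rewrite, the merge on work has the real upstream commit as its second parent, while its tree is untouched. A real version would need one bad-to-good mapping per affected commit.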
So, well, I won't claim this is an exercise in elegance. (Perhaps my next adventure will be figuring out how this works in jj, which supposedly has conflicts as more of a first-class concept.) But it got the job done in a couple of hours after fighting with rebase for a long time, the PR was reviewed, and now the Stockfish cluster branch is a little bit more alive.

I have officially reached the 6-week mark, the halfway point of my Outreachy internship. The time has flown by incredibly fast, yet it feels short because there is still so much exciting work to do.

I am an Outreachy intern working with the Debian OpenQA image testing team. The work consists of testing images with OpenQA. The internship has reached its midpoint, and here are some of the highlights so far.

NB: The context menu allows switching the fonts on systems where the above snippet has not (yet) been installed, so it is good enough for a one-off.

This allows for convenient storage. Since it's too cold outside right now, cultivation will have to wait
until spring. This also needs only mycelium, which one can simply buy, and some material the fungus can digest.
The fungus can also be fed coffee grounds, and the fruit bodies can be harvested roughly every two weeks.

I am an Outreachy intern and have been contributing to the Debian Images Testing project since October 2025. The project is open source, and anyone can contribute to it. It uses Open QA to automatically install operating system images and test them. We have a community

Screenshot of Wikipedia edit metadata on Special:RecentChanges with RCFilters enabled. Highlighted edits with a colored circle to the left side of other metadata are flagged by ORES. Different circle and highlight colors (white, yellow, orange, and red in the figure) correspond to different levels of confidence that the edit is damaging. RCFilters does not specifically flag edits by new accounts or unregistered editors, but does support filtering changes by editor types.
Charts showing the probability that an edit will be reverted as a function of ORES scores in the neighborhood of the discontinuous threshold that triggers the RCFilters flag. The jump in reversion probability is larger for registered editors than for unregistered editors at both thresholds.