The Case of the Missing Commits (or, The Dangers of Git Forced Updates)

It was a cold Wednesday afternoon when I got the email. Another developer’s mundane bug fixes deployed to production. Nothing special. But an alarmed business owner’s reply caught my eye: “Did this release overwrite our updates to the homepage? The old homepage is now showing up. Please advise.”

Curious, but I had other things to distract me. Until a panicked developer dropped by my desk with a bombshell: not only were the homepage updates gone, but the commits that made them were entirely missing from the commit history! Where did they go? How could we restore them? And how could we tell which other commits had gone missing?

Interlude: Git refresher

Git, our version control system, models commit history as a tree. When you make a new commit, it becomes the new tip of the tree; if you merge two branches, you create a new commit that joins the two. Each branch and tag points to a commit, and with it that commit’s whole tree of ancestors. But the past history of these trees is not unchangeable (to quote the Doctor, “Time can be rewritten!”). Git servers will, unless configured otherwise, allow users with push access to force an update (through the -f flag), overwriting the branch’s history on the server entirely with their local copy. The most common reason to do this is git rebase, which replays one branch’s commits onto another as if they had been made there; the rewritten branch then has to be force-pushed.
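To make the failure mode concrete, here is a minimal sketch of a forced update clobbering someone else's commit. Everything in it is hypothetical (the developers "alice" and "bob", the local bare repository standing in for the server, the commit messages); it is one plausible way this can happen, not a reconstruction of what happened to us.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# A local bare repository stands in for the Git server (assumption).
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/master

# Alice creates the first commit and publishes it.
git init -q alice
cd alice
git config user.email "alice@example.com"
git config user.name "Alice"
git commit -q --allow-empty -m "base"
git branch -M master
git remote add origin "$work/server.git"
git push -q origin master
cd ..

# Bob clones, so both developers start from the same base.
git clone -q server.git bob
git -C bob config user.email "bob@example.com"
git -C bob config user.name "Bob"

# Alice pushes the homepage update.
cd alice
git commit -q --allow-empty -m "homepage update"
git push -q origin master
cd ..

# Bob commits on his stale copy and forces it onto the server.
cd bob
git commit -q --allow-empty -m "bug fix"
git push -q -f origin master
cd ..

# The server's master now has Bob's commit but not Alice's.
server_log=$(git -C server.git log --format=%s master)
echo "$server_log"
```

Without -f, Bob's push would have been rejected as a non-fast-forward, and he would have had to pull Alice's commit first.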

Back to the story. I started by checking the options for git log to see if Git had a magic --include-unfortunately-clobbered-commits option. It doesn’t. So I Googled intensely and eventually stumbled across a helpful Stack Overflow post, giving me the tool I needed to crack this case: git rev-list.

git rev-list

The rev-list command relies on another fact of how Git works – even if a commit isn’t in your current tree, it isn’t necessarily gone. Git will still hold on to the commit unless it’s completely unreachable (at which point git gc will mop it up). rev-list will output a list of all commits reachable from a specified point, with an additional option to exclude commits reachable from another point. So,

git rev-list master ^my_divergent_branch

will return any commits that are reachable from master but not from my_divergent_branch. So if I could find a reference which contained the missing commits, I could use this tool to identify the full set of missing commits, then cherry-pick them back into master.
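Here is a minimal sketch of that exclusion in a throwaway repository. The branch names match the command above; the temporary directory and commit messages are invented for illustration.

```shell
#!/bin/sh
set -e

# Build a scratch repository to experiment in.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

git commit -q --allow-empty -m "shared base"
git branch -M master                 # make sure the branch is named master
git branch my_divergent_branch       # fork at the shared base

# Two commits that exist only on master.
git commit -q --allow-empty -m "homepage update 1"
git commit -q --allow-empty -m "homepage update 2"

# Commits reachable from master but not from my_divergent_branch.
only_on_master=$(git rev-list master ^my_divergent_branch)
echo "$only_on_master"

# The same set, written with the two-dot range shorthand.
range_form=$(git rev-list my_divergent_branch..master)
```

This prints two commit hashes, newest first; the shared base is left out because it is reachable from both branches.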

Caveat! Did you remember to tag?

There’s a tricky part here. If you have no branch or tag that contains your missing commits, nothing points at them any more: rev-list has no starting point from which to reach them, and git gc will eventually delete them. But if you follow best practice and tag your releases, then you should never lose anything that was released.

Luckily, we always tag our deploys, so this was easy. I just wanted the commits that were reachable as of that release, but are no longer reachable from master:

git rev-list ^master release-2014_01_21_16_20

But could other commits still be missing? What if something was merged into master, but not released, and then went missing? rev-list allows the --all option, which starts from every ref in the repository (all branches and tags)… so we could try:

git rev-list --all --max-age=1390276800 ^master

This lists every commit since the 21st that is still reachable from some branch or tag but not from master. After some cherry-picking, they were back in master where they belonged.
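The whole recovery can be sketched end to end in a scratch repository. The tag name release-demo, the commit_file helper and the file names are all hypothetical, and a hard reset stands in for the forced update; the real cleanup involved more manual checking than this loop suggests.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Hypothetical helper: each change adds one new file, so picks never conflict.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "add $1"
}

commit_file base
git branch -M master
commit_file homepage
commit_file footer
git tag release-demo                 # stand-in for a deploy tag

# Simulate the clobber: master is rewound and carries on without those commits.
git reset -q --hard HEAD~2
commit_file bugfix

# Oldest-first list of commits the release had but master no longer has.
missing=$(git rev-list --reverse release-demo ^master)

# Replay them onto master in their original order.
for c in $missing; do
    git cherry-pick "$c" > /dev/null
done

master_count=$(git rev-list --count master)
restored_files=$(ls)
```

Cherry-picking creates new commits with new hashes, so rev-list will keep reporting the originals as missing from master; compare the contents rather than the hashes when checking the result.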

But… whodunit?

Unfortunately, I wasn’t able to figure out exactly what caused the commits to be deleted. I could narrow it down to a time window and a list of possible commits, but I couldn’t identify the culprit. If you have a way of identifying when and how history was changed, I’d love to hear it!

I was, though, able to get forced updates turned off for our repo. I think our auditors, and my sanity, will be happier that way.
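For a repository you host yourself, one way to do this is the receive.denyNonFastForwards setting on the server-side repository. The sketch below assumes a plain bare repository reachable by path; hosted services usually expose the same idea as a branch protection setting instead, and the names here are invented.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# A local bare repository stands in for the server (assumption).
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/master

# Refuse any push that is not a fast-forward, even with -f.
git -C server.git config receive.denyNonFastForwards true

git init -q dev
cd dev
git config user.email "dev@example.com"
git config user.name "Dev"
git remote add origin "$work/server.git"
git commit -q --allow-empty -m "base"
git branch -M master
git commit -q --allow-empty -m "second"
git push -q origin master

# Rewrite local history, then try to force it onto the server.
git reset -q --hard HEAD~1
git commit -q --allow-empty -m "rewritten"
if git push -q -f origin master 2>/dev/null; then
    forced=accepted
else
    forced=rejected
fi
echo "forced push was $forced"

# The server still has the original tip.
server_tip=$(git -C "$work/server.git" log -1 --format=%s master)
```

Deleting the branch and pushing it again would sidestep this check, which is why receive.denyDeletes is often enabled alongside it.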

 