[Pulp-dev] 2.10.1 Rollback Post-Mortem
sean.myers at redhat.com
Tue Nov 1 23:54:15 UTC 2016
With the rollback of 2.10.1 done and 2.10.2 building, I thought it would be good
to go over exactly what happened. Hopefully this info will help to get us all
on the same page as we come up with ways to prevent this from happening again. I'll
be pasting links inline instead of using markdown footnotes to make things a little
easier if you want to follow along step-by-step. Without further ado...
A user reported a failed migration in 2.10.1 shortly after its GA release:
While this is certainly troubling, what was particularly worrisome about this
was that the failing migration was related to a new feature introduced in 2.11,
which had no business popping up in a 2.10 release:
https://pulp.plan.io/issues/1983 & https://github.com/pulp/pulp/pull/2637
It became immediately apparent that at some point master had been merged to 2.10-dev,
introducing 2.11's features and bug fixes to 2.10's next release. Shortly thereafter,
I pulled the 2.10.1 release out of our stable repo, leaving 2.10.0 in its place, so
that more users would not be affected. 2.10.1 is now unavailable.
mhrivnak tracked down the merge commit that brought the 2.11 changes back to 2.10:
While he was finding that, I went through the issues open against the 2.11 platform
release and first checked to see which commits related to those issues existed on
2.10-dev in platform. Second, I went through the plugins to see if any plugins were
affected. None were. I filed a bug to track the effort of fixing 2.10-dev in the
platform repository at this point:
Once the merge commit was identified, I did some diffing to find and fix other
commits to 2.10-dev that did not belong. All commits are documented in #2378,
and have since been reverted.
The merge commit is related to PR 2770:
This PR was merged to master, but should have been merged to 2.10-dev. This is where
the problem started. The commit from this PR was then merged back to 2.10-dev from
master. Since master was version 2.11 at this point in time, that merge made all
2.11-related commits on master appear on 2.10-dev. These are the commits that were
reverted to fix #2378.
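To illustrate why a single merge from master pulls in everything, here is a toy model of the history. It is only a sketch: the commit names are made up and the real graph is far larger, but the shape is the same. A merge commit makes every ancestor of both parents reachable from the branch it lands on.

```python
# Toy model of the history: each commit maps to its list of parents.
# All commit names are hypothetical and only illustrate the graph's shape.
def ancestors(graph, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


graph = {
    "base": [],                          # last shared commit before 2.11 work began
    "feat-2.11-a": ["base"],             # 2.11-only work on master
    "feat-2.11-b": ["feat-2.11-a"],
    "fix": ["feat-2.11-b"],              # the fix that landed on master
    "rel-2.10.0": ["base"],              # 2.10-dev before the bad merge
    "bad-merge": ["rel-2.10.0", "fix"],  # master merged back into 2.10-dev
}

before = ancestors(graph, "rel-2.10.0")
after = ancestors(graph, "bad-merge")

# Commits that became part of 2.10-dev only because of the merge.
leaked = sorted(after - before - {"bad-merge"})
print(leaked)
```

In this model the fix arrives on 2.10-dev together with both 2.11 feature commits, which is the situation described above.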
While I have avoided editorializing in this post-mortem, I think it's important to point
out that the problem here, in my opinion, isn't really that this merge happened. I'm
more interested in how 2.10.1 got released with these commits included so that we can
improve our processes and prevent this from happening again.
Given that the merge of master back to 2.10-dev wasn't detected until a user reported
a failed migration, I'm also interested in improving our processes to catch upgrade
failures like this before our users do. Had this migration not failed, I think the
"extra" commits on 2.10-dev may have continued to go unnoticed for an indefinite
amount of time.
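One possible check, offered as a sketch and not as something we run today: before releasing from a branch, ask git whether any commit known to belong only to the next version is reachable from it. The function names and the idea of keeping a list of "forbidden" commits (for example, the commit that bumped master to 2.11) are assumptions for illustration. The only git feature relied on is `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second and 1 when it is not.

```python
import subprocess


def is_ancestor(commit, branch, run=subprocess.run):
    """Return True if commit is reachable from branch.

    Relies on `git merge-base --is-ancestor`: exit code 0 means commit is
    an ancestor of branch, 1 means it is not, anything else is an error.
    The run parameter exists only so the git call can be swapped out.
    """
    result = run(["git", "merge-base", "--is-ancestor", commit, branch])
    if result.returncode not in (0, 1):
        raise RuntimeError(
            "git merge-base failed with exit code %d" % result.returncode
        )
    return result.returncode == 0


def check_release_branch(branch, forbidden_commits, run=subprocess.run):
    """Return the forbidden commits that are reachable from branch."""
    return [c for c in forbidden_commits if is_ancestor(c, branch, run=run)]
```

Run from inside a clone, `check_release_branch("2.10-dev", [sha_of_2_11_bump])` would return a non-empty list if master had been merged back, where `sha_of_2_11_bump` is a placeholder for whatever commit marks the start of 2.11 work. This only catches stray merges; it does not replace testing upgrades.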
Finally, I think it's also worth mentioning that, as seen on pulp-list, 2.10-dev has
been fixed, and we have a workaround to give to folks affected by this. 2.10-dev has
been merged forward through 2.11-dev to master, and I'm currently in the release
process for a 2.10.2 hotfix.