[Pulp-dev] pulp-manage-db bug blocking 2.11.0

Michael Hrivnak mhrivnak at redhat.com
Wed Dec 7 12:31:40 UTC 2016


Quick context... this story: https://pulp.plan.io/issues/2186

... was intended to make a best effort to help users avoid accidentally
running pulp-manage-db while pulp services are running, because that is
unsafe. It's not expected to be perfect, but just to help. It does that
well.

It's hard to do perfectly, because it's hard to know if the WSGI app might
be running on other machines in a multi-machine deployment. Workers are
easier to track, because they register their existence in the database.

The implementation looks for worker entries in the database. If any are
found, pulp-manage-db asks the user if they are sure they want to proceed.
Or optionally, the user can preemptively force it to proceed with a
command-line option. Here is the full scary message:

"There are still running workers, continuing could corrupt your Pulp
installation. Are you sure you wish to continue?"

Problem: it was discovered that pulp_celerybeat does not clean up its entry
in the worker collection. So in all cases of upgrading from < 2.11 to >=
2.11, a stale entry is present, and the user sees the scary message (unless
they wait 5 minutes [0]).

I suggest we modify that logic to simply ignore any worker entries from
pulp_celerybeat. That would prevent a large number of users from getting an
unwarranted scary message, and it would enable those who want to script or
otherwise automate upgrades to still use this feature.
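Roughly what I have in mind, as an illustrative sketch only (the function and the worker-name prefix are placeholders, not Pulp's actual code; the assumption is that the scheduler's entry is distinguishable by its name):

```python
# Hypothetical sketch of the proposed filter. Worker entries are shown as
# plain dicts; in Pulp they come from the workers collection in the database.

SCHEDULER_NAME_PREFIX = "scheduler"  # assumed name format for pulp_celerybeat's entry


def workers_requiring_prompt(worker_entries):
    """Return the entries that should trigger the safety prompt,
    skipping any entry left behind by pulp_celerybeat."""
    return [
        w for w in worker_entries
        if not w["name"].startswith(SCHEDULER_NAME_PREFIX)
    ]


# Example: a normal worker still triggers the prompt, the scheduler does not.
entries = [
    {"name": "reserved_resource_worker-0@host"},
    {"name": "scheduler@host"},
]
remaining = workers_requiring_prompt(entries)
```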

The safety check would still look for entries from normal workers and the
pulp_resource_manager. Thus it would continue to catch most cases where a
user forgot to stop all pulp services. And since this feature was largely
intended to help katello users, who run "katello-service stop" to stop all
processes with one command, they are unlikely to have stopped the workers
but forgotten pulp_celerybeat.

Given the "known issue" release note [1], I think we could release 2.11.0
with this problem and then fix it quickly in 2.11.1. But my concern is that
the user experience is bad in the meantime. So unless someone is especially
itching to get 2.11.0 out the door and doesn't mind a very quick 2.11.1
hotfix, I propose we just make this change and let it briefly block the
2.11.0 release.

If we *really* want to get pulp_celerybeat back into the test, there are
more elaborate options [2] we could pursue later.

Thoughts on the proposed change?

Thanks,
Michael

[0] There is logic that ignores entries more than 5 minutes old, based on
the assumption that those processes are no longer alive. That's helpful,
but plenty of users will be able to "yum update" and then start
pulp-manage-db within 5 minutes of having stopped services.
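For the curious, the staleness logic described above amounts to something like this (a sketch with made-up names; only the 5-minute window comes from the actual behavior):

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=5)  # entries older than this are assumed dead


def live_entries(entries, now=None):
    """Keep only entries whose last heartbeat is within the window."""
    now = now or datetime.utcnow()
    return [e for e in entries if now - e["last_heartbeat"] <= STALE_AFTER]


# Example: an entry 2 minutes old counts as live, one 10 minutes old does not.
now = datetime(2016, 12, 7, 12, 0, 0)
entries = [
    {"last_heartbeat": now - timedelta(minutes=2)},
    {"last_heartbeat": now - timedelta(minutes=10)},
]
alive = live_entries(entries, now=now)
```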
[1] https://github.com/pulp/pulp/pull/2878
[2] This kind of data quality problem can be solved by versioning the
production of data. In this case, that would mean providing a way to know
if a particular worker entry was created by pulp >= 2.11.0. For example, we
could add a field to the worker collection that contains the version of
pulp the worker was running. That would enable the pulp-manage-db safety
check to ignore entries it knows are problematic, but take newer ones
seriously. That said, I'm not convinced that checking for the
pulp_celerybeat entry provides enough added value to justify such a change.
But we can certainly consider it as a later improvement.
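To make the versioning idea in [2] concrete, the check could look something like the following sketch (field name and version parsing are assumptions, not an actual proposal for the schema):

```python
# Hypothetical: trust a worker entry only if it was written by a pulp version
# that is known to clean up after itself. Entries with no version field would
# predate the change and cannot be trusted.


def entry_is_trustworthy(entry, min_version=(2, 11, 0)):
    """Return True if the entry claims a pulp version >= min_version."""
    raw = entry.get("pulp_version")
    if raw is None:
        return False  # written by an older pulp; may be stale
    return tuple(int(part) for part in raw.split(".")) >= min_version
```

Tuple comparison handles the version ordering here; a real implementation would likely want a proper version-parsing library instead of splitting on dots.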