[Pulp-dev] pulp-manage-db bug blocking 2.11.0
bizhang at redhat.com
Wed Dec 7 13:59:17 UTC 2016
+1 excluding pulp_celerybeat
Also, since we have the --ignore-running-workers flag and are ignoring
celerybeat, I would like to propose that we stop prompting the user to
continue and instead just display an error message when we detect running
workers:
'Migration halted because there are still running workers, please stop all
workers before re-running this command. If you believe this message was
given in error please re-run the command with the --ignore-running-workers
flag.'
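A rough sketch of the check I have in mind, in plain Python rather than the
actual pulp-manage-db code. The entry layout ("name", "last_heartbeat"), the
"pulp_celerybeat" name prefix and the function names are assumptions made up
for illustration; the 5-minute cutoff and the message wording follow this
thread.

```python
from datetime import datetime, timedelta

# Assumed for illustration: worker entries are dicts with "name" and
# "last_heartbeat" keys. The real worker collection may look different.
STALE_AFTER = timedelta(minutes=5)      # cutoff mentioned in this thread
CELERYBEAT_PREFIX = "pulp_celerybeat"   # hypothetical naming convention

HALT_MESSAGE = (
    "Migration halted because there are still running workers, please stop "
    "all workers before re-running this command. If you believe this message "
    "was given in error please re-run the command with the "
    "--ignore-running-workers flag."
)


def blocking_workers(entries, now):
    """Return the entries that should stop the migration.

    Entries older than the cutoff are assumed dead, and entries from
    pulp_celerybeat are skipped because they may be stale leftovers.
    """
    return [
        entry for entry in entries
        if now - entry["last_heartbeat"] <= STALE_AFTER
        and not entry["name"].startswith(CELERYBEAT_PREFIX)
    ]


def check_workers(entries, now, ignore_running_workers=False):
    """Return an error message, or None if the migration may proceed."""
    if ignore_running_workers:
        return None
    if blocking_workers(entries, now):
        return HALT_MESSAGE
    return None
```

The caller would print the returned message and exit non-zero instead of
prompting, which keeps the command scriptable.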
On Wed, Dec 7, 2016 at 7:31 AM, Michael Hrivnak <mhrivnak at redhat.com> wrote:
> Quick context... this story: https://pulp.plan.io/issues/2186
> ... was intended to make a best effort to help users avoid accidentally
> running pulp-manage-db while pulp services are running, because that is
> unsafe. It's not expected to be perfect, but just to help. It does that
> It's hard to do perfectly, because it's hard to know if the WSGI app might
> be running on other machines in a multi-machine deployment. Workers are
> easier to track, because they register their existence in the database.
> The implementation looks for worker entries in the database. If any are
> found, pulp-manage-db asks the user if they are sure they want to proceed.
> Or optionally, the user can preemptively force it to proceed with a
> command-line option. Here is the full scary message:
> "There are still running workers, continuing could corrupt your Pulp
> installation. Are you sure you wish to continue?"
> Problem: it was discovered that pulp_celerybeat does not clean up its
> entry in the worker collection. So in all cases of upgrading from < 2.11 to
> >= 2.11, a stale entry is present, and the user sees the scary message
> (unless they wait 5 minutes [1]).
> I suggest we modify that logic to simply ignore any worker entries from
> pulp_celerybeat. That would prevent a large number of users from getting an
> unwarranted scary message, and it would enable those who want to script or
> otherwise automate upgrades to still use this feature.
> The safety check would still look for entries from normal workers and the
> pulp_resource_manager. Thus it would continue to catch most cases where a
> user forgot to stop all pulp services. And since this feature was largely
> intended to help katello users, who run "katello-service stop" to stop all
> processes with one command, they are unlikely to have stopped the workers
> but forgotten pulp_celerybeat.
> Given the "known issue" release note [2], I think we could release 2.11.0
> with this problem and then fix it quickly in 2.11.1. But my concern is that
> the user experience is bad in the meantime. So unless anyone is especially
> itching to get 2.11.0 out the door, and doesn't mind a very quick 2.11.1
> hotfix, I propose we just make this change and let it briefly block the
> 2.11.0 release.
> If we *really* want to get pulp_celerybeat back into the test, there are
> more elaborate options [3] we could pursue later.
> Thoughts on the proposed change?
> [1] There is logic that ignores entries more than 5 minutes old, based on
> the assumption that those processes are no longer alive. That's helpful,
> but plenty of users will be able to "yum update" and then start
> pulp-manage-db within 5 minutes of having stopped services.
> [2] https://github.com/pulp/pulp/pull/2878
> [3] This kind of data quality problem can be solved by versioning the
> production of data. In this case, that would mean providing a way to know
> if a particular worker entry was created by pulp >= 2.11.0. For example, we
> could add a field to the worker collection that contains the version of
> pulp the worker was running. That would enable the pulp-manage-db safety
> check to ignore entries it knows are problematic, but take newer ones
> seriously. That said, I'm not convinced that checking for the
> pulp_celerybeat entry provides enough added value to justify such a change.
> But we can certainly consider it as a later improvement.
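For the record, the versioning idea in the last footnote could look roughly
like the sketch below. The "pulp_version" field name and the helper names are
made up for this example, and the version parsing is deliberately naive (it
would not handle beta or rc suffixes).

```python
FIXED_IN = (2, 11, 0)  # threshold named in the footnote: pulp >= 2.11.0


def parse_version(text):
    """Turn a plain dotted version such as "2.11.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_trustworthy(entry):
    """True if the entry records that it was written by pulp >= 2.11.0.

    Entries without the hypothetical "pulp_version" field predate the
    field and are treated as possibly stale.
    """
    version = entry.get("pulp_version")
    if version is None:
        return False
    return parse_version(version) >= FIXED_IN
```

The safety check could then take pulp_celerybeat entries seriously only when
is_trustworthy() returns True, and keep ignoring the older ones.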