[Pulp-dev] #2186 (pulp-manage-db) options

Michael Hrivnak mhrivnak at redhat.com
Tue Dec 13 19:25:55 UTC 2016


That all sounds great. I think it would be fine as a single issue, since
the increase in heartbeat frequency is specifically motivated by this use
case, even if it may have minor benefit in other ways.

Michael

On Tue, Dec 13, 2016 at 2:04 PM, Brian Bouterse <bbouters at redhat.com> wrote:

> Great, let's go with option 2. I plan to update the story with details,
> but a few more questions first.
>
> To support this plan we have to change the celerybeat heartbeat timings. I
> think we would do well to make the heartbeats more frequent and consistent
> across the board.
>
> Currently, pulp workers and the resource manager heartbeat every 30
> seconds, and celerybeat heartbeats every 90 seconds. I propose we make all
> 3 of these components heartbeat every 20 seconds. That way, when
> pulp-manage-db waits up to 60 seconds after the most recent timestamp, at
> least 2 heartbeats are expected to have occurred. The existing failover
> timings can remain unchanged. It also sets us up nicely to have Pulp fail
> over quicker in the future, but that is a separate change.
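> To make that arithmetic concrete, here is a tiny sketch with hypothetical
> constant names (illustrative only, not actual Pulp settings):
>
>     # Proposed consolidated timings (hypothetical names).
>     HEARTBEAT_INTERVAL = 20  # workers, resource manager, and celerybeat
>     WAIT_WINDOW = 60         # how long pulp-manage-db waits past the
>                              # newest heartbeat timestamp
>
>     # A live component would beat 60 // 20 = 3 times in the window, so
>     # even with scheduling jitter at least 2 heartbeats are expected.
>     assert WAIT_WINDOW // HEARTBEAT_INTERVAL >= 2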
>
> So two questions... Does ^ all still sound good? Would you want that as 2
> issues (one for timings change) with a blocking relationship, or do you
> want it all recorded as a single issue?
>
> Thank you for the feedback, I think we're almost at the detailed plan we
> need!
>
> Brian
>
>
>
> On Sun, Dec 11, 2016 at 10:36 PM, Michael Hrivnak <mhrivnak at redhat.com>
> wrote:
>
>> I like option 2 a lot. I'm in favor of revising the story to have it
>> implement this. It should display to the user how long it's going to wait,
>> as in: "Waiting 17 more seconds to ensure no Pulp workers are running." I
>> don't think we need a fancy countdown of any kind. Maybe print a "." every
>> second if you want extra credit.
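>> A minimal sketch of that output, purely illustrative (the countdown value
>> would come from however long pulp-manage-db decides to wait):
>>
>>     import sys
>>     import time
>>
>>     def show_wait(remaining_seconds):
>>         print('Waiting %d more seconds to ensure no Pulp workers are '
>>               'running.' % remaining_seconds)
>>         for _ in range(remaining_seconds):
>>             sys.stdout.write('.')  # the "extra credit" dot per second
>>             sys.stdout.flush()
>>             time.sleep(1)
>>         sys.stdout.write('\n')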
>>
>> Option 1 could make it more difficult for someone to automate the
>> upgrade. The overall experience of option 2 seems more user-friendly.
>>
>> I like that both of these can be implemented so they make a difference on
>> a user's next upgrade, rather than requiring yet another upgrade just to
>> get the data quality straightened out.
>>
>> Thanks for adding new ideas!
>>
>> Michael
>>
>> On Fri, Dec 9, 2016 at 4:07 PM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> Looking at the process list is easy, but Pulp can run clustered, and for
>>> those users a local process check isn't effective. In terms of the value
>>> of checking for running workers, a clustered installation is exactly where
>>> a user would most want this feature. Let's think through some more ideas.
>>> Here are two options; I'm a big fan of option 2.
>>>
>>> Option 1:
>>> * bring the feature back that checks the database (this is easy because
>>> we already have the code)
>>> * have the check consider worker records dead if they are older than 1
>>> minute (not 5)
>>> * also update the worker failover time to be 1 minute instead of 5 (also
>>> easy, and not a requirement since the worker check and worker failover do
>>> not have to use the same timing)
>>> * do not prompt for y/N; instead, print output and accept an
>>> --ignore-workers flag to override (see the sketch after this list)
>>> * make the output from the check more verbose: identify the problem, name
>>> the flag they could pass, explain that they can instead wait 60 seconds,
>>> and list the worker entries that still appear to be running.
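>>> A rough sketch of that flow (the query helper and record fields are
>>> hypothetical stand-ins, not Pulp's actual model or code):
>>>
>>>     import argparse
>>>
>>>     def live_worker_records():
>>>         """Hypothetical: return worker records whose heartbeat is newer
>>>         than 1 minute ago, e.g. as dicts with name/last_heartbeat."""
>>>         return []
>>>
>>>     parser = argparse.ArgumentParser(prog='pulp-manage-db')
>>>     parser.add_argument('--ignore-workers', action='store_true',
>>>                         help='migrate even if worker records look alive')
>>>     args = parser.parse_args()
>>>
>>>     workers = live_worker_records()
>>>     if workers and not args.ignore_workers:
>>>         print('These workers have heartbeats newer than 60 seconds ago:')
>>>         for w in workers:
>>>             print('  %(name)s (last heartbeat: %(last_heartbeat)s)' % w)
>>>         print('Stop them, wait 60 seconds, or pass --ignore-workers.')
>>>         raise SystemExit(1)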
>>>
>>> Option 2:
>>> As a variation on the above, instead of failing and requiring a flag, have
>>> pulp-manage-db wait until 1 minute after the latest timestamp and then
>>> check the db again. Allowing 40 seconds for the package upgrade itself,
>>> the remaining wait would potentially be under 20 seconds. This would be a
>>> great user experience because there are no flags; it would just work and
>>> take a little longer.
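>>> A minimal sketch of that wait-and-recheck loop (newest_heartbeat() is a
>>> hypothetical stand-in for reading the Worker collection):
>>>
>>>     import time
>>>     from datetime import datetime, timedelta
>>>
>>>     def newest_heartbeat():
>>>         """Hypothetical: return the newest last_heartbeat timestamp
>>>         among all worker records, or None if there are none."""
>>>         return None
>>>
>>>     def wait_for_worker_records():
>>>         newest = newest_heartbeat()
>>>         if newest is None:
>>>             return  # no records at all; safe to run migrations
>>>         deadline = newest + timedelta(seconds=60)
>>>         while datetime.utcnow() < deadline:
>>>             time.sleep(1)  # countdown output could be printed here
>>>         latest = newest_heartbeat()
>>>         if latest is not None and latest > newest:
>>>             raise SystemExit('Pulp workers are still running; stop '
>>>                              'them and re-run pulp-manage-db.')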
>>>
>>> I think waiting 20-ish seconds during an upgrade for a potential problem
>>> to be confirmed as real is an ok user experience. An interactive user
>>> would be hard pressed to even hit this delay, since that would require
>>> upgrading the packages and running pulp-manage-db all within 60 seconds.
>>>
>>> What do you all think about ^?
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 8, 2016 at 5:25 PM, Michael Hrivnak <mhrivnak at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Dec 8, 2016 at 1:49 PM, Bihan Zhang <bizhang at redhat.com> wrote:
>>>>
>>>>> The changes made for #2186 [0] were pulled from the 2.11.0 release
>>>>> yesterday, and we should talk about how to implement the feature for
>>>>> 2.12.
>>>>>
>>>>> From what I can see, there are 2 ways to move forward with #2186:
>>>>>
>>>>> *1.* We can fix the pulp worker db record cleanup so that
>>>>> pulp_celerybeat exits cleanly (i.e., put this back: [1]) and make new
>>>>> changes to clean up pulp_workers with a SIGTERM handler in a 2.11.z
>>>>> release. We can then re-revert the commit and put the feature back in
>>>>> 2.12 with little effort.
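>>>>> The SIGTERM piece might look roughly like this (handler and helper
>>>>> names are illustrative, not the actual change referenced in [1]):
>>>>>
>>>>>     import signal
>>>>>     import sys
>>>>>
>>>>>     def delete_own_worker_record():
>>>>>         """Hypothetical: remove this worker's record from the db."""
>>>>>
>>>>>     def on_sigterm(signum, frame):
>>>>>         delete_own_worker_record()  # clean up before exiting
>>>>>         sys.exit(0)
>>>>>
>>>>>     signal.signal(signal.SIGTERM, on_sigterm)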
>>>>>
>>>>
>>>> We would also need a way for pulp-manage-db to know if a record in the
>>>> Worker collection was created by a version of pulp that does not do full
>>>> cleanup. As described in a previous email, one way would be to add some
>>>> kind of new field to the Worker model, such as the version of pulp the
>>>> worker was running.
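>>>> For illustration only, that kind of field sketched with mongoengine
>>>> (these field names are assumptions, not Pulp's actual Worker document):
>>>>
>>>>     from mongoengine import Document, StringField, DateTimeField
>>>>
>>>>     class Worker(Document):
>>>>         name = StringField(primary_key=True)
>>>>         last_heartbeat = DateTimeField()
>>>>         # New field: which pulp version created this record, so
>>>>         # pulp-manage-db can tell whether the worker cleans up its
>>>>         # own record on shutdown.
>>>>         pulp_version = StringField()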
>>>>
>>>>
>>>>>
>>>>> The original reason #2186 was implemented using the db records was so we
>>>>> could support clustered pulp installations. But this approach would make
>>>>> migration to 2.12 more difficult, since users would first have to upgrade
>>>>> to a 2.11.z release before going to 2.12.
>>>>>
>>>>>
>>>> Perhaps the original intent was lost. This feature was motivated
>>>> primarily by katello deployments (all of which are single-system), where
>>>> users either forgot to stop pulp services or didn't know that was a
>>>> requirement before running pulp-manage-db. The normal upgrade process for
>>>> them handles it automatically, but when manual intervention is required,
>>>> that's where we've seen people running pulp-manage-db while pulp services
>>>> are running.
>>>>
>>>>
>>>>> *2.* We can rethink our approach to #2186 and perform the check against
>>>>> the process list.
>>>>> Upgrade-wise, implementing it this way is a lot easier for users, since
>>>>> they can do a straight upgrade to 2.12 without going through an
>>>>> intermediary release.
>>>>> The downside is that in clustered environments this would not catch
>>>>> every potential error. But #2186 is a best-effort story, and if this is
>>>>> the best effort, I am ok with it.
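>>>>> A sketch of what the local check could look like (the match strings
>>>>> are guesses; as noted, this only sees processes on the local host):
>>>>>
>>>>>     import subprocess
>>>>>
>>>>>     def local_pulp_workers_running():
>>>>>         # List the full command line of every process on this host.
>>>>>         out = subprocess.check_output(['ps', '-eo', 'args'])
>>>>>         lines = out.decode('utf-8', 'replace').splitlines()
>>>>>         return [l for l in lines if 'celery' in l and 'pulp' in l]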
>>>>>
>>>>
>>>> This is easy and effective. I think we should do it, and
>>>> augment/replace it in the future with the worker checking.
>>>>
>>>>
>>>>
>>>>> Regardless of which option we go with, I think we should get the pulp
>>>>> worker db cleanup in. We should also add a --ignore-running-worker flag
>>>>> [2] to prevent problems with automated upgrades.
>>>>>
>>>>>
>>>>> [0] https://pulp.plan.io/issues/2186
>>>>> [1] https://github.com/werwty/pulp/commit/4f43a85dd568f4a0b50ae9b07bbec7138861e92b#diff-80e8b96df1f5da9a551bb6ff18dea352
>>>>> [2] https://pulp.plan.io/issues/2469
>>>>>
>>>>
>>>
>>
>

