[Pulp-list] Resource manager behaving differently between clusters

Dennis Kliban dkliban at redhat.com
Wed Jan 10 13:25:57 UTC 2018


It sounds like you may be experiencing issue https://pulp.plan.io/issues/3135

From our conversation on IRC, I learned that the hypervisor is acting up
and the VMs pause from time to time. So even though the system is not under
heavy load, it still behaves as though it is. As a result, the inactive
resource managers conclude that the active resource manager has gone down
and take over as the active one. What I am still not clear on is why more
than one resource manager is able to be active at a time. If that is
actually happening, then it is a new bug. You could work around the problem
by running only two resource managers, though it would be good to find a
reliable way to reproduce it and file a bug.
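
One thing that might help narrow it down is to compare what the database
says about worker liveness on each node while this is happening. Below is a
minimal sketch of such a check, assuming Pulp 2 keeps worker heartbeats in
a "workers" collection of its "pulp" MongoDB database, with the worker name
in _id and a last_heartbeat timestamp; the 30-second grace period is only
an illustrative number, not Pulp's real timeout, so verify both against
your 2.14.3 install before trusting the output.

    #!/usr/bin/env python
    # Sketch: list the workers Pulp knows about and flag stale heartbeats.
    # Database, collection, and field names are assumptions about the Pulp 2
    # schema; adjust them to whatever your installation actually uses.
    from datetime import datetime, timedelta

    from pymongo import MongoClient

    GRACE = timedelta(seconds=30)  # illustrative threshold, not Pulp's timeout


    def main():
        db = MongoClient('localhost', 27017)['pulp']
        now = datetime.utcnow()
        for worker in db['workers'].find():
            name = worker['_id']                 # e.g. resource_manager@pulp02
            last = worker.get('last_heartbeat')  # assumed datetime field
            stale = last is None or now - last > GRACE
            print('%-55s %-5s last heartbeat: %s'
                  % (name, 'STALE' if stale else 'ok', last))


    if __name__ == '__main__':
        main()

Running that on all three nodes while the issue is occurring would at least
show whether more than one resource_manager record is being kept fresh at
the same time, which would be useful data to attach to the bug report.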

On Wed, Jan 10, 2018 at 6:37 AM, Sebastian Sonne <sebastian.sonne at noris.de>
wrote:

> Hello everyone.
>
> I have two Pulp clusters, each containing three nodes, and all systems are
> up to date (Pulp 2.14.3). However, the two clusters behave very
> differently. Let's call the working cluster the external one and the
> broken one the internal one.
>
> The setup: Everything is virtualized. Both clusters are distributed over
> two datacenters, but they're on different ESX-clusters. All nodes are
> allowed to migrate between hypervisors.
>
> On the external cluster, "celery status" shows me one resource manager; on
> the internal cluster I get either two or three resource managers. As far as
> I understand, I can run the resource manager on all nodes but should only
> see one in celery, because the other two go into standby.
>
> Running "ps fauxwww |grep resource_manage[r]" on the external cluster
> gives me four processes in the whole cluster. The currently active resource
> manager has two processes, the other ones have one process each. However,
> on the internal cluster I get six processes, two on each node.
>
> From my understanding, the external cluster works correctly: the active
> resource manager has one process to communicate with celery and one to do
> the work, while the other two nodes each have a single process that
> communicates with celery and stands by to become active if the currently
> active resource manager goes down.
>
> Oddly enough, celery also seems to disconnect its own workers:
>
> "Jan 10 08:52:36 pulp02 pulp[101629]: celery.worker.consumer:INFO: missed
> heartbeat from reserved_resource_worker-1@pulp02"
>
> As such, I think we can rule out the network.
>
> I'm completely stumped and don't really know which logs I could provide or
> where to start looking.
>
> Grateful for any help,
> Sebastian
>
>
> Sebastian Sonne
> Systems & Applications (OSA)
> noris network AG
> Thomas-Mann-Strasse 16−20
> 90471 Nürnberg
> Germany
> Tel +49 911 9352 1184
> Fax +49 911 9352 100
>
> sebastian.sonne at noris.de
> https://www.noris.de - Mehr Leistung als Standard
> Management Board: Ingo Kraupa (Chairman), Joachim Astel, Jürgen Städing
> Chairman of the Supervisory Board: Stefan Schnabel - AG Nürnberg HRB 17689
>
>
>
>
>
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
>

