[Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

Jankowski, Chris Chris.Jankowski at hp.com
Thu Dec 9 06:58:41 UTC 2010


Lon,

I think that I got to the bottom of the problem:

If there are *no* services running on a node and you issue "shutdown -h now" on the node, then when it comes to shutting down rgmanager, it executes the following sequence:

1. Outputs "Shutting down" message to /var/log/messages
2. Waits for "status_poll_interval" seconds
3. Outputs the message: "Shutdown complete, exiting" and completes its own shutdown.

In my case, I had <rm status_poll_interval="3600"/>, as my service scripts do not have a viable check of their status, and the status check messages were clogging up the /var/log/messages file.  So, rgmanager appeared to be stuck, whereas it was really just waiting out the hour-long interval.

I think there is a logic bug here.  rgmanager should not be waiting in this situation, as it has no services to stop.

------------
By comparison, if there is a service running on a node and you issue "shutdown -h now" on the node, then when it comes to shutting down rgmanager, it executes the following sequence:

1. Outputs "Shutting down" message to /var/log/messages
2. Proceeds *immediately* (no wait) to shutting down the service
3. Once the service is shut down, rgmanager *immediately* outputs "Shutdown complete, exiting" and completes its own shutdown.

-------------
As a workaround, I have set status_poll_interval="10" for the time being, although I believe that I should not be forced to rely on a short polling interval.
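
For anyone wanting to check what interval their own node would wait for, here is a minimal sketch that reads the value back from the configuration. It assumes the usual RHEL 6 location /etc/cluster/cluster.conf and that the attribute sits on a single-line <rm> tag as in my example above; poll_interval is a hypothetical helper name, not something shipped with rgmanager.

```shell
# Print the status_poll_interval attribute of the <rm> element, if present.
# Assumption: the config lives at /etc/cluster/cluster.conf unless a path is given,
# and the <rm ...> tag with the attribute fits on one line.
poll_interval() {
    conf="${1:-/etc/cluster/cluster.conf}"
    sed -n 's/.*<rm[^>]*[[:space:]]status_poll_interval="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1
}

# Example use (prints nothing if the attribute is not set):
#   poll_interval
#   poll_interval /path/to/a/copy/of/cluster.conf
```

If this prints a large number such as 3600, an apparently hung rgmanager shutdown on a node with no services may simply be waiting out that interval.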

Regards,

Chris Jankowski

-----Original Message-----
From: Jankowski, Chris 
Sent: Thursday, 9 December 2010 16:08
To: linux clustering
Subject: RE: [Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

Lon,

The problem is reproducible at will. I do have access to the system after the "shutdown -h now" command is issued and rgmanager blocks.

I have gdb installed, but I do not know how to obtain rgmanager-debuginfo. The system is on an isolated network, and I pointed yum at an on-disk repository that is a copy of the RHEL6 distribution DVD.

Thanks and regards,

Chris

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Lon Hohberger
Sent: Thursday, 9 December 2010 06:46
To: linux clustering
Subject: Re: [Linux-cluster] rgmanager gets stuck on shutdown, if no services are running on its node.

On Wed, 2010-12-08 at 03:11 +0000, Jankowski, Chris wrote:
> Hi,
>  
> I configured a cluster of 2 RHEL6 nodes.  
> The cluster has only one HA service defined.
>  
> I have a problem with rgmanager getting stuck on shutdown when a certain
> set of conditions is met.  The details follow.
>  
> 1.
> If I execute "shutdown -h now" on the node that is *not* running the
> HA service then the shutdown process gets stuck with the last message
> in the /var/log/messages being:
>  

Is this reproducible outside of 'shutdown -h now', e.g., does 'service
rgmanager stop' work in your configuration?

If you can still reach the machine (ssh or whatever) after executing
'shutdown -h now':

1) Install 'rgmanager-debuginfo' and gdb.

2) When rgmanager hangs on shutdown, run:

  - gdb /usr/sbin/rgmanager `pidof -s rgmanager`

3) When inside gdb, run:

  - thr a a bt
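
The same backtraces can also be captured without an interactive session, which may be easier on a machine that is halfway through shutdown. The following is only a sketch: 'thr a a bt' is the abbreviated form of 'thread apply all bt', dump_threads is a hypothetical helper name, and the output file in the usage comment is an arbitrary choice.

```shell
# Print backtraces for all threads of a running process, non-interactively.
# -batch makes gdb exit after running the commands, -ex supplies one command,
# and -p attaches to the given process ID.
dump_threads() {
    pid="$1"
    gdb -batch -ex "thread apply all bt" -p "$pid"
}

# Example use while rgmanager is hanging:
#   dump_threads "$(pidof -s rgmanager)" > /tmp/rgmanager-bt.txt
```

Without rgmanager-debuginfo installed the backtraces will still be produced, but most frames will lack symbol names.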

There's a related bug in RHEL5 that involves releasing the lockspace if
CMAN exits before rgmanager, but I was unable to reproduce it on the
STABLE3/31 branches when I tested.

-- Lon

