[Linux-cluster] Rgmanager fails to restart

Lon Hohberger lhh at redhat.com
Fri Jul 6 16:41:48 UTC 2007


On Sun, Jul 01, 2007 at 02:17:48PM +0300, Janne Peltonen wrote:
> Hi!
> 
> Sometimes, when I have cleanly shut down rgmanager on one node, and the
> services have nicely migrated to other nodes, trying to start rgmanager
> fails. Trying to access /dev/misc/dlm_rgmanager results in "No such
> device". clurgmgrd concludes that locks are not working and exits.
> (See strace output attached.)

That's really strange - it's almost like the DLM isn't responding to the
requests.  The open of /dev/misc/dlm_rgmanager is performed by libdlm;
rgmanager simply asks libdlm to open the lockspace.  If I am not mistaken,
the previous open of /dev/misc/dlm-control followed by the write is
basically the DLM saying "yeah, that device exists".  However, the device
node isn't there, so when libdlm goes to open it, the open fails.
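
One quick way to confirm which of the two device nodes is actually present
is to stat them both.  The following is only a minimal sketch, not part of
rgmanager or libdlm; the function name check_dlm_nodes is made up for
illustration, and the default paths are simply the two named in this
thread.

```python
import os
import stat


def check_dlm_nodes(paths=("/dev/misc/dlm-control", "/dev/misc/dlm_rgmanager")):
    """Return {path: status} for each path.

    status is one of "char device", "not a char device", or "missing".
    check_dlm_nodes is a hypothetical helper, not a libdlm call.
    """
    result = {}
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            # The node (or one of its parent directories) does not exist.
            result[path] = "missing"
            continue
        if stat.S_ISCHR(mode):
            result[path] = "char device"
        else:
            result[path] = "not a char device"
    return result


if __name__ == "__main__":
    for path, status in check_dlm_nodes().items():
        print(f"{path}: {status}")
```

This only tells you whether the nodes exist; it says nothing about whether
the kernel side will answer an open on them.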

> Trying to stop cman fails:
> 
> --clip--
> [jmmpelto at pcn1 ~]$ sudo service cman restart
> Stopping cluster: 
>    Stopping fencing... done
>    Stopping cman... failed
> /usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
>                                                            [FAILED]

If something happened to the dlm while rgmanager was trying to use it, I
suspect there's a chance that the dlm still has something held (e.g. the
rgmanager lockspace), which would prevent cman from shutting down.

This sounds related to an open bugzilla where rgmanager is not cleaning
up a lockspace in all cases.

In a clean shutdown, rgmanager should always be cleaning up the
lockspace.
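
To see whether a lockspace was left behind after rgmanager has stopped,
something like the sketch below may help.  It assumes the kernel exposes
each DLM lockspace as a directory under /sys/kernel/dlm, which may not hold
for every release, and the function name lingering_lockspaces is
hypothetical.

```python
import os


def lingering_lockspaces(sysfs_dir="/sys/kernel/dlm"):
    """Return the sorted names of the subdirectories of sysfs_dir.

    Assumption: each DLM lockspace appears as one directory there.
    Returns an empty list if sysfs_dir itself does not exist.
    """
    try:
        entries = os.listdir(sysfs_dir)
    except FileNotFoundError:
        return []
    return sorted(
        name for name in entries
        if os.path.isdir(os.path.join(sysfs_dir, name))
    )


if __name__ == "__main__":
    names = lingering_lockspaces()
    if names:
        print("lockspaces still present:", ", ".join(names))
    else:
        print("no lockspaces found")
```

If a lockspace named rgmanager still shows up after a clean stop of
rgmanager, that would be consistent with the cleanup problem described
above.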

-- Lon



