[Linux-cluster] Rgmanager fails to restart
Lon Hohberger
lhh at redhat.com
Fri Jul 6 16:41:48 UTC 2007
On Sun, Jul 01, 2007 at 02:17:48PM +0300, Janne Peltonen wrote:
> Hi!
>
> Sometimes, when I have cleanly shut down rgmanager on one node, and the
> services have nicely migrated to other nodes, trying to start rgmanager
> fails. Trying to access /dev/misc/dlm_rgmanager results in "No such
> device". clurgmgrd concludes that locks are not working and exits.
> (See strace output attached.)
That's really strange - it's almost like the DLM isn't responding to the
requests. The open of /dev/misc/dlm_rgmanager is performed by libdlm;
rgmanager itself only asks libdlm for the lockspace. If I am not
mistaken, the preceding open of /dev/misc/dlm-control and the write that
follows it are basically the DLM saying "yeah, that device exists".
However, the device node isn't there, so when we go to open it, the open
fails.
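
One way to narrow this down is to look at which errno the failing open
actually returns. "No such device" is normally the message for ENODEV,
whereas a node that is simply missing from /dev would give ENOENT ("No
such file or directory"). Below is a minimal Python sketch of that
check. The function name probe_device is made up for illustration, the
two paths are the ones from this thread, and it would need to run as
root. Treat it as a diagnostic aid under those assumptions, not as what
libdlm itself does.

```python
import errno
import os


def probe_device(path):
    """Try to open path read-write and classify the outcome by errno."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # The node itself is absent from the filesystem.
            return "node missing"
        if e.errno in (errno.ENODEV, errno.ENXIO):
            # The node exists, but nothing in the kernel answers for it.
            return "node present, no driver behind it"
        return "other error: " + os.strerror(e.errno)
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    # Paths taken from the report; adjust if your /dev layout differs.
    for p in ("/dev/misc/dlm-control", "/dev/misc/dlm_rgmanager"):
        print(p, "->", probe_device(p))
```

If the second path reports a missing node while the first opens fine,
that would match the "device node isn't there" reading above.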
> Trying to stop cman fails:
>
> --clip--
> [jmmpelto@pcn1 ~]$ sudo service cman restart
> Stopping cluster:
> Stopping fencing... done
> Stopping cman... failed
> /usr/sbin/cman_tool: Error leaving cluster: Device or resource busy
> [FAILED]
If something happened to the dlm while rgmanager was trying to use it, I
suspect there's a chance that something is still being held, which would
prevent cman from shutting down.
This sounds related to an open bugzilla where rgmanager is not cleaning
up a lockspace in all cases.
In a clean shutdown, rgmanager should always be cleaning up the
lockspace.
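
To see whether a lockspace was left behind after rgmanager stopped, one
option is to list what the kernel still has. The sketch below assumes
the DLM exposes one directory per lockspace under /sys/kernel/dlm and
that rgmanager's lockspace is named "rgmanager" (inferred from the
dlm_rgmanager device name); both are assumptions to verify on your
kernel, and the function names are hypothetical.

```python
import os


def list_lockspaces(sysfs_dir="/sys/kernel/dlm"):
    """Return sorted subdirectory names, assumed to be one per lockspace."""
    try:
        entries = os.listdir(sysfs_dir)
    except OSError:
        # Directory unreadable or absent, e.g. the dlm module is not loaded.
        return None
    return sorted(
        e for e in entries if os.path.isdir(os.path.join(sysfs_dir, e))
    )


def lockspace_left(names, wanted="rgmanager"):
    """True if the wanted lockspace name is still in the listing."""
    return names is not None and wanted in names


if __name__ == "__main__":
    names = list_lockspaces()
    print("lockspaces:", names)
    print("rgmanager lockspace still present:", lockspace_left(names))
```

If the rgmanager lockspace still shows up after a clean stop, that would
be consistent with the cleanup problem described above.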
-- Lon