[Linux-cluster] node fails to stop when inquorate

Wed Oct 18 19:38:13 UTC 2006

Hello.

I've been seeing some strange behavior on a failed node that perhaps
some of the forum members have encountered.

The setup is a 2-node cluster with qdiskd running. Disconnecting one
node from the network causes it to be "fabric fenced", and the remaining
node continues working as expected.
When I try to restart the failed node, rgmanager's script turns the
rgmanager process into a zombie, and the script then loops forever.
The (ugly) workaround I've been using is to kill the process manually
and then remove /var/lock/subsys/rgmanager, which causes "rc" to skip
it.
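
For illustration, the workaround could be sketched in Python roughly as
below. This is only a sketch under assumptions: the function name
clear_stale_service and its arguments are hypothetical, it has to run as
root, and the PID must be looked up separately. A process that is
already a true zombie cannot be killed by a signal; it only goes away
once its parent reaps it.

```python
import os
import signal


def clear_stale_service(pid, lock_path):
    """Force-kill a hung daemon and remove its subsys lock file.

    Hypothetical helper; returns a (killed, lock_removed) tuple.
    """
    killed = False
    try:
        # SIGKILL cannot be caught or ignored by the target process.
        os.kill(pid, signal.SIGKILL)
        killed = True
    except ProcessLookupError:
        pass  # no such process: it has already exited

    lock_removed = False
    try:
        # With the lock file gone, "rc" skips the service on shutdown.
        os.remove(lock_path)
        lock_removed = True
    except FileNotFoundError:
        pass  # lock file was already absent

    return killed, lock_removed
```

It would be called as clear_stale_service(pid, "/var/lock/subsys/rgmanager"),
where pid is the rgmanager PID taken from ps.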

Is there a better way to restart a failed node? Shouldn't a failed node
be "hard booted" by cman?

Thanks,
-- 
Katriel Traum, PenguinIT
RHCE, CLP
Mobile: 054-6789953