[Linux-cluster] node fails to stop when inquorate
Lon Hohberger
lhh at redhat.com
Wed Oct 18 20:20:06 UTC 2006
On Wed, 2006-10-18 at 21:38 +0200, Katriel Traum wrote:
> The (ugly) workaround I've been using is killing the process manually
> and then manually removing /var/lock/subsys/rgmanager, which causes "rc"
> to skip it.
> Is there a better way to restart a failed node? Shouldn't a failed node
> be "hard booted" by cman?
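The workaround works because Red Hat-style `rc` scripts only run a service's stop (K) script at shutdown if a lock file for that service exists under /var/lock/subsys. The following is a minimal Python model of that check, not the real rc code. It assumes the RHEL /etc/rc.d/rc convention that also accepts a `<name>.init` lock file. The function name `rc_would_stop` is hypothetical.

```python
import os
import tempfile


def rc_would_stop(kill_script: str, lock_dir: str = "/var/lock/subsys") -> bool:
    """Model rc's decision to run a K script at shutdown.

    Strips the 'Knn' prefix to get the subsystem name, then requires a
    lock file named after it (or '<name>.init') in lock_dir.
    """
    name = os.path.basename(kill_script)[3:]  # "K01rgmanager" -> "rgmanager"
    return (os.path.isfile(os.path.join(lock_dir, name))
            or os.path.isfile(os.path.join(lock_dir, name + ".init")))


if __name__ == "__main__":
    # Use a scratch directory instead of the real /var/lock/subsys.
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "rgmanager")
        open(lock, "w").close()
        print(rc_would_stop("/etc/rc6.d/K01rgmanager", d))  # lock present: stop runs
        os.remove(lock)  # the manual workaround
        print(rc_would_stop("/etc/rc6.d/K01rgmanager", d))  # lock gone: rc skips it
```

This is why deleting the lock file lets shutdown proceed. Note that it only hides the service from rc and does nothing about any processes that are still stuck.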
Nodes don't "know" they're fenced with fabric-level fencing; it's a
deficiency in the model itself.
The easiest thing to do is 'reboot -fn' (-f reboots without calling
shutdown, -n skips syncing disks first). A fenced node may have
outstanding buffers that never get cleaned up, so you can't "un-fence"
it until it has been rebooted anyway.
Rgmanager's child processes are probably trying to umount a file
system on storage the node has been fenced from, and are stuck in
disk-wait, which may last "forever" depending on the storage
configuration.
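One way to check this diagnosis is to look for processes in uninterruptible sleep ("disk-wait"). Linux reports these as state `D` in the third field of /proc/<pid>/stat. The following is a minimal Python sketch that assumes a Linux /proc filesystem. The helper names `parse_stat` and `disk_wait_processes` are hypothetical, and it is up to you to decide which of the listed commands belong to rgmanager.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state is taken from just after the LAST ')'.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 2]  # single state character after ") "
    return pid, comm, state


def disk_wait_processes(proc: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited (or is unreadable) while scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    # Linux only: prints one line per process stuck in disk-wait.
    for pid, comm in disk_wait_processes():
        print(pid, comm)
```

A process in this state cannot be killed, not even with SIGKILL, until its I/O completes. If the I/O never completes, rebooting is the only way out.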
There's a patch outstanding for qdiskd which makes it reboot the node on
loss of score. However, I don't think this is your problem.
-- Lon