[Linux-cluster] Graceful recover after connectivity failure

Patrick Caulfield pcaulfie at redhat.com
Mon Jan 14 09:03:40 UTC 2008


Cliff Hones wrote:
> I am using CentOS 5.1 with GNBD and GNBD fencing.
> 
> Following the failure of a cluster member - e.g. a temporary
> loss of connectivity - which results in the node being
> fenced, is there a clean way to re-join the cluster without
> having to reboot the affected node?

Basically, no.

If a node is cut off from the cluster for any period of time, it can't
tell whether the state of the cluster has changed while it was
disconnected. So it must be fenced, and the cluster software on it must
be restarted from the beginning to rebuild its state from scratch.
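
To illustrate the reasoning, here is a toy model in Python. It is only a
sketch and not how cman or the lock manager is implemented: the Node and
Cluster classes, the generation counter and the "gfs-journal" resource
name are all invented for the example. It shows a fenced node holding a
view of lock ownership that no longer matches the cluster, and that view
only becoming correct again once the node rejoins and rebuilds its state.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.locks = {}       # this node's local view of lock ownership
        self.generation = 0   # last cluster state change this node saw


class Cluster:
    def __init__(self):
        self.locks = {}       # authoritative lock ownership
        self.generation = 0   # bumped on every state change
        self.members = {}     # name -> Node, connected members only

    def join(self, node):
        # A joining node discards whatever it had and copies current state.
        node.locks = dict(self.locks)
        node.generation = self.generation
        self.members[node.name] = node

    def fence(self, node):
        # A fenced node no longer receives any updates.
        self.members.pop(node.name, None)

    def grant_lock(self, resource, owner):
        self.generation += 1
        self.locks[resource] = owner
        # Only connected members learn about the change.
        for member in self.members.values():
            member.locks[resource] = owner
            member.generation = self.generation


cluster = Cluster()
a, b = Node("a"), Node("b")
cluster.join(a)
cluster.join(b)

cluster.grant_lock("gfs-journal", "b")
cluster.fence(b)                        # b loses connectivity and is fenced
cluster.grant_lock("gfs-journal", "a")  # cluster moves on without b

stale_view = dict(b.locks)   # b still believes it owns the lock
cluster.join(b)              # restart from the beginning
fresh_view = dict(b.locks)   # b's view now matches the cluster
```

While it is fenced, b has nothing locally that tells it stale_view is
wrong, which is why carrying on with the old state is not an option.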


> I am finding that it is impossible to shut down or restart the
> cluster components on the affected node, and even trying to force
> a reboot from an ssh session just hangs.
> 
> There seems to be a chicken-and-egg situation - a GFS filesystem
> cannot be unmounted if the node is fenced, and cman/clvmd cannot
> be stopped/restarted if a filesystem is mounted. Forcibly
> trying to kill the cluster processes also fails.
> 

Patrick



