[Linux-cluster] Graceful Degradation
gordan at bobich.net
Fri Dec 14 15:54:30 UTC 2007
Hi,
I've got most of my cluster sorted out, apart from kicking nodes out of
the cluster when they fail.
Is there a way to make the node-kicking automated? I have 4 nodes. They
are sharing 2 GFS file systems, a root FS and a data FS. If I pull the
network cable from one of them, or just power it off, the rest of the
cluster nodes just stop. The only way to get them to start responding
again is to bring the missing node back, even if there are still enough
nodes to maintain quorum (3 nodes out of 4).
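For reference, the quorum arithmetic behind the "3 nodes out of 4" figure
can be sketched as below. This is only a minimal illustration, assuming one
vote per node and a simple-majority rule (more than half of the expected
votes); the function names are hypothetical and are not part of any cluster
tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority (assumed rule)."""
    return expected_votes // 2 + 1


def is_quorate(live_votes: int, expected_votes: int) -> bool:
    """True if the surviving nodes hold enough votes to keep quorum."""
    return live_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Assumption: 4 nodes, one vote each, one node lost.
    print(quorum_threshold(4))
    print(is_quorate(3, 4))
```

Under those assumptions, losing one node out of four still leaves the
cluster quorate, so the stall described here is not explained by loss of
quorum alone.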
Can anyone suggest a way around this? How can I make the 3 remaining nodes
just kick the missing node out of the cluster and DLM group (possibly
after some timeout, e.g. 10 seconds) and resume operation until the node
rejoins?
This may or may not be related to the fact that I'm running a shared GFS
root, but any pointers would be welcome.
Thanks.
Gordan