[Linux-cluster] Graceful Degradation

gordan at bobich.net
Fri Dec 14 15:54:30 UTC 2007


Hi,

I've got most of my cluster pretty much sorted out, apart from kicking 
nodes from the cluster when they fail.

Is there a way to make the node-kicking automated? I have 4 nodes. They 
are sharing 2 GFS file systems, a root FS and a data FS. If I pull the 
network cable from one of them, or just power it off, the remaining 
cluster nodes stop responding. The only way to get them to respond 
again is to bring the missing node back, even if there are still enough 
nodes to maintain quorum (3 nodes out of 4).
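
For reference, the arithmetic behind "3 nodes out of 4" can be sketched as below. This assumes a simple majority rule and one vote per node; neither is stated in my configuration details above, and the function names are purely illustrative rather than anything from the cluster tools.

```python
def majority_quorum(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(live_votes: int, expected_votes: int) -> bool:
    """True if the surviving votes still reach the majority threshold."""
    return live_votes >= majority_quorum(expected_votes)


# Assumed setup: four nodes, one vote each.
print(majority_quorum(4))  # threshold needed to stay quorate
print(is_quorate(3, 4))    # one node lost
print(is_quorate(2, 4))    # two nodes lost
```

Under those assumptions the threshold is 3 votes, so losing a single node should leave the other three quorate, while losing two would not.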

Can anyone suggest a way around this? How can I make the 3 remaining nodes 
just kick the missing node out of the cluster and DLM group (possibly 
after some timeout, e.g. 10 seconds) and resume operation until the node 
rejoins?

This may or may not be related to the fact that I'm running a shared GFS 
root, but any pointers would be welcome.

Thanks.

Gordan



