[Linux-cluster] IP-based tie-breaker on a 2-node cluster?
gordan at bobich.net
Thu Apr 17 15:55:43 UTC 2008
On Thu, 17 Apr 2008, Andrew Lacey wrote:
> I am doing some testing on a 2-node, active/standby RHEL 4 cluster with
> non-GFS shared storage. I am using HP iLO for fencing. I don't have a
> quorum disk set up. Both cluster nodes are connected to the same switch,
> and that network path is used for cluster communication as well as general
> network communication (including access to iLO). I've found that when the
> switch goes down and comes back up, the result is not desirable. As soon
> as the switch loses power, each node starts trying to fence the other.
> Since the iLO is not reachable, this is unsuccessful, but the nodes keep
> retrying the fence. When the switch comes back online, the "OK Corral"
> scenario takes place -- both nodes fence each other simultaneously and
> bring down the cluster.
I had a similar issue. The solution I went for was doctoring the fencing
agent to add a delay, based on the node's priority, to the fencing step.
That way the nodes don't try to fence simultaneously, but in a staggered
fashion.
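A minimal sketch of the staggered-delay idea, written as a Python wrapper
around an existing fence agent. This is not the actual modified agent: the
node names, the priority table, the 10-second step and the function names
are all assumptions made up for illustration. The point is only that the
node with the lowest priority number fences immediately, while the other
waits long enough to be powered off before its own delay expires.

```python
import subprocess
import time

# Hypothetical priority table: lower number = allowed to fence first.
NODE_PRIORITY = {"node1": 0, "node2": 1}

# Assumed gap between priorities; it should be longer than the time a
# single fence operation takes to complete.
STEP_SECONDS = 10


def fence_delay(local_node, priorities=NODE_PRIORITY, step=STEP_SECONDS):
    """Return how many seconds this node waits before fencing its peer."""
    return priorities[local_node] * step


def delayed_fence(local_node, agent_cmd, agent_input,
                  sleep=time.sleep, run=subprocess.run):
    """Wait for the priority-based delay, then run the real fence agent.

    agent_cmd   -- argument list for the unmodified agent (placeholder)
    agent_input -- text passed to the agent on stdin, unchanged
    Returns the agent's exit status.
    """
    sleep(fence_delay(local_node))
    result = run(agent_cmd, input=agent_input, text=True)
    return result.returncode
```

With this table, node1 fences with no delay and node2 waits 10 seconds, so
when the switch comes back only one fence request should get through.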
If you have a spare NIC, and the nodes are next to each other, you could
make them use a cross-over cable for their cluster communication, so they
would notice that they are both still up even when the switch dies. That's
what I do.
Gordan