[Linux-cluster] IP-based tie-breaker on a 2-node cluster?

gordan at bobich.net
Thu Apr 17 17:09:10 UTC 2008


On Thu, 17 Apr 2008, Andrew Lacey wrote:

>> There's an argument that if your switch is down for 30 minutes, you
>> have bigger problems. If you have a 30 minute switch outage, the chances
>> are that you can live with the node power-up time on top of that.
>
> Point taken, but the problem is that if there is a switch outage and the
> nodes kill each other, then somebody has to come in, power the nodes back
> on and make sure everything comes up OK. It would be much easier if the
> nodes would just detect that the switch is down and wait patiently without
> doing anything (since there is really nothing wrong with the nodes at all,
> and if they just wait for the switch to come back, everything will be
> fine.)

How do you propose to differentiate between a network outage that should
trigger fencing and one that shouldn't?

> We do have a history of flaky network here because we're a college...we
> have a lot of machines on campus that we don't control (student-owned) and
> we get weird traffic, rogue machines, etc. more frequently than a
> locked-down corporate environment. I want to make sure that one of those
> network events doesn't needlessly bring down our mail service, which is
> what will be running on this cluster.

A cross-over cable directly between the cluster interfaces, with no switch
in between, would probably be the best solution. That, coupled with a
fencing timeout that varies between the nodes, should do most of what you
seem to want to achieve.
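
To illustrate why the timeout should differ between the nodes, here is a
rough Python sketch of the fence race on a 2-node cluster. It is only a toy
model, not anything taken from the cluster software: the function name
fence_race, the node names and the tie_window figure (the assumed time a
fence action needs to actually cut the peer's power) are all made up for
the example.

```python
# Toy model of a two-node fence race. Everything here is hypothetical and
# only illustrates the reasoning; it does not talk to any fencing agent.

def fence_race(delays, tie_window=0.5):
    """Return the set of nodes still up after both lose contact.

    delays maps node name -> seconds that node waits after losing its
    peer before fencing it. tie_window is an assumed number of seconds
    a fence action takes to power the victim off.
    """
    # Order the two nodes by how soon they fire their fence action.
    (first, d_first), (second, d_second) = sorted(
        delays.items(), key=lambda kv: kv[1]
    )
    # 'first' fires at d_first; 'second' is off at d_first + tie_window.
    if d_second - d_first < tie_window:
        # 'second' fires before it loses power: the nodes kill each other.
        return set()
    # 'second' is powered off before its own timer expires.
    return {first}


if __name__ == "__main__":
    print(fence_race({"node1": 5, "node2": 5}))    # identical timeouts
    print(fence_race({"node1": 5, "node2": 15}))   # staggered timeouts
```

With identical timeouts both nodes fire inside the same window and both go
down. With staggered timeouts the node with the shorter one wins the race
and the other is fenced before it gets to act.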

Gordan



