[Linux-cluster] IP-based tie-breaker on a 2-node cluster?

Thu Apr 17 18:25:26 UTC 2008

Andrew Lacey wrote:

> Very informative post...thanks! The scenario you mentioned with a dead
> switch port (or a single unplugged network cable, or whatever) is
> something I had thought about, and I considered it to be a strike against
> using a crossover cable.

How does that follow? With a switch in the middle, your points of 
failure are:
cable, switch, cable

With just a crossover cable (actually, it doesn't have to be crossover - 
99% of NICs made in the past few years auto-detect and auto-negotiate 
whether they need to cross over or not, so you can just use a 
straight-through cable - but that's getting off topic), you have only a 
single cable as a point of failure. That is certainly better than the 
alternative.

> But, this "monitor_link" sounds like it might be
> exactly what I've been looking for. I'll research that and see what I can
> find.

You don't need that on your cluster interface though. If the NIC or 
cable dies, the cluster will lose the connection to the other node and 
fence it. If you have something like iLO on multiple interfaces, you can 
specify multiple fencing devices to ensure that you manage to fence the 
other node, regardless of which interface fails. But the crossover 
interface connecting the nodes is arguably the most reliable part of 
your 2-node cluster because it has the fewest components.
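
To illustrate the idea of several fencing devices tried in turn, here is 
a minimal Python sketch. It is not how the cluster software is 
implemented; the function name and the (name, callable) pairs are 
made up for illustration, and each callable stands in for one fencing 
attempt over one interface.

```python
def fence_with_fallback(methods):
    """Try each (name, attempt) pair in order.

    Returns the name of the first attempt that reports success,
    or None if every attempt fails or raises.
    """
    for name, attempt in methods:
        try:
            if attempt():
                return name
        except Exception:
            # This path to the fencing device is unusable; try the next one.
            continue
    return None
```

The point is only the ordering: a failure on one path should fall 
through to the next device rather than abort the fence.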

> You asked in your other post how I can tell the difference between a
> network outage that should cause a fence and one that shouldn't. What I
> wanted to do was set it up so that a node that can't reach the switch will
> never try to fence the other node. That way, if the switch is down and
> nobody can reach it, then nobody will fence. If there is a single port
> failure and one node can still reach the switch, then it will fence the
> other node and take over the services.

Is your switch managed? If so, you can use it as a fencing device: 
simply have a node disable the other node's port. That way any 
subsequent attempts by the other node, to fence or do anything else, 
will not get anywhere. You may need to write your own fencing agent for 
that, though. I asked about the fencing agent API in an earlier post, 
and there appears to be no conclusive documentation for it. I've been 
meaning to implement a fencing agent for exactly this sort of thing 
(fencing by disabling the switch port) on a 3Com switch.
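
As a rough sketch of what such an agent might look like, assuming the 
agent is handed name=value options on stdin and that the switch lets 
you set IF-MIB ifAdminStatus over SNMP (I have not verified either 
against a 3Com switch). The option names ipaddr, community, port and 
action are my own guesses, and "port" is assumed to be the SNMP ifIndex 
of the other node's port. It shells out to Net-SNMP's snmpset.

```python
import subprocess
import sys

# IF-MIB ifAdminStatus column; standard values are up(1) and down(2).
IF_ADMIN_STATUS_OID = "1.3.6.1.2.1.2.2.1.7"
ADMIN_STATE = {"off": "2", "on": "1"}


def parse_options(lines):
    """Parse name=value lines, skipping blanks and # comments."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name.strip()] = value.strip()
    return opts


def build_snmpset(opts):
    """Build the snmpset argument list that sets the port's admin state."""
    action = opts.get("action", "off")  # hypothetical option name
    if action not in ADMIN_STATE:
        raise ValueError("unsupported action: %s" % action)
    # Assumes opts["port"] is the ifIndex of the port to disable.
    oid = "%s.%s" % (IF_ADMIN_STATUS_OID, opts["port"])
    return ["snmpset", "-v", "2c", "-c", opts["community"],
            opts["ipaddr"], oid, "i", ADMIN_STATE[action]]


def main():
    """Read options from stdin and run snmpset; 0 means the fence worked."""
    try:
        cmd = build_snmpset(parse_options(sys.stdin))
    except (KeyError, ValueError) as err:
        sys.stderr.write("bad options: %s\n" % err)
        return 1
    return subprocess.call(cmd)

# A real agent would finish with: sys.exit(main())
```

Feeding it ipaddr=..., community=..., port=7 and action=off would run 
snmpset against OID 1.3.6.1.2.1.2.2.1.7.7 with integer value 2, i.e. 
administratively down. The exit status is what tells the caller whether 
the fence succeeded, so anything other than a clean snmpset run returns 
non-zero.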

Gordan