[Linux-cluster] 2-node tie-breaking

Thu Feb 7 19:35:37 UTC 2008

On Thu, 2008-02-07 at 15:32 +0000, gordan at bobich.net wrote:
> Hi,
> 
> I've got a slightly peculiar problem. 2-node cluster acting as a load 
> balanced fail-over router. 3 NICs: public, private, cluster.
> Cluster NICs are connected with a cross-over cable, the other two are on 
> switches. The cluster NIC is only used for DRBD/GFS/DLM and associated 
> things.
> 
> The failure mode that I'm trying to account for is the one of the cluster 
> NIC failing on one machine. On the public and private networks, both 
> machines can still see everything (including each other). That means that 
> a tie-breaker based on other visible things will not work.
> 
> So, which machine gets fenced in the case of the cluster NIC failure (or 
> more likely, if the x-over cable falls out)?

... whichever gets fenced first ;)

1. You can do a clever heuristic using qdiskd if you wanted, for example:

 * assign an IP on the private cluster network and make rgmanager manage
it as a service (even though it doesn't do anything). Make sure to
*disable* monitor_link, or rgmanager will stop the service!

 * make a script to check for:
   * ethernet link of the private interface, and
   * if that fails, ping the service IP address
     * if that fails, we are *dead*; give up and
       -do not- try to fence

If you put the IP as part of the "most critical" service that
rgmanager's running, then the node that owns that service will keep
running while the other node is not allowed to continue.

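Because the first check is whether we have a cluster link - and the
*second* check is the ping of the service, a node that still has link
passes immediately, and a node without link passes only if it can still
reach the service IP.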

Something like this...

<quorumd ...>
  <heuristic program="/usr/local/sbin/private-link-script" ... />
</quorumd>

<rm>
  ...
  <service name="pinger-ip" >
    <ip address="10.1.1.2" monitor_link="no"/>
  </service>
  ...
</rm>

Script might look something like:

#!/bin/bash

DEVICE=eth3
PINGIP=10.1.1.2

#
# Ensure the device is there!
#
ethtool $DEVICE || exit 1

#
# Check for link
#
ethtool $DEVICE | grep -q "Link detected.*yes" 
if [ $? -eq 0 ]; then
  exit 0
fi

#
# XXX Work around signal bug for now.
#
ping_func()
{
        declare retries=0
        declare PID

        ping -c3 -t2 $1 &
        PID=$!
        while [ $retries -lt 2 ]; do
                sleep 1
                ((retries++))

                kill -n 0 $PID &> /dev/null
                if [ $? -eq 1 ]; then
                        wait $PID
                        return $?
                fi
        done

        kill -9 $PID
        return 1
}

#
# Ping service ip address.
#
ping_func $PINGIP
exit $?
-------------------------------
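
To make the order of the two checks explicit, here is a minimal sketch
with the checks passed in as commands, so the logic can be tried without
touching a NIC. The function name heuristic_order is made up for
illustration; it is not part of qdiskd or rgmanager.

```shell
#!/bin/bash
#
# Hypothetical helper: $1 is a command that succeeds when the cluster
# link is up, $2 is a command that succeeds when the service IP answers.
#
heuristic_order()
{
        local link_check=$1 ping_check=$2

        # First check: cluster link. Link present means we pass.
        if "$link_check"; then
                return 0
        fi

        # Second check: no link, so pass only if the service IP answers.
        "$ping_check"
}

# Cable falls out (both nodes lose link):
#   heuristic_order false true    # node owning the service IP: passes
#   heuristic_order false false   # other node: fails, must not fence
```

In the real script the two commands would be the ethtool link test and
the ping of the service IP.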

Disadvantage is that it's hard to start the cluster w/o both nodes
online without some sort of override.
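
One possible override, as a rough sketch only: have the heuristic pass
unconditionally while an operator-created flag file exists. The path
/etc/cluster/heuristic-override and the name override_active are
assumptions picked for illustration, nothing in the cluster suite looks
for them.

```shell
#!/bin/bash
#
# Hypothetical flag file; an operator creates it by hand to bring up a
# single node and removes it once both nodes are online.
#
OVERRIDE_FILE=${OVERRIDE_FILE:-/etc/cluster/heuristic-override}

override_active()
{
        [ -e "$OVERRIDE_FILE" ]
}

# At the very top of the heuristic script, before any other check:
#   override_active && exit 0
```

The flag has to be removed afterwards; while it exists the heuristic
always passes and no longer breaks the tie.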

2.  You can do something like Brian said, too - e.g. "if I am the right
host and the link isn't up, I win":

#!/bin/bash

DEVICE=eth3
OTHER_NODE_PUBLIC_IP="192.168.1.2"

#
# Ensure the device is there!
#
ethtool $DEVICE || exit 1

#
# Check for link
#
ethtool $DEVICE | grep -q "Link detected.*yes" 
if [ $? -eq 0 ]; then
  exit 0
fi

#
# XXX Work around signal bug for now.
#
ping_func()
{
        declare retries=0
        declare PID

        ping -c3 -t2 $1 &
        PID=$!
        while [ $retries -lt 2 ]; do
                sleep 1
                ((retries++))

                kill -n 0 $PID &> /dev/null
                if [ $? -eq 1 ]; then
                        wait $PID
                        return $?
                fi
        done

        kill -9 $PID
        return 1
}

#
# Ok, no link on private net
#
ping_func $OTHER_NODE_PUBLIC_IP
if [ $? -eq 0 ]; then
  [ "`uname -n`" == "node1" ]
  exit $?
fi

#
# Other node is down and we're not - 
# we win
#
exit 0
-------------------------------
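
The decision the script above makes can also be written as a pure
function, which is handy for checking the outcomes without pulling
cables. This is only a sketch; the name tiebreak and its argument order
are assumptions for illustration.

```shell
#!/bin/bash
#
# Hypothetical pure-logic version of the tie-breaker.
#   $1 link_up         "yes" if the cluster NIC has link
#   $2 peer_reachable  "yes" if the other node answers on the public net
#   $3 this_host       output of `uname -n` on this node
#   $4 preferred_host  node that should win the tie ("node1" above)
# Returns 0 (heuristic passes) or 1 (heuristic fails).
#
tiebreak()
{
        local link_up=$1 peer_reachable=$2 this_host=$3 preferred_host=$4

        # Link is fine: nothing to break a tie over.
        [ "$link_up" = "yes" ] && return 0

        # No link and the other node is down: we win.
        [ "$peer_reachable" = "yes" ] || return 0

        # No link, other node alive: only the preferred host passes.
        [ "$this_host" = "$preferred_host" ]
}

#   tiebreak no yes node1 node1   # passes
#   tiebreak no yes node2 node1   # fails
```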

3.  Another simple way to do it is to use a fake "fencing agent" to
introduce a delay:

   <fencedevice agent="/bin/sleep-10" name="sleeper" .../>

(where /bin/sleep-10 is something like:
#!/bin/sh
sleep 10
exit 0
)

Reference that agent as part of -one- node's fencing, and that node will
lose by default.  This way, you don't have to set up qdiskd.  You could
do the same thing by just editing the fencing agent directly on that
node, as well - in which case, you wouldn't have to edit cluster.conf at
all.
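
If a fixed 10 seconds is too rigid, the delay agent could take its delay
from the options it is handed. As far as I know fenced passes options to
agents as name=value lines on stdin; the "delay" option below is a
made-up attribute for this sketch, not something an existing agent
defines.

```shell
#!/bin/bash
#
# Sketch of a delay-only agent. Reads name=value lines from stdin,
# honours a hypothetical delay=<seconds> option, defaults to 10.
#
delay_agent()
{
        local delay=10 line

        while read -r line; do
                case $line in
                delay=*) delay=${line#delay=} ;;
                esac
        done

        # Fall back to the default if the value is not a plain number.
        case $delay in
        ''|*[!0-9]*) delay=10 ;;
        esac

        sleep "$delay"
        return 0
}

# As a standalone agent, the script would end with:
#   delay_agent
#   exit $?
```

Like the fixed version, this only delays and never fences anything by
itself.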

-- Lon