[Linux-cluster] Heuristics for quorum disk used as a tiebreaker in a two node cluster.

Lon Hohberger lhh at redhat.com
Wed Dec 8 20:33:14 UTC 2010


On Fri, 2010-12-03 at 10:10 +0000, Jankowski, Chris wrote:

> This is exactly what I would like to achieve.  I know which node
> should stay alive - the one running my service, and it is trivial for
> me to find this out directly, as I can query for its status locally on
> a node. I do not have to use the network. This can be used as a
> heuristic for the quorum disc.
>  
> What I am missing is how to make that into a workable whole.
> Specifically the following aspects are of concern:
>  
> 1.
> I do not want the other node to be ejected from the cluster just
> because it does not run the service.  But the test is binary, so it
> looks like it will be ejected.

When a two node cluster partitions, someone has to die.

> 2.
> Startup time, before the service has started.  As neither node has the
> service, both will be candidates for ejection.

One node will die and the other will start the service.

> 3.
> Service migration time.
> During service migration from one node to another, there is a
> transient period of time when the service is not active on either
> node.

If you partition during a 'relocation' operation, rgmanager will
evaluate the service and start it after fencing completes.


> 1.
> How do I put all of this together to achieve the overall objective of
> the node with the service surviving the partitioning event
> uninterrupted?

As it turns out, using qdiskd to do this is not the easiest thing in the
world.  This has to do with a variety of factors, but the biggest is
that qdiskd has to make choices -before- CMAN/corosync do, so it's hard
to ensure correct behavior in this particular case.

The simplest thing I know of to do this is to selectively delay fencing.
It's a bit of a hack (though less so than using qdiskd, as it turns
out).

NOTE: This agent _MUST_ be used in conjunction with a real fencing
agent.  Put the reference to this agent before the real fencing agent
within the same fencing method (see the cluster.conf sketch further
down).

It might look like this:

#!/bin/bash
#
# Fencing-delay agent: gives the node that owns the service a head start
# at fencing.  List it before the real fencing agent in the same method.

me=$(hostname)
service=empty1

# Ask rgmanager which node owns the service.  The trailing
# 'exit ${PIPESTATUS[0]}' makes the subshell return clustat's exit
# status rather than cut's.
owner=$(clustat -lfs $service | grep '^  Owner' | cut -f2 -d: ; exit ${PIPESTATUS[0]})
state=$?

# Strip any whitespace clustat may pad the field with.
owner=$(echo $owner)

echo Eval $service state $state $owner

# If clustat succeeded and this node is not the owner, wait before the
# real fencing proceeds so that the owner (if alive) fences us first.
if [ $state -eq 0 ] && [ "$owner" != "$me" ]; then
        echo Not the owner - Delaying 30 seconds
        sleep 30
fi

exit 0

What it does is give preference to the node running the service by
making the non-owner delay a bit before trying to perform the real
fencing operation.  If the owner is alive, it will fence first.  If the
service was not running before the partition, neither node gets
preference.
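Wiring it in might look roughly like the fragment below.  This is only a
sketch: the device and agent names (empty1-delay, fence_empty1_delay,
ipmi-node2) are made up, the script has to be installed where fenced
looks for agents (e.g. /usr/sbin/fence_empty1_delay), and the second
device stands in for whatever real agent you already use.  fenced runs
the devices within a method in the order listed, so the delay happens
before the real fencing attempt:

  <clusternode name="node2" nodeid="2">
    <fence>
      <method name="1">
        <!-- hypothetical delay agent; runs first -->
        <device name="empty1-delay"/>
        <!-- the real fencing device runs second -->
        <device name="ipmi-node2"/>
      </method>
    </fence>
  </clusternode>
  ...
  <fencedevices>
    <fencedevice name="empty1-delay" agent="fence_empty1_delay"/>
    <fencedevice name="ipmi-node2" agent="fence_ipmilan"
                 ipaddr="..." login="..." passwd="..."/>
  </fencedevices>

Because both devices sit in the same method, the delay only postpones
fencing; if the owner really is dead, the non-owner still fences it once
the 30 seconds are up.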

If the primary driving reason for using qdiskd was to solve this
problem, then you can avoid using qdiskd entirely.

 
> 2.
> What is the relationship between fencing and node suicide triggered by
> communication through the quorum disk?

None.  Both occur.

> 3.
> How does the master election relate to this?

It doesn't, really.  To get a node to drop master, you have to turn
qdiskd's 'reboot' option off.  With 'reboot' off, a node will abdicate
'master' mode if its score drops.
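
For reference, that 'reboot' knob is an attribute on the <quorumd> block
in cluster.conf; the interval/tko/label values and the heuristic below
are placeholders, not a recommendation:

  <quorumd interval="1" tko="10" votes="1" label="myqdisk" reboot="0">
    <!-- illustrative heuristic only -->
    <heuristic program="/usr/local/sbin/check_service.sh"
               score="1" interval="2" tko="3"/>
  </quorumd>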

-- Lon
