[Linux-cluster] Help with a two node cluster for a web server needed

Mon Jan 14 20:03:46 UTC 2008

So, what was happening was this:

1. unplug cable
2. cman transitions
3. fencing occurs
4. qdiskd detects negative transition

Here's what we want on the "dead" node:

1. unplug cable
2. qdiskd detects negative transition from heuristic

Here's your configuration:

    <quorumd interval="1" label="Qdisk1" tko="5" votes="1">
        <heuristic interval="1" program="ping 10.200.10.1 -c1 -t1"
                   score="1" tko="3"/>
    </quorumd>

First, let's ping the router with the cable unplugged to see how long it
takes for our heuristic to complete when things are "broken".  On my
machine:

[lhh@ayanami ~]$ time ping -c1 -t1 frederick
PING frederick (12.1.2.99) 56(84) bytes of data.
From ayanami (12.1.2.37) icmp_seq=1 Destination Host Unreachable

--- frederick ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

real    0m3.006s
^^^^^^^^^^^^^^^^
user    0m0.000s
sys     0m0.000s

OK, so ping takes 3 seconds to "not find" a host if routing is wrong or
the host is down. qdiskd then sleeps 1 second (the heuristic interval)
and tries again. If the heuristic fails 3 times in a row (its tko
count), qdisk removes its vote from CMAN. That means if the host is
down, it will take qdisk about 3 * (3+1) = 12 seconds to withdraw its
vote from CMAN. [NOTE: keep in mind, it might not be 3 seconds in your
configuration...]
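The arithmetic above can be written down as a small Python helper. This is only a rough model of the worst case, assuming each failed attempt costs the program's give-up time plus the heuristic interval; it is not how qdiskd schedules heuristics internally, and the 3-second ping failure time is the value measured above, which may differ on your network.

```python
def heuristic_failure_time(run_time_s: float, interval_s: float, tko: int) -> float:
    """Rough worst-case time (seconds) before a qdisk heuristic is declared failed.

    Each failed attempt costs the time the program takes to give up
    (run_time_s) plus the sleep between attempts (interval_s); qdiskd acts
    only after `tko` consecutive failures.
    """
    return tko * (run_time_s + interval_s)


# Values from the configuration above: ping gives up after ~3 s,
# heuristic interval="1", tko="3".
print(heuristic_failure_time(run_time_s=3.0, interval_s=1.0, tko=3))  # 12.0
```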

CMAN's default failover time is 5 seconds (this is really openais's
Totem protocol token timeout, if you want to be technical).

12 > 5, meaning qdiskd can't do much to help before CMAN takes action.
We need to flip these times so that CMAN times out *after* qdisk.  This
way, qdiskd can say "Ok! I'm dead!" - and either take action (reboot by
default) or remove its vote from CMAN.

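So, the practical rules for timings are basically like this, where x is
the time it takes the heuristic to fail (tko * (ping time + interval)),
y is the qdisk timeout (interval * tko), and z is the totem token
timeout:

* Heuristics should transition before QDisk.  x < y.

* Qdisk should transition before CMAN - in a little less than half the
time, actually.  y * 2 < z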
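As a sketch, the two rules can be checked mechanically. Here x is the heuristic failure time, y is the qdisk timeout (quorumd interval * tko) and z is the totem token timeout converted to seconds; the function name and the 5000 ms default used in the example are taken from the discussion above, not from any cluster tool.

```python
def check_timings(heuristic_fail_s: float, qdisk_interval_s: float,
                  qdisk_tko: int, totem_token_ms: int) -> dict:
    """Check the two rules of thumb: x < y and 2 * y < z."""
    x = heuristic_fail_s                  # time for the heuristic to fail
    y = qdisk_interval_s * qdisk_tko      # qdisk timeout
    z = totem_token_ms / 1000.0           # totem token timeout (CMAN), seconds
    return {
        "heuristic_before_qdisk": x < y,
        "qdisk_well_before_cman": 2 * y < z,
    }


# Original configuration: heuristic fails in ~12 s, quorumd interval="1"
# tko="5", and no <totem> tag, so the 5-second default token applies.
print(check_timings(12.0, 1.0, 5, 5000))
# {'heuristic_before_qdisk': False, 'qdisk_well_before_cman': False}
```

Both rules fail for the original configuration, which matches the behaviour described at the top: CMAN (and fencing) acts before qdiskd gets a chance to.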

Option 1:

Make a tko of 1 sufficient by making the heuristic do more work.  In my
quick testing, sending 3 packets took the same 3 seconds as sending 1.

Also, we still want CMAN to time out after qdisk - which it won't yet.
So, we need to add a tag to cluster.conf that instructs totem to report
a node as down after a period longer than qdisk (a little more than
double, as noted above):

    ...
    <quorumd ...>
        <heuristic interval="1" program="ping 10.200.10.1 -c3 -t1"
                   score="1" tko="1"/>
    </quorumd>
    <totem token="11000"/>
    ...

This says 11000 milliseconds, or 11 seconds, is required before totem
(and therefore, CMAN) will declare a node dead.  Twice the qdisk timeout
is 2 * (1 * 5) = 10 seconds; toss in a second for fun, and we get 11
seconds.
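A minimal sketch of how the token value can be derived from the quorumd settings, assuming the "twice the qdisk timeout plus a one-second margin" rule described here; the helper name and the margin parameter are illustrative only.

```python
def suggested_totem_token_ms(qdisk_interval_s: int, qdisk_tko: int,
                             margin_s: int = 1) -> int:
    """Totem token (ms) a little more than twice the qdisk timeout."""
    qdisk_timeout_s = qdisk_interval_s * qdisk_tko   # qdisk timeout in seconds
    return (2 * qdisk_timeout_s + margin_s) * 1000   # convert to milliseconds


# Option 1: quorumd interval="1" tko="5"
print(suggested_totem_token_ms(1, 5))  # 11000
```

With the same one-second margin, a quorumd tko of 13 gives 27000, which is the value used in Option 2.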

Since ping with -c3 still gives up after about 3 seconds and the
heuristic has a tko of 1, it should take 3-4 seconds (one failed run
plus the 1-second interval) for the heuristic to report a failure.

3 < 5
5 * 2 < 11
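To double-check Option 1 end to end, here is a small sketch that computes x, y and z from the raw configuration values and applies both rules. It assumes the same rough model as before (worst case = tko * (ping give-up time + interval)) and the ~3-second ping failure measured earlier; the function name is hypothetical.

```python
def evaluate_option(heuristic_run_s: int, heuristic_interval_s: int,
                    heuristic_tko: int, qdisk_interval_s: int,
                    qdisk_tko: int, totem_token_ms: int) -> tuple:
    """Return (x, y, z, ok) for a heuristic/quorumd/totem combination."""
    x = heuristic_tko * (heuristic_run_s + heuristic_interval_s)  # heuristic fails
    y = qdisk_interval_s * qdisk_tko                              # qdisk timeout
    z = totem_token_ms / 1000                                     # totem timeout (s)
    return x, y, z, (x < y and 2 * y < z)


# Option 1: ping -c3 (~3 s), heuristic interval="1" tko="1",
# quorumd interval="1" tko="5", totem token="11000".
print(evaluate_option(3, 1, 1, 1, 5, 11000))  # (4, 5, 11.0, True)
```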

Option 2: 

Make things fit around your heuristic.  Given our 12 second "negative"
case for our heuristic/tko, we can simply make qdisk time out in >12
seconds.  Then, we double that and add a bit for CMAN:

    ...
    <quorumd interval="1" label="Qdisk1" tko="13" votes="1">
        <heuristic interval="1" program="ping 10.200.10.1 -c1 -t1"
                   score="1" tko="3"/>
    </quorumd>
    <totem token="27000"/>
    ...

12 < 13 
13 * 2 < 27
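Option 2 works in the other direction: start from the heuristic's failure time and size everything else around it. A sketch of that procedure, assuming a 1-second quorumd interval and the same one-second margin for the totem token (both parameters are illustrative defaults, not values any tool enforces):

```python
import math


def fit_around_heuristic(heuristic_fail_s: float, qdisk_interval_s: int = 1,
                         margin_s: int = 1) -> tuple[int, int]:
    """Return (quorumd tko, totem token in ms) that fit around a slow heuristic."""
    # Smallest tko whose qdisk timeout (interval * tko) strictly exceeds
    # the heuristic's failure time.
    qdisk_tko = math.floor(heuristic_fail_s / qdisk_interval_s) + 1
    qdisk_timeout_s = qdisk_interval_s * qdisk_tko
    # Totem token: a little more than twice the qdisk timeout.
    totem_token_ms = (2 * qdisk_timeout_s + margin_s) * 1000
    return qdisk_tko, totem_token_ms


# Heuristic from the original configuration fails in ~12 seconds.
print(fit_around_heuristic(12))  # (13, 27000)
```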

Let me know if this helps you, so I can add it to the Wiki and further
clarify the manual pages.  Either of these should get you up and
working.

-- Lon