[Linux-cluster] Using ping as a heuristic not a great idea?

Don Hoover dxh at yahoo.com
Fri May 21 15:08:27 UTC 2010


Well, we seem to have found something that works even when running the heartbeats across the production network.

Using this we were able to maintain heartbeats as well as have the quorum ping heuristic work even when we flooded the interface on the node.


Instead of the often-cited "ping -c1 -t1 <router>", we used "ping -c3 -t3 -W1", raised the heuristic interval to 10, and kept the quorum interval at 5. With these settings the heuristic kept working during our testing, where we flooded the interface to try to make it fail quorum. With the standard -c1 -t1 it was easy to get quorum to fail just by running a couple of big ftp jobs on the node. With the new settings we had the interface pegged at pretty much the theoretical max throughput and the ping heuristic still kept succeeding.


Here is an example showing what we used and finally settled on:

<quorumd interval="5" label="rhcsqdisk" min_score="1" tko="10" votes="1">
     <heuristic interval="10" program="/bin/ping -c3 -t3 -W1 192.168.0.1" score="1"/>
</quorumd>


Note: since we bumped our quorumd interval up from 2 to 5 (seconds between writes to the qdisk), we had to raise the totem token to "100000" milliseconds (100 seconds = 2 x qdisk interval x qdisk tko = 2 x 5 x 10).
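
For anyone wondering where that goes, here is a rough sketch of the matching line, assuming the token is set through the totem element in cluster.conf (check it against your own config before copying; the value is just 2 x 5 x 10 seconds expressed in milliseconds):

<totem token="100000"/>

If you change the quorumd interval or tko, the token would need to be recalculated the same way.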



