[Linux-cluster] Rebooting qdisk master causes quorum to dissolve.

Mon Dec 21 11:09:24 UTC 2009

On Mon, 2009-12-21 at 13:26 +1000, Peter Tiggerdine wrote:
> Hi,
> 
> I have a five node cluster with a shared quorum disk without heuristics.
> Because of a hardware problem I need to move the services off the
> host in question and replace some RAM. The services moved without a
> hitch, but as soon as I rebooted the nodes the cluster came down.
> 
> The relevant configuration is 
> 
> <cluster alias="Services" config_version="150" name="Services">
>         <quorumd interval="5" tko="12" device="/dev/emcpowere" votes="3"
> log_level="9" log_facility="local4" status_file="/qdisk_status"/>
>         <fence_daemon clean_start="1" post_fail_delay="15"
> post_join_delay="30"/>
>         <cman deadnode_timeout="90" expected_nodes="4"/> 

Try something like this:

<cman quorum_dev_poll="25000"/>
<totem token="25000"/>

<quorumd interval="2" tko="10" votes="2" label="One2Play-SAS-qdisk"
status_file="/tmp/qdisk" stop_cman="1"/>

I think your token timeout and cman quorum_dev_poll should be a few
seconds larger than interval * tko (in my case 2 x 10 = 20 seconds, so
I set the other two values to 25000 ms, i.e. 25 seconds).

This means that one node will be fenced after 25 seconds.
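
As a rough sanity check of that rule of thumb, here is a small Python
sketch. The helper names (qdisk_timeout_ms, check_timeouts) are made up
for illustration and are not part of cman or qdiskd; the only thing it
assumes is the rule stated above, that token and quorum_dev_poll (both
in milliseconds) should exceed interval * tko (interval in seconds).

```python
def qdisk_timeout_ms(interval_s, tko):
    """Time qdiskd needs to give up on a node: interval * tko, in ms."""
    return interval_s * tko * 1000


def check_timeouts(interval_s, tko, token_ms, quorum_dev_poll_ms):
    """Return warnings for settings that are not larger than interval * tko.

    Hypothetical helper: it only encodes the rule of thumb from this
    thread, not any check performed by the cluster software itself.
    """
    limit = qdisk_timeout_ms(interval_s, tko)
    problems = []
    if token_ms <= limit:
        problems.append("totem token %d ms <= qdisk timeout %d ms"
                        % (token_ms, limit))
    if quorum_dev_poll_ms <= limit:
        problems.append("quorum_dev_poll %d ms <= qdisk timeout %d ms"
                        % (quorum_dev_poll_ms, limit))
    return problems


# Values suggested in this reply: 2 * 10 = 20 s, both timeouts at 25 s.
print(check_timeouts(2, 10, 25000, 25000))   # prints []

# quorumd values from the original post: interval=5, tko=12.
print(qdisk_timeout_ms(5, 12))               # prints 60000
```

With interval="5" and tko="12" as in the original configuration, the
same rule would call for token and quorum_dev_poll values somewhat
above 60000 ms.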

-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |