[Linux-cluster] 2 node cluster gets fenced despite qdisk

Tue Aug 26 16:26:47 UTC 2008

Hi all!

I'm trying to get a 2-node cluster with a quorum device running; most of it is
working fine.
The cluster has a public net interface (bond0) and a private one (bond1).
When the cluster interconnect is lost (ifconfig down on the underlying eth devices),
the two nodes immediately fence each other and the cluster goes down. Is this
expected behavior?

I assumed the master node (qdiskd) {w,c,sh}ould stay alive and provide services,
as the cluster still has one communication channel (the quorum disk).
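
To make the vote arithmetic behind that assumption concrete, here is a minimal Python sketch. It assumes cman's usual majority rule (quorum = expected_votes // 2 + 1) and uses the vote values from the cluster.conf further down; the function names are made up for illustration and are not part of any RHCS tool.

```python
# Illustrative sketch only: models the majority rule, not cman itself.

EXPECTED_VOTES = 3   # <cman expected_votes="3" .../>
NODE_VOTES = 1       # votes="1" on each <clusternode>
QDISK_VOTES = 1      # votes="1" on <quorumd>


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum, assuming a simple majority rule."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int = EXPECTED_VOTES) -> bool:
    """True if the votes visible to a partition reach the threshold."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    print("threshold:", quorum_threshold(EXPECTED_VOTES))
    # One node that still holds the quorum disk vote.
    print("node + qdisk:", is_quorate(NODE_VOTES + QDISK_VOTES))
    # One node that has lost both its peer and the quorum disk.
    print("node alone:", is_quorate(NODE_VOTES))
```

Under that rule, a node that keeps the quorum disk vote holds 2 of 3 votes, which is the basis for the assumption above.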

Below is the cluster.conf; the OS is RHEL 5.2 with the latest patches.

Or is the quorum disk's only purpose to prevent a split-brain condition?

If someone could point me to a good resource on RHCS cluster configuration (e.g. a
comprehensive explanation of the cluster.conf options), I would much appreciate it.

Kind regards,
Gerhard

[root@ols011p yum.repos.d]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="fusi01" config_version="12" name="fusi01">
        <fence_daemon post_fail_delay="0" post_join_delay="10"/>
        <cman expected_votes="3" two_node="0"/>
        <clusternodes>
                <clusternode name="ols011p" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="1" name="OLS011-m"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="ols012p" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device lanplus="1" name="OLS012-m"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <quorumd device="/dev/mapper/HDS-00F9p1" votes="1">

            <heuristic interval="2" tko="3" program="ping -c1 -t3 172.27.111.254" score="1"/>

        </quorumd>

        <fencedevices>
                <fencedevice agent="fence_ipmilan" option="off" auth="password" ipaddr="ols011-m" login="root" name="OLS011-m" passwd="changeme"/>
                <fencedevice agent="fence_ipmilan" option="off" auth="password" ipaddr="ols012-m" login="root" name="OLS012-m" passwd="changeme"/>
        </fencedevices>

        <rm>
                <failoverdomains>
                        <failoverdomain name="fusi01_hvm_dom" nofailback="0" ordered="1" restricted="1">
                                <failoverdomainnode name="ols011p" priority="2"/>
                                <failoverdomainnode name="ols012p" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="fusi01_pvm_dom" nofailback="0" ordered="1" restricted="1">
                                <failoverdomainnode name="ols011p" priority="1"/>
                                <failoverdomainnode name="ols012p" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <vm autostart="1" domain="fusi01_pvm_dom" exclusive="0" migrate="live" name="fusi01pvm" path="/global/xenconfig" recovery="restart"/>
                <vm autostart="1" domain="fusi01_hvm_dom" exclusive="0" migrate="live" name="fusi01hvm" path="/global/xenconfig" recovery="restart"/>
        </rm>
</cluster>