[Linux-cluster] RHCS - concept and understanding of fencing

Chris Kwall christiankwall-qsa at yahoo.com
Thu Feb 16 10:51:20 UTC 2012


Hi guys
 
First, sorry if something is unclear; English isn't my native language, but I am still improving it.
It seems that I have a problem understanding the fencing concept (RHCS 6.2), or I am doing something wrong in the configuration. Let me explain the details.
 
I want to set up a 2-node cluster for a simple active/passive scenario with SAN storage and a quorum device.
 
According to my understanding, for a 2-node cluster it is necessary to set two_node="1" and expected_votes="1" in the cman section. So every node has one vote, and only one node needs to be available for the cluster to be quorate.
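 
If I got that right, I assume the two-node variant of the cman line would look roughly like this (just a sketch, not my actual config):
 
<cman two_node="1" expected_votes="1"/>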
 
What if I want an extra quorum device, provided by the SAN storage? In that setup two_node="1" is not necessary and expected_votes must change to 3: node1 + node2 + quorum disk = 3 votes. So if node2 goes down we still have 2 votes (node1 + quorum disk), the cluster stays quorate and can keep providing services. On the other hand, the quorum disk is visible to both nodes, so each node would have 2 votes, 4 in total, and that confuses me a little.
 
<cluster>
..
<cman expected_votes="3" />
<quorumd label="qdisk" />
..
</cluster>
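 
As a side question: I assume I can check what the cluster actually counts with cman_tool status on a running node, which should show the "Expected votes" and "Total votes" lines, e.g.
 
# cman_tool status | grep -i -e votes -e quorum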
 
Q: In which situation will node1 and node2 fence each other?
· If the heartbeat LAN disconnects?
· If the quorum disk goes offline?
· If both go offline?
 
In my understanding of the concept, one of the two can go down, and a node will only be fenced if both go offline.
 
Okay, that was the theory and my understanding of it. In practice, my nodes shoot each other in the head.
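 
To rule out a problem with the fence devices themselves, I assume I can trigger fencing manually from one node and watch the result, e.g.
 
# fence_node ha-node2
 
(As far as I understand, fence_node uses the method and device defined for that node in cluster.conf.)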
 
Network-Setup
VLAN10 - 192.168.8.0/24 - Backup
VLAN11 - 192.168.100.0/23 - Production
VLAN12 - 192.168.1.0/24 - Cluster / Heartbeat only
VLAN13 - 192.168.10.0/24 - iLO Management
 
System-Setup
ha-node1 (rhel 6.2)
eth1 - 192.168.8.1
eth2 - 192.168.100.1
eth3 - 192.168.1.1
hp-ilo - 192.168.10.1
 
ha-node2 (rhel 6.2)
eth1 - 192.168.8.2
eth2 - 192.168.100.2
eth3 - 192.168.1.3
hp-ilo - 192.168.10.2
 
/etc/cluster/cluster.conf 
(http://pastebin.com/nWhnbk73)
 
<?xml version="1.0"?>
<cluster config_version="4" name="argus">
        <fence_daemon/>
        <clusternodes>
                <clusternode name="ha-node1" nodeid="1">
                        <fence>
                                <method name="ipmi">
                                        <device action="reboot" name="impifence1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="ha-node2" nodeid="2">
                        <fence>
                                <method name="ipmi">
                                        <device action="reboot" name="impifence2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="3"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.168.10.1" lanplus="1" login="rhcsfencing" name="impifence1" passwd="*********"/>
                <fencedevice agent="fence_ipmilan" auth="password" ipaddr="192.168.10.2" lanplus="1" login="rhcsfencing" name="impifence2" passwd="*********"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources>
                        <ip address="192.168.100.10" monitor_link="on" sleeptime="10"/>
                        <ip address="192.168.8.49" monitor_link="on" sleeptime="10"/>
                        <fs device="/dev/mapper/vg_dataP01-lv_argus" force_fsck="yes" fstype="ext4" mountpoint="/opt/argus" name="fsargus"/>
                        <fs device="/dev/mapper/vg_dataP01-lv_mysql" force_fsck="yes" fstype="ext4" mountpoint="/var/lib/mysql" name="fsmysql"/>
                </resources>
                <service autostart="1" name="argus_prod" recovery="restart">
                        <ip ref="192.168.100.10">
                                <fs ref="fsmysql">
                                        <script file="/etc/init.d/mysqld" name="mysqld"/>
                                </fs>
                        </ip>
                </service>
                <service autostart="1" name="argus_backup" recovery="restart">
                        <ip ref="192.168.8.49"/>
                </service>
        </rm>
        <quorumd label="qdisk"/>
</cluster>
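 
Side note: I assume the syntax of the file can be checked against the schema with ccs_config_validate, but that only tells me the XML is valid, not that my fencing logic is right:
 
# ccs_config_validate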
 
/var/log/messages
http://pastebin.com/5G7n56bU
 
 
Thx
Chris



