[Linux-cluster] RHCS not fence 2nd node in 2 nodes cluster

Wang2, Colin (NSN - CN/Cheng Du) colin.wang2 at nsn.com
Thu Nov 12 08:19:30 UTC 2009


Hi All,

FYI. 
  The issue was resolved with help from Redhat support. 
  If use ip but name of cluster node,  it will not fence node that has
lowest id.

Resolution,
  Put one mapping in /etc/hosts.
  Use name but ip  for cluster node.
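
For example (the host names node1/node2 are only placeholders; the IP
addresses are the ones from the cluster.conf quoted below):

  /etc/hosts on both nodes:
      198.18.9.33   node1
      198.18.9.34   node2

  cluster.conf, using the names instead of the IPs:
      <clusternode name="node1" nodeid="1" votes="1">
      <clusternode name="node2" nodeid="2" votes="1">

After changing the node names, bump config_version and make sure the same
file is on both nodes; a node name change normally requires restarting cman
on both nodes before it takes effect.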


BRs,
Colin


-----Original Message-----
From: ext Wang2, Colin (NSN - CN/Cheng Du) <colin.wang2 at nsn.com>
Reply-To: linux clustering <linux-cluster at redhat.com>
To: linux-cluster at redhat.com
Subject: [Linux-cluster] RHCS not fence 2nd node in 2 nodes cluster
Date: Wed, 04 Nov 2009 15:40:34 +0800


Hi Gurus,

I am working on setting up a 2-node cluster, and the environment is:

Hardware,
   IBM BladeCenter with 2 LS42 blades (AMD Opteron Quad Core 2356 CPU, 16GB
memory each).
   Storage: EMC CX3-20f
   Storage switch: Brocade 4Gb 20-port switch module in the IBM BladeCenter.
   Network switch: Cisco switch module in the IBM BladeCenter.
Software,
   Red Hat EL 5.3 x86_64, kernel 2.6.18-128.el5
   Red Hat Cluster Suite 5.3.

This is a 2-node cluster, and my problem is (commands and dumps below):
  - When I power off the 1st node with "halt -fp", the 2nd node can fence
the 1st node and take over its services.
  - When I power off the 2nd node with "halt -fp", the 1st node cannot fence
the 2nd node and cannot take over its services.
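
For reference, the fence domain dumps below were taken with "fence_tool dump"
on the surviving node in each test. Membership and service state can also be
cross-checked with the standard cman/rgmanager tools, for example
(illustrative only, not part of the original report):

  cman_tool status    # quorum state and expected votes
  cman_tool nodes     # node ids, names and membership
  clustat             # service and failover status
  fence_tool dump     # fence domain debug log (output shown below)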


fence_tool dump contents,
----for the successful test
dump read: Success
1257305495 our_nodeid 2 our_name 198.18.9.34
1257305495 listen 4 member 5 groupd 7
1257305511 client 3: join default
1257305511 delay post_join 3s post_fail 0s
1257305511 clean start, skipping initial nodes
1257305511 setid default 65538
1257305511 start default 1 members 1 2 
1257305511 do_recovery stop 0 start 1 finish 0
1257305511 first complete list empty warning
1257305511 finish default 1
1257305611 stop default
1257305611 start default 3 members 2 
1257305611 do_recovery stop 1 start 3 finish 1
1257305611 add node 1 to list 1
1257305611 node "198.18.9.33" not a cman member, cn 1
1257305611 node "198.18.9.33" has not been fenced
1257305611 fencing node 198.18.9.33
1257305615 finish default 3
1257305658 client 3: dump
----for the failed test
dump read: Success
1257300282 our_nodeid 1 our_name 198.18.9.33
1257300282 listen 4 member 5 groupd 7
1257300297 client 3: join default
1257300297 delay post_join 3s post_fail 0s
1257300297 clean start, skipping initial nodes
1257300297 setid default 65538
1257300297 start default 1 members 1 2 
1257300297 do_recovery stop 0 start 1 finish 0
1257300297 first complete list empty warning
1257300297 finish default 1
1257303721 stop default
1257303721 start default 3 members 1 
1257303721 do_recovery stop 1 start 3 finish 1
1257303721 add node 2 to list 1
1257303721 averting fence of node 198.18.9.34
1257303721 finish default 3
1257303759 client 3: dump

I think the failure is related to "averting fence of node 198.18.9.34", but
why does it avert the fence? Could you help me out? Thanks in advance.

The cluster.conf is included below for reference.
<?xml version="1.0"?>
<cluster config_version="14" name="x">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
        <clusternode name="198.18.9.33" nodeid="1" votes="1">
                <fence>
                        <method name="1">
                                <device blade="13" name="mm1"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="198.18.9.34" nodeid="2" votes="1">
                <fence>
                        <method name="1">
                                <device blade="14" name="mm1"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>

<quorumd device="/dev/sdb1" interval="2" tko="7" votes="1">
        <heuristic interval="3" program="ping 198.18.9.61 -c1 -t2"
score="10"/>
</quorumd>

<totem token="27000"/>
<cman expected_votes="3" two_node="0" quorum_dev_poll="23000">
        <multicast addr="239.192.148.6"/>
</cman>


<fencedevices>
        <fencedevice agent="fence_bladecenter_ssh" ipaddr="x" login="x"
name="mm1" passwd="x"/>
</fencedevices>



BRs,
Colin

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster