[Linux-cluster] fence_gnbd failed
Tiago Cruz
tiagocruz at forumgdh.net
Wed Jul 23 21:56:40 UTC 2008
Hello,
I have one machine (hotsite-bsb-la-1) exporting GNBD to two machines (hotsite-bsb-la-2 and hotsite-bsb-la-3).
The cluster, running RHEL 5.2 x86_64 with GFS, was working very well until I rebooted hotsite-bsb-la-2:
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] CLM CONFIGURATION CHANGE
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] New Configuration:
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.30)
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.33)
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Left:
Jul 23 18:56:38 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.31)
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Joined:
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] CLM CONFIGURATION CHANGE
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] New Configuration:
Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: hotsite-bsb-la-2.com not a cluster member after 0 sec post_fail_delay
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.30)
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.33)
Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com"
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Left:
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Joined:
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [SYNC ] This node is within the primary component and will provide service.
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [TOTEM] entering OPERATIONAL state.
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] got nodejoin message 10.65.13.30
Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com" failed
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] got nodejoin message 10.65.13.33
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG ] got joinlist message from node 1
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG ] got joinlist message from node 3
Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com.br"
Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com.br" failed
Jul 23 19:00:57 hotsite-bsb-la-1 last message repeated 50 times
Why is the fence failing? Here is the cluster.conf:
<?xml version="1.0"?>
<cluster alias="hotsites" config_version="18" name="hotsites">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
<fence/>
</clusternode>
<clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
<fence>
<method name="single">
<device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
</method>
</fence>
</clusternode>
<clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
<fence>
<method name="single">
<device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman/>
<fencedevices>
<fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
</fencedevices>
<rm>
<failoverdomains/>
<resources>
<clusterfs device="/dev/gnbd/hotsite" force_unmount="1" fsid="5666" fstype="gfs" mountpoint="/data" name="data" self_fence="1"/>
</resources>
</rm>
<totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
</cluster>
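A note on the configuration above: each <device name="..."> under a <clusternode> is resolved by name against the entries in <fencedevices>. Here is a minimal sketch, assuming Python with only its standard library, that lists device references with no matching <fencedevice>. The function name undefined_fence_devices and the trimmed copy of the config are illustrative and not part of the cluster tooling. A mismatch reported this way is one possible cause of a failed fence, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from this post (only the fencing-related parts).
CLUSTER_CONF = """<cluster alias="hotsites" config_version="18" name="hotsites">
<clusternodes>
<clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
<fence/>
</clusternode>
<clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
<fence><method name="single">
<device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
</method></fence>
</clusternode>
<clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
<fence><method name="single">
<device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
</method></fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
</fencedevices>
</cluster>"""


def undefined_fence_devices(conf_xml):
    """Return (node name, device name) pairs whose device has no <fencedevice>."""
    root = ET.fromstring(conf_xml)
    # Names declared in the <fencedevices> section.
    defined = {fd.get("name") for fd in root.iter("fencedevice")}
    missing = []
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in defined:
                missing.append((node.get("name"), dev.get("name")))
    return missing


if __name__ == "__main__":
    for node_name, dev_name in undefined_fence_devices(CLUSTER_CONF):
        print(f"{node_name}: device '{dev_name}' is not defined in <fencedevices>")
```

On this config the check flags nodes 2 and 3: both reference a device named "gnbd", while the only declared fence device is named "hotsite".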
# cman_tool status
Version: 6.1.0
Config Version: 18
Cluster Name: hotsites
Cluster Id: 27589
Cluster Member: Yes
Cluster Generation: 184
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2
Active subsystems: 8
Flags: Dirty
Ports Bound: 0 177
Node name: hotsite-bsb-la-1.com
Node ID: 1
Multicast addresses: 239.192.107.49
Node addresses: 10.65.13.30
Thanks
--
Tiago Cruz
http://everlinux.com
Linux User #282636