[Linux-cluster] fence_gnbd failed

Tiago Cruz tiagocruz at forumgdh.net
Wed Jul 23 21:56:40 UTC 2008


Hello,

I have one machine (hotsite-bsb-la-1) exporting a GNBD device to two machines (hotsite-bsb-la-2 and hotsite-bsb-la-3).

The cluster, running RHEL 5.2 x86_64 with GFS, was working very well until I rebooted hotsite-bsb-la-2:

Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] CLM CONFIGURATION CHANGE 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] New Configuration: 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.30)  
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.33)  
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Left: 
Jul 23 18:56:38 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.31)  
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Joined: 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] CLM CONFIGURATION CHANGE 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] New Configuration: 
Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: hotsite-bsb-la-2.com not a cluster member after 0 sec post_fail_delay
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.30)  
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.33)  
Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com"
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Left: 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Joined: 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [SYNC ] This node is within the primary component and will provide service. 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [TOTEM] entering OPERATIONAL state. 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] got nodejoin message 10.65.13.30 
Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com" failed
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] got nodejoin message 10.65.13.33 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG  ] got joinlist message from node 1 
Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG  ] got joinlist message from node 3 
Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com.br"
Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com.br" failed
Jul 23 19:00:57 hotsite-bsb-la-1 last message repeated 50 times

Why is fencing failing? Here is the cluster.conf:

<?xml version="1.0"?>
<cluster alias="hotsites" config_version="18" name="hotsites">
	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
			<fence/>
		</clusternode>
		<clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
			<fence>
				<method name="single">
					<device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
			<fence>
				<method name="single">
					<device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
	</fencedevices>
	<rm>
		<failoverdomains/>
		<resources>
			<clusterfs device="/dev/gnbd/hotsite" force_unmount="1" fsid="5666" fstype="gfs" mountpoint="/data" name="data" self_fence="1"/>
		</resources>
	</rm>
	<totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
</cluster>
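
One thing that can be checked mechanically in a cluster.conf like the one above is whether every <device name="..."> inside a node's <fence> block refers to a name declared under <fencedevices>. In this file the nodes reference "gnbd", while the only declared fence device is named "hotsite". The following is a minimal sketch in Python of such a cross-check, using only the standard library. The function name unresolved_fence_devices and the trimmed SAMPLE string are hypothetical and not part of any cluster tooling. It is only a consistency check and does not by itself establish why fencing failed here.

```python
import xml.etree.ElementTree as ET


def unresolved_fence_devices(conf_xml):
    """Return (node name, device name) pairs whose device name is not
    declared as a <fencedevice> anywhere in the configuration."""
    root = ET.fromstring(conf_xml)
    # Names declared under <fencedevices>
    defined = {fd.get("name") for fd in root.iter("fencedevice")}
    missing = []
    for node in root.iter("clusternode"):
        # Every <device> under this node's <fence>/<method> elements
        for dev in node.iter("device"):
            if dev.get("name") not in defined:
                missing.append((node.get("name"), dev.get("name")))
    return missing


# Trimmed copy of the configuration above (hypothetical test input)
SAMPLE = """<cluster alias="hotsites" config_version="18" name="hotsites">
  <clusternodes>
    <clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
      <fence>
        <method name="single">
          <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
      <fence>
        <method name="single">
          <device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
  </fencedevices>
</cluster>"""

if __name__ == "__main__":
    for node, dev in unresolved_fence_devices(SAMPLE):
        print(f"{node}: device {dev!r} has no matching <fencedevice>")
```

To run it against the real file, the contents of /etc/cluster/cluster.conf could be read and passed in instead of SAMPLE.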



# cman_tool status
Version: 6.1.0
Config Version: 18
Cluster Name: hotsites
Cluster Id: 27589
Cluster Member: Yes
Cluster Generation: 184
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Quorum: 2  
Active subsystems: 8
Flags: Dirty 
Ports Bound: 0 177  
Node name: hotsite-bsb-la-1.com
Node ID: 1
Multicast addresses: 239.192.107.49 
Node addresses: 10.65.13.30 


Thanks

-- 
Tiago Cruz
http://everlinux.com
Linux User #282636




