[Linux-cluster] fence_gnbd failed

Benjamin Marzinski bmarzins at redhat.com
Thu Jul 31 17:35:42 UTC 2008


On Wed, Jul 23, 2008 at 06:56:40PM -0300, Tiago Cruz wrote:
> Hello,
> 
> I have one machine (hotsite-bsb-la-1) exporting GNBD to two machines (hotsite-bsb-la-2 and "-3")
> 
> The cluster with RHEL 5.2 x86_64 and GFS was working very well, until I rebooted hotsite-bsb-la-2:
> 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] CLM CONFIGURATION CHANGE 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] New Configuration: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.30)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.33)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Left: 
> Jul 23 18:56:38 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.31)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Joined: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] CLM CONFIGURATION CHANGE 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] New Configuration: 
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: hotsite-bsb-la-2.com not a cluster member after 0 sec post_fail_delay
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.30)  
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] 	r(0) ip(10.65.13.33)  
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com"
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Left: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] Members Joined: 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [SYNC ] This node is within the primary component and will provide service. 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [TOTEM] entering OPERATIONAL state. 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] got nodejoin message 10.65.13.30 
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com" failed
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM  ] got nodejoin message 10.65.13.33 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG  ] got joinlist message from node 1 
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG  ] got joinlist message from node 3 
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com.br"
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com.br" failed
> Jul 23 19:00:57 hotsite-bsb-la-1 last message repeated 50 times
> 
> Why is the fence failing? Here is the cluster.conf:
> 
> <?xml version="1.0"?>
> <cluster alias="hotsites" config_version="18" name="hotsites">
> 	<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> 	<clusternodes>
> 		<clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
> 		<fence/>
> 		</clusternode>
> 		<clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
> 		<fence>
> 	           <method name="single">
> 	                <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
>         	   </method>
> 		</fence>
> 		</clusternode>
> 		<clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
> 		<fence>
> 	           <method name="single">
> 	                <device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
>         	   </method>
> 		</fence>
> 		</clusternode>
> 	</clusternodes>
> 	<cman/>
> 	<fencedevices>
> 		<fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
> 	</fencedevices>
> 	<rm>
> 		<failoverdomains/>
> 		<resources>
> 			<clusterfs device="/dev/gnbd/hotsite" force_unmount="1" fsid="5666" fstype="gfs" mountpoint="/data" name="data" self_fence="1"/>
> 		</resources>
> 	</rm>
> 	<totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
> </cluster>
>

There are two problems with your cluster.conf file that may be causing
this.

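1. In the clusternode <device> line for fencing devices, "name" must be
the same as "name" in the appropriate <fencedevice> line. Your <device>
lines use "gnbd", but your <fencedevice> is named "hotsite".

2. In the <fencedevice> line, the "servers" must be listed using the "name"
from the server's <clusternode> line. Yours lists "hotsite-1.com", but that
node is named "hotsite-bsb-la-1.com".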

So, for your configuration, the <fencedevice> line should be

<fencedevice agent="fence_gnbd" name="gnbd" servers="hotsite-bsb-la-1.com"/>
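
If you want to check both rules mechanically before restarting anything,
here is a minimal Python sketch. It is not part of the cluster tools: the
check_fencing function and the BROKEN/FIXED sample strings are made up for
illustration, and it assumes "servers" is a whitespace-separated list.

```python
import xml.etree.ElementTree as ET


def check_fencing(conf_xml):
    """Return a list of fencing problems found in a cluster.conf string."""
    root = ET.fromstring(conf_xml)
    node_names = {node.get("name") for node in root.iter("clusternode")}
    fence_devices = {fd.get("name"): fd for fd in root.iter("fencedevice")}
    problems = []

    # Rule 1: each <device name=...> must match some <fencedevice name=...>.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in fence_devices:
                problems.append(
                    "%s: device '%s' has no matching <fencedevice>"
                    % (node.get("name"), dev.get("name")))

    # Rule 2: fence_gnbd servers must use the <clusternode> names.
    # Assumption: "servers" is a whitespace-separated list of names.
    for name, fd in fence_devices.items():
        if fd.get("agent") != "fence_gnbd":
            continue
        for server in (fd.get("servers") or "").split():
            if server not in node_names:
                problems.append(
                    "fencedevice '%s': server '%s' is not a <clusternode> name"
                    % (name, server))
    return problems


# Hypothetical, trimmed-down samples modelled on the config in this thread.
BROKEN = """<cluster name="hotsites">
  <clusternodes>
    <clusternode name="hotsite-bsb-la-1.com" nodeid="1"><fence/></clusternode>
    <clusternode name="hotsite-bsb-la-2.com" nodeid="2">
      <fence><method name="single">
        <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
  </fencedevices>
</cluster>"""

# Same config with only the <fencedevice> line corrected.
FIXED = BROKEN.replace(
    'name="hotsite" servers="hotsite-1.com"',
    'name="gnbd" servers="hotsite-bsb-la-1.com"')

if __name__ == "__main__":
    for label, conf in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, check_fencing(conf))
```

To run it against a real file, read the file contents and pass them in,
e.g. check_fencing(open("/etc/cluster/cluster.conf").read()).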

See if this helps.

-Ben
 
> 
> 
> # cman_tool status
> Version: 6.1.0
> Config Version: 18
> Cluster Name: hotsites
> Cluster Id: 27589
> Cluster Member: Yes
> Cluster Generation: 184
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Total votes: 2
> Quorum: 2  
> Active subsystems: 8
> Flags: Dirty 
> Ports Bound: 0 177  
> Node name: hotsite-bsb-la-1.com
> Node ID: 1
> Multicast addresses: 239.192.107.49 
> Node addresses: 10.65.13.30 
> 
> 
> Thanks
> 
> -- 
> Tiago Cruz
> http://everlinux.com
> Linux User #282636



