[Linux-cluster] fence_gnbd failed
Benjamin Marzinski
bmarzins at redhat.com
Thu Jul 31 17:35:42 UTC 2008
On Wed, Jul 23, 2008 at 06:56:40PM -0300, Tiago Cruz wrote:
> Hello,
>
> I have one machine (hotsite-bsb-la-1) exporting GNBD to two machines (hotsite-bsb-la-2 and "-3")
>
> The cluster with RHEL 5.2 x86_64 and GFS was working very well, until I rebooted hotsite-bsb-la-2:
>
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] New Configuration:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.30)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.33)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Left:
> Jul 23 18:56:38 hotsite-bsb-la-1 kernel: dlm: closing connection to node 2
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.31)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Joined:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] New Configuration:
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: hotsite-bsb-la-2.com not a cluster member after 0 sec post_fail_delay
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.30)
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] r(0) ip(10.65.13.33)
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com"
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Left:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] Members Joined:
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [SYNC ] This node is within the primary component and will provide service.
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [TOTEM] entering OPERATIONAL state.
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] got nodejoin message 10.65.13.30
> Jul 23 18:56:38 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com" failed
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CLM ] got nodejoin message 10.65.13.33
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG ] got joinlist message from node 1
> Jul 23 18:56:38 hotsite-bsb-la-1 openais[3082]: [CPG ] got joinlist message from node 3
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fencing node "hotsite-bsb-la-2.com.br"
> Jul 23 18:56:43 hotsite-bsb-la-1 fenced[3099]: fence "hotsite-bsb-la-2.com.br" failed
> Jul 23 19:00:57 hotsite-bsb-la-1 last message repeated 50 times
>
> Why is fencing failing? Here is the cluster.conf:
>
> <?xml version="1.0"?>
> <cluster alias="hotsites" config_version="18" name="hotsites">
> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> <clusternodes>
> <clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
> <fence/>
> </clusternode>
> <clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
> <fence>
> <method name="single">
> <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
> <fence>
> <method name="single">
> <device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman/>
> <fencedevices>
> <fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
> </fencedevices>
> <rm>
> <failoverdomains/>
> <resources>
> <clusterfs device="/dev/gnbd/hotsite" force_unmount="1" fsid="5666" fstype="gfs" mountpoint="/data" name="data" self_fence="1"/>
> </resources>
> </rm>
> <totem consensus="4800" join="60" token="10000" token_retransmits_before_loss_const="20"/>
> </cluster>
>
There are two problems with your cluster.conf file that may be causing
this.
1. In the clusternode <device> line for fencing devices, "name" must be
the same as "name" in the appropriate <fencedevice> line. Your <device>
lines use name="gnbd", but your <fencedevice> line uses name="hotsite".
2. In the <fencedevice> line, the "servers" must be listed using the "name"
from the corresponding <clusternode> line. Yours lists "hotsite-1.com", but
the node is named "hotsite-bsb-la-1.com".
So, for your configuration, the <fencedevice> line should be
<fencedevice agent="fence_gnbd" name="gnbd" servers="hotsite-bsb-la-1.com"/>
See if this helps.
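
The two rules above can be checked mechanically. What follows is only a
rough sketch, not part of any cluster tool: the function name
check_fence_config is made up for illustration, the embedded XML is a
trimmed copy of the cluster.conf quoted above, and it assumes that
"servers" holds a whitespace-separated list of names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from the original post
# (only the fence-related elements are kept).
CLUSTER_CONF = """
<cluster alias="hotsites" config_version="18" name="hotsites">
  <clusternodes>
    <clusternode name="hotsite-bsb-la-1.com" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="hotsite-bsb-la-2.com" nodeid="2" votes="1">
      <fence>
        <method name="single">
          <device name="gnbd" nodename="hotsite-bsb-la-2.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="hotsite-bsb-la-3.com" nodeid="3" votes="1">
      <fence>
        <method name="single">
          <device name="gnbd" nodename="hotsite-bsb-la-3.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_gnbd" name="hotsite" servers="hotsite-1.com"/>
  </fencedevices>
</cluster>
"""


def check_fence_config(xml_text):
    """Return a list of human-readable problems (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    node_names = {n.get("name") for n in root.iter("clusternode")}
    device_names = {d.get("name") for d in root.iter("fencedevice")}
    problems = []

    # Rule 1: each <device name> must match some <fencedevice name>.
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            if dev.get("name") not in device_names:
                problems.append(
                    "node %s: device %r has no matching <fencedevice>"
                    % (node.get("name"), dev.get("name"))
                )

    # Rule 2: each fence_gnbd server must be a <clusternode name>.
    # Assumption: "servers" is a whitespace-separated list.
    for fd in root.iter("fencedevice"):
        if fd.get("agent") != "fence_gnbd":
            continue
        for server in (fd.get("servers") or "").split():
            if server not in node_names:
                problems.append(
                    "fencedevice %r: server %r is not a <clusternode> name"
                    % (fd.get("name"), server)
                )
    return problems


if __name__ == "__main__":
    for problem in check_fence_config(CLUSTER_CONF):
        print(problem)
```

Run against the original file, this reports the mismatched device name
for nodes 2 and 3 and the unknown server name. With the corrected
<fencedevice> line it returns an empty list.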
-Ben
>
>
> # cman_tool status
> Version: 6.1.0
> Config Version: 18
> Cluster Name: hotsites
> Cluster Id: 27589
> Cluster Member: Yes
> Cluster Generation: 184
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Total votes: 2
> Quorum: 2
> Active subsystems: 8
> Flags: Dirty
> Ports Bound: 0 177
> Node name: hotsite-bsb-la-1.com
> Node ID: 1
> Multicast addresses: 239.192.107.49
> Node addresses: 10.65.13.30
>
>
> Thanks
>
> --
> Tiago Cruz
> http://everlinux.com
> Linux User #282636
>
>