[Linux-cluster] cluster fenced error

Digimer lists at alteeve.ca
Tue Sep 18 03:25:57 UTC 2012


You have two problems:

1. The nodes can't talk to each other (via multicast) *or* you are 
taking too long to start each node. Given that you are using luci, I am 
guessing the former. Log into your switch and see if the multicast group 
shown in 'cman_tool status' exists.

2. Your fencing isn't working. Read the man page for fence_cisco_ucs to 
try and debug it.
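
For example, the "agent none ... result: error no method" lines in your logs suggest that each <method> block in your cluster.conf is empty. Each one names a method but contains no <device> element telling fenced which <fencedevice> to call and which server to act on. A minimal sketch of the <clusternodes> section with devices added might look like the following. The port values are hypothetical placeholders (presumably your UCS service profile names), and the mapping of fence device to node is assumed from your naming. Check fence_cisco_ucs(8) for the options your UCS Manager actually needs.

```xml
<!-- Sketch only: each <method> now holds a <device> whose name matches a
     <fencedevice> defined in <fencedevices>. The port values below are
     HYPOTHETICAL placeholders for each node's UCS service profile name. -->
<clusternodes>
        <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
                <fence>
                        <method name="ucs-node1">
                                <device name="ucs-node1" port="SERVICE-PROFILE-NODE1"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
                <fence>
                        <method name="ucs-node2">
                                <device name="ucs-node2" port="SERVICE-PROFILE-NODE2"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
```

If you change the file, remember to bump config_version (7 to 8 here) and get the same copy onto both nodes. Make sure fencing actually works before you trust the cluster with the eccsapmnt service.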

digimer

PS - Please don't reply directly to me. Keep the conversation public.
PPS - Filter out your passwords. ;)

On 09/17/2012 11:17 PM, Ben .T.George wrote:
> Hi, thanks for your reply.
>
> Below is my cluster.conf file:
>
> <?xml version="1.0"?>
> <cluster config_version="7" name="eccprd">
>          <clusternodes>
>                  <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
>                          <fence>
>                                  <method name="ucs-node1"/>
>                          </fence>
>                  </clusternode>
>                  <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
>                          <fence>
>                                  <method name="ucs-node2"/>
>                          </fence>
>                  </clusternode>
>          </clusternodes>
>          <cman expected_votes="1" two_node="1"/>
>          <rm>
>                  <resources>
>                          <ip address="172.22.10.230" sleeptime="10"/>
>                  </resources>
>                  <service exclusive="1" name="eccsapmnt" recovery="relocate">
>                          <ip ref="172.22.10.230"/>
>                  </service>
>          </rm>
>          <fencedevices>
>                  <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
>                  <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
>          </fencedevices>
> </cluster>
>
> When I try to start the cluster on node1, I get these lines in /var/log/messages:
>
>   tail -f -n 0 /var/log/messages
> Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
> Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net
>
>
> But the service is not starting. Luci shows both nodes as online,
> but clustat shows something different.
>
> The main error I'm getting in messages is:
>
> Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>
> These messages are from node1. I am getting the same message on node2:
>
> cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying
>
> I don't know what the problem is here.
>
> Please help me solve it.
> Regards,
> Ben
>
> On Tue, Sep 18, 2012 at 4:42 AM, Digimer <lists at alteeve.ca> wrote:
>
>     On 09/17/2012 06:07 PM, Ben .T.George wrote:
>
>         Hi
>
>         My cluster is failing to start.
>
>         If I check clustat on node1, it shows node1 online and node2
>         offline. If I check clustat on node2, it shows node2 online and
>         node1 offline.
>
>         I checked the logs; fenced is throwing errors. How can I fix this?
>
>         Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>
>         Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>
>         Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>
>         Sep 18 00:55:03 fenced fenced 3.0.12.1 started
>         Sep 18 00:55:03 fenced failed to get dbus connection
>         Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
>         Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>         Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
>         Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
>         Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>         Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
>         Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
>         Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>         Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed
>
>
>
>         Please help me solve this issue.
>
>         Regards,
>         Ben
>
>
>     What is your cluster.conf?
>
>     Most likely, you either have no fencing configured or your fencing is
>     not working. Either way, failing to fence is a critical problem, and
>     the cluster will hang, just as you're seeing here. This is by design:
>     better to hang a cluster than to corrupt it.
>
>     digimer
>
>     --
>     Digimer
>     Papers and Projects: https://alteeve.ca
>
>
>


-- 
Digimer
Papers and Projects: https://alteeve.ca



