[Linux-cluster] cluster fenced error
Digimer
lists at alteeve.ca
Tue Sep 18 03:25:57 UTC 2012
You have two problems:
1. The nodes can't talk to each other (via multicast) *or* you are
taking too long to start each node. Given that you are using luci, I am
guessing the former. Log into your switch and see if the multicast group
shown in 'cman_tool status' exists.
2. Your fencing isn't working. Read the man page for fence_cisco_ucs to
try and debug it.
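
For point 2, one thing that may be worth ruling out before debugging the agent itself is whether each node's fence method actually references a device. The following is only a minimal sketch in Python, assuming the cluster.conf posted below (trimmed, with the link residue removed and the passwords left elided). The function name find_fence_problems is hypothetical and is not part of cman or luci.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted cluster.conf; passwords stay elided as in the post.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="7" name="eccprd">
  <clusternodes>
    <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
      <fence>
        <method name="ucs-node1"/>
      </fence>
    </clusternode>
    <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
      <fence>
        <method name="ucs-node2"/>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
    <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
  </fencedevices>
</cluster>
"""


def find_fence_problems(conf_text):
    """Return a list of readable problems found in the per-node fence config."""
    root = ET.fromstring(conf_text)
    # Names declared under <fencedevices>; a <device> must refer to one of these.
    known = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    problems = []
    for node in root.findall("./clusternodes/clusternode"):
        name = node.get("name")
        methods = node.findall("./fence/method")
        if not methods:
            problems.append(f"{name}: no fence method defined")
        for method in methods:
            devices = method.findall("device")
            if not devices:
                # An empty <method/> gives fenced nothing to call.
                problems.append(
                    f"{name}: method '{method.get('name')}' has no <device>"
                )
            for dev in devices:
                if dev.get("name") not in known:
                    problems.append(
                        f"{name}: device '{dev.get('name')}' is not in <fencedevices>"
                    )
    return problems


if __name__ == "__main__":
    for line in find_fence_problems(CLUSTER_CONF):
        print(line)
```

Run against the posted config, this flags both methods as having no <device> child. A method that fenced can act on normally wraps a <device name="..."/> element whose name matches one of the <fencedevice> entries. Whether fence_cisco_ucs also needs the node's service profile passed as a port attribute is an assumption to confirm in the man page.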
digimer
PS - Please don't reply directly to me. Keep the conversation public.
PPS - Filter out your passwords. ;)
On 09/17/2012 11:17 PM, Ben .T.George wrote:
> Hi, thanks for your reply.
>
> Below is my cluster.conf file:
>
> <?xml version="1.0"?>
> <cluster config_version="7" name="eccprd">
> <clusternodes>
> <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
> <fence>
> <method name="ucs-node1"/>
> </fence>
> </clusternode>
> <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
> <fence>
> <method name="ucs-node2"/>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <rm>
> <resources>
> <ip address="172.22.10.230" sleeptime="10"/>
> </resources>
> <service exclusive="1" name="eccsapmnt"
> recovery="relocate">
> <ip ref="172.22.10.230"/>
> </service>
> </rm>
> <fencedevices>
> <fencedevice agent="fence_cisco_ucs"
> ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
> <fencedevice agent="fence_cisco_ucs"
> ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
> </fencedevices>
> </cluster>
>
> When I try to start the cluster on node1, I get these messages in /var/log/messages:
>
> tail -f -n 0 /var/log/messages
> Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
> Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net
>
>
> But the service is not starting. luci shows both nodes as online, but
> clustat shows something different.
>
> The main error I am getting in the messages log is:
>
> Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
> Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>
> These messages are from node1. I am getting the same message on node2, saying:
>
> cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying
>
> I don't know what the problem is here.
>
> Please help me solve this.
> Regards,
> Ben
>
> On Tue, Sep 18, 2012 at 4:42 AM, Digimer <lists at alteeve.ca> wrote:
>
> On 09/17/2012 06:07 PM, Ben .T.George wrote:
>
> Hi
>
> My cluster is failing to start.
>
> If I check clustat on node1, it shows node1 online and node2
> offline. If I check clustat on node2, it shows node2 online and
> node1 offline.
>
> I checked the logs. fenced is throwing errors. How can I rectify this?
>
> Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
> Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
> Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
> Sep 18 00:55:03 fenced fenced 3.0.12.1 started
> Sep 18 00:55:03 fenced failed to get dbus connection
> Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
> Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
> Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
> Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
> Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
> Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
> Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
> Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
> Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed
>
>
>
> Please help me solve this issue.
>
> Regards,
> Ben
>
>
> What is your cluster.conf?
>
> Likely you either have no fencing configured, or your fencing is not
> working. Either way, failing to fence is a critical problem and the
> cluster will hang, just as you're seeing here. This is by design.
> Better to hang a cluster than to corrupt it.
>
> digimer
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca
>
>
>
--
Digimer
Papers and Projects: https://alteeve.ca