[Linux-cluster] cluster fenced error

Ben .T.George bentech4you at gmail.com
Tue Sep 18 03:37:23 UTC 2012


Hi, thanks for your reply.

This is a Cisco UCS machine. Yesterday the Cisco engineers created a separate
vSwitch for this heartbeat.

Regards,
Ben

On Tue, Sep 18, 2012 at 6:25 AM, Digimer <lists at alteeve.ca> wrote:

> You have two problems:
>
> 1. The nodes can't talk to each other (via multicast) *or* you are taking
> too long to start each node. Given that you are using luci, I am guessing
> the former. Log into your switch and see if the multicast group shown in
> 'cman_tool status' exists.
>
> 2. Your fencing isn't working. Read the man page for fence_cisco_ucs to
> try and debug it.
>
> digimer
>
> PS - Please don't reply directly to me. Keep the conversation public.
> PPS - Filter out your passwords. ;)
>
>
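Digimer's first point can also be approached from the configuration side. The fragment below is a sketch, not a tested fix. It pins the multicast group cman uses, so the same address can be looked up on the switch. It also gives fenced more time after a node joins before it fences a peer that has not joined yet, which helps if the two nodes are simply started too far apart. The address 239.192.10.1 and the 30-second delay are hypothetical values; check cluster.conf(5) and fenced(8) on your release before relying on either element. Both elements would sit directly inside the existing <cluster> element, with this <cman> element replacing the current one.

```xml
<!-- Hypothetical values; both elements go directly inside <cluster> -->
<!-- Pin the multicast group so the same address can be checked on the switch -->
<cman expected_votes="1" two_node="1">
  <multicast addr="239.192.10.1"/>
</cman>
<!-- Seconds fenced waits after a node joins before fencing nodes that have not joined -->
<fence_daemon post_join_delay="30"/>
```

With two_node="1" each node is quorate on its own, so when the nodes cannot hear each other at startup, each one tries to fence the other. That is consistent with both nodes logging "fencing node ... still retrying" about their peer. A longer post_join_delay only buys time; it does not fix a switch that drops the multicast traffic.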
> On 09/17/2012 11:17 PM, Ben .T.George wrote:
>
>> Hi, thanks for your reply.
>>
>> Below is my cluster.conf file:
>>
>> <?xml version="1.0"?>
>> <cluster config_version="7" name="eccprd">
>>          <clusternodes>
>>                  <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
>>                          <fence>
>>                                  <method name="ucs-node1"/>
>>                          </fence>
>>                  </clusternode>
>>                  <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
>>                          <fence>
>>                                  <method name="ucs-node2"/>
>>                          </fence>
>>                  </clusternode>
>>          </clusternodes>
>>          <cman expected_votes="1" two_node="1"/>
>>          <rm>
>>                  <resources>
>>                          <ip address="172.22.10.230" sleeptime="10"/>
>>                  </resources>
>>                  <service exclusive="1" name="eccsapmnt" recovery="relocate">
>>                          <ip ref="172.22.10.230"/>
>>                  </service>
>>          </rm>
>>          <fencedevices>
>>                  <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
>>                  <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
>>          </fencedevices>
>> </cluster>
>>
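The fence log further down ("dev 0.0 agent none result: error no method") shows fenced finding nothing it could run, and in the configuration above each <method> element is still empty, so there is no <device> for fenced to call. The method names also reuse the fence device names, which suggests the device reference was meant to go inside the method. Below is a sketch of the same file with the device references filled in. The port values (the UCS service profile names) are hypothetical placeholders, and ssl="on" is an assumption, since UCS Manager is normally reached over HTTPS; confirm both against the fence_cisco_ucs man page.

```xml
<?xml version="1.0"?>
<!-- Sketch only: config_version must be increased on every edit -->
<cluster config_version="8" name="eccprd">
  <clusternodes>
    <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
      <fence>
        <!-- A method needs at least one <device>; an empty method gives fenced nothing to run -->
        <method name="1">
          <!-- name must match a <fencedevice name="...">; port is the UCS service profile (hypothetical) -->
          <device name="ucs-node1" port="SP-NODE1-PLACEHOLDER"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
      <fence>
        <method name="1">
          <device name="ucs-node2" port="SP-NODE2-PLACEHOLDER"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <!-- Resource manager section unchanged from the original -->
  <rm>
    <resources>
      <ip address="172.22.10.230" sleeptime="10"/>
    </resources>
    <service exclusive="1" name="eccsapmnt" recovery="relocate">
      <ip ref="172.22.10.230"/>
    </service>
  </rm>
  <fencedevices>
    <!-- ssl="on" is an assumption: UCS Manager is usually reached over HTTPS -->
    <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..." ssl="on"/>
    <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..." ssl="on"/>
  </fencedevices>
</cluster>
```

After editing, the file can be checked with ccs_config_validate and pushed to the other node with cman_tool version -r. Running fence_node against the peer from a healthy node then tests the whole fencing path; note that it really reboots the target node.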
>> When I try to start the cluster on node1, I get these messages in
>> /var/log/messages:
>>
>>   tail -f -n 0 /var/log/messages
>> Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
>> Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net
>>
>> But the service is not starting. Luci shows both nodes as online, but
>> clustat shows something different.
>>
>> The main error in /var/log/messages is:
>>
>> Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>>
>> These messages are from node1. On node2 I get the same message:
>>
>> cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying
>>
>>
>> I don't know what the problem is here.
>>
>> Please help me solve it.
>> Regards,
>> Ben
>>
>> On Tue, Sep 18, 2012 at 4:42 AM, Digimer <lists at alteeve.ca> wrote:
>>
>>     On 09/17/2012 06:07 PM, Ben .T.George wrote:
>>
>>         Hi
>>
>>         My cluster is failing to start.
>>
>>         If I check clustat on node1, it shows node1 online and node2
>>         offline. If I check clustat on node2, it shows node2 online and
>>         node1 offline.
>>
>>         I checked the logs, and fenced is throwing errors. How can I fix this?
>>
>>         Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>>         Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>>         Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>>         Sep 18 00:55:03 fenced fenced 3.0.12.1 started
>>         Sep 18 00:55:03 fenced failed to get dbus connection
>>         Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
>>         Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>>         Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
>>         Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
>>         Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>>         Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
>>         Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
>>         Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>>         Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed
>>
>>
>>
>>         Please help me solve this issue.
>>
>>         Regards,
>>         Ben
>>
>>
>>     What is your cluster.conf?
>>
>>     Likely you either have no fencing configured, or your fencing is not
>>     working. Either way, failing to fence is a critical problem and the
>>     cluster will hang, just as you're seeing here. This is by design.
>>     Better to hang a cluster than to corrupt it.
>>
>>     digimer
>>
>>     --
>>     Digimer
>>     Papers and Projects: https://alteeve.ca



-- 
Yours Sincerely

#!/usr/bin/env python
# Mysignature.py :)

Signature = """ Ben.T.George
                  Linux System Administrator
                  Diyar United Company
                  Kuwait
                  Phone : +965 - 50629829 """

print(Signature)