[Linux-cluster] cluster fenced error
Ben .T.George
bentech4you at gmail.com
Tue Sep 18 03:37:23 UTC 2012
Hi, thanks for your reply.
This is a Cisco UCS machine. Yesterday the Cisco guys created a separate
vswitch for the heartbeat.
regards,
Ben
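
For reference, the "agent none ... no method" lines in the quoted fenced log are consistent with the <method> elements in the quoted cluster.conf being empty: neither one contains a <device> child that points at a <fencedevice> entry. A minimal sketch of what node1's fence block might look like is below. The port value is a placeholder assumption (the name under which UCS Manager knows the node, typically its service profile name), not something taken from this thread, and the fence_cisco_ucs man page should be checked for the options your version actually needs.

```xml
<clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
  <fence>
    <method name="ucs-node1">
      <!-- name must match a <fencedevice name="..."/> entry in <fencedevices>. -->
      <!-- port is a hypothetical placeholder for this node's UCS service profile name. -->
      <device name="ucs-node1" port="SERVICE_PROFILE_NODE1"/>
    </method>
  </fence>
</clusternode>
```

The same pattern would apply to node2 with its own device name and port. Remember to increment config_version whenever cluster.conf changes.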
On Tue, Sep 18, 2012 at 6:25 AM, Digimer <lists at alteeve.ca> wrote:
> You have two problems;
>
> 1. The nodes can't talk to each other (via multicast) *or* you are taking
> too long to start each node. Given that you are using luci, I am guessing
> the former. Log into your switch and see if the multicast group shown in
> 'cman_tool status' exists.
>
> 2. Your fencing isn't working. Read the man page for fence_cisco_ucs to
> try and debug it.
>
> digimer
>
> PS - Please don't reply directly to me. Keep the conversation public.
> PPS - Filter out your passwords. ;)
>
>
> On 09/17/2012 11:17 PM, Ben .T.George wrote:
>
>> Hi, thanks for your reply.
>>
>> Below is my cluster.conf file:
>>
>> <?xml version="1.0"?>
>> <cluster config_version="7" name="eccprd">
>> <clusternodes>
>> <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
>>
>> <fence>
>> <method name="ucs-node1"/>
>> </fence>
>> </clusternode>
>> <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
>>
>> <fence>
>> <method name="ucs-node2"/>
>> </fence>
>> </clusternode>
>> </clusternodes>
>> <cman expected_votes="1" two_node="1"/>
>> <rm>
>> <resources>
>> <ip address="172.22.10.230" sleeptime="10"/>
>> </resources>
>> <service exclusive="1" name="eccsapmnt"
>> recovery="relocate">
>> <ip ref="172.22.10.230"/>
>> </service>
>> </rm>
>> <fencedevices>
>> <fencedevice agent="fence_cisco_ucs"
>> ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
>> <fencedevice agent="fence_cisco_ucs"
>> ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
>>
>> </fencedevices>
>> </cluster>
>>
>> When I try to start the cluster on node1, I get these messages in
>> /var/log/messages:
>>
>> tail -f -n 0 /var/log/messages
>> Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
>> Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net
>>
>>
>>
>> But the service is not starting. Luci shows both nodes as online, but
>> clustat shows a different status.
>>
>> The main error I am getting in messages is:
>>
>> Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>>
>> These messages are from node1. I am getting the same message on node2,
>> saying:
>>
>> cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying
>>
>>
>> I don't know what the problem is here.
>>
>> Please help me solve this.
>> Regards,
>> Ben
>>
>> On Tue, Sep 18, 2012 at 4:42 AM, Digimer <lists at alteeve.ca> wrote:
>>
>> On 09/17/2012 06:07 PM, Ben .T.George wrote:
>>
>> Hi
>>
>> My cluster is failing to start.
>>
>> If I check clustat on node1, it shows node1 online and node2
>> offline. If I check clustat on node2, it shows node2 online and
>> node1 offline.
>>
>> I checked the logs. fenced is throwing errors. How can I rectify this?
>>
>> Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>> Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>> Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>> Sep 18 00:55:03 fenced fenced 3.0.12.1 started
>> Sep 18 00:55:03 fenced failed to get dbus connection
>> Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
>> Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>> Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
>> Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
>> Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>> Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
>> Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
>> Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>> Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed
>>
>>
>>
>> please help me solve this issue
>>
>> Regards,
>> Ben
>>
>>
>> What is your cluster.conf?
>>
>> Likely you either have no fencing configured, or your fencing is not
>> working. Either way, failing to fence is a critical problem and the
>> cluster will hang, just as you're seeing here. This is by design.
>> Better to hang a cluster than to corrupt it.
>>
>> digimer
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca
>>
>>
>>
>>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca
>
--
Yours Sincerely
*#!/usr/bin/env python
#Mysignature.py :)*
Signature = " " " Ben.T.George \n
Linux System Administrator \n
Diyar United Company \n
kuwait \n
Phone : +965 - 50629829 \n " " "
Print Signature