[Linux-cluster] fencing failing

Aravind Parchuri aravind.parchuri at gmail.com
Mon Jul 23 19:31:44 UTC 2007


bfilipek at crscold.com wrote:
> I have an APC MasterSwitch as my fencing device. I configured my cluster
> to use "APC" as the fencing device, and have confirmed that it has the
> correct username, password, and IP address configured. However, when it tries to
> reboot a failed node, I get this in /var/log/messages:
> 
> Jul 20 15:51:28 server1 fenced[32169]: agent "fence_apc" reports: failed: unrecognised menu response
> Jul 20 15:51:28 server1 fenced[32169]: fence "server2.my.domain.com" failed
> 

We faced the same problem in FC6, with an APC 7900 switch.
> 
> However, when I run this command from a terminal, it runs fine and the
> failed node reboots:
> 
> fence_apc -a 192.168.1.61 -l ***** -p ***** -n 6 -v

In our case, even running it from the command line didn't work. The RPMs
in the repository ship the old Perl script, and the RHEL5 RPMs probably
do too. The Python script in CVS seems to work fine, though:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/agents/apc/fence_apc.py?rev=1.5&content-type=text/x-cvsweb-markup&cvsroot=cluster

Try replacing /sbin/fence_apc with the Python script and see if it helps.
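A cautious way to do the swap, keeping the old Perl agent around so it can be put back, might look like this. It assumes the CVS revision has been saved locally as fence_apc.py (the raw file, not the cvsweb HTML page); swap_agent is a hypothetical helper, not something shipped with the cluster packages.

```shell
#!/bin/sh
# Hypothetical helper: replace a fence agent with a new script,
# keeping a backup of the original as TARGET.orig.
# Usage: swap_agent NEW_SCRIPT TARGET
swap_agent() {
    new=$1
    target=$2
    if [ ! -f "$new" ]; then
        echo "swap_agent: $new not found" >&2
        return 1
    fi
    # Back up the existing agent once, so repeated runs don't clobber it.
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp "$new" "$target" || return 1
    # fenced executes the agent directly, so it must be executable.
    chmod 755 "$target"
}
```

For example, run swap_agent ./fence_apc.py /sbin/fence_apc as root, then repeat the manual fence_apc test. To roll back: cp -p /sbin/fence_apc.orig /sbin/fence_apc. A later update of the fence RPM will likely overwrite the replaced file.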

> 
> 
> This is on RHEL5.
> 
> APC Firmware:
> =======================================================================
> Network Management Card AOS      v2.6.4
> MSP APP                          v2.6.2
> =======================================================================
> 
> cluster.conf file:
> =======================================================================
> <?xml version="1.0"?>
> <cluster alias="cluster1" config_version="20" name="cluster1">
>         <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="server1.my.domain.com" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="APCMS62" port="7" switch="0"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="server2.my.domain.com" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="APCMS62" port="6" switch="0"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_apc" ipaddr="192.168.1.61" login="name" name="APCMS61" passwd="pass"/>
>                 <fencedevice agent="fence_apc" ipaddr="192.168.1.62" login="name" name="APCMS62" passwd="pass"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="main" ordered="0" restricted="0">
>                                 <failoverdomainnode name="server1.my.domain.com" priority="1"/>
>                                 <failoverdomainnode name="server2.my.domain.com" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <smb name="smb1" workgroup="WKGRP"/>
>                         <ip address="192.168.1.20" monitor_link="1"/>
>                 </resources>
>                 <service autostart="1" domain="main" name="samba">
>                         <smb ref="smb"/>
>                         <ip ref="192.168.1.20"/>
>                 </service>
>         </rm>
> </cluster>
> =======================================================================
> 
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
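As for why the manual run works while fenced fails: as far as I know, fenced does not call the agent with -a/-l/-p/-n flags. Instead it writes name=value lines to the agent's standard input, built from the fencedevice attributes plus the node's device attributes. The sketch here approximates that by hand. fenced_args is a made-up helper, and the exact set of keys fenced sends is an assumption (it may pass extra ones, such as the agent name). Also note that cluster.conf sends server2 to APCMS62 (192.168.1.62), port 6, whereas the manual test used 192.168.1.61.

```shell
#!/bin/sh
# Hypothetical helper: print the name=value lines that fenced is assumed
# to write to a fence agent's stdin. ipaddr/login/passwd come from the
# <fencedevice> entry, port/switch from the node's <device> line.
# Usage: fenced_args IPADDR LOGIN PASSWD PORT SWITCH
fenced_args() {
    if [ "$#" -ne 5 ]; then
        echo "usage: fenced_args IPADDR LOGIN PASSWD PORT SWITCH" >&2
        return 1
    fi
    printf 'ipaddr=%s\n' "$1"
    printf 'login=%s\n' "$2"
    printf 'passwd=%s\n' "$3"
    printf 'port=%s\n' "$4"
    printf 'switch=%s\n' "$5"
}
```

For server2 as configured, that would be: fenced_args 192.168.1.62 name pass 6 0 | /sbin/fence_apc. Running the flag-based command against 192.168.1.62 as well should show whether the failure follows the switch or the way the agent is invoked.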
