[Linux-cluster] fencing failing

Brad Filipek bfilipek at crscold.com
Mon Jul 23 18:33:28 UTC 2007


I have an APC MasterSwitch as my fencing device. I configured my cluster
to use "APC" as the fencing device, and have confirmed that it has the
correct username, password, and IP address configured. However, when the
cluster tries to reboot a failed node, I get this in /var/log/messages:

Jul 20 15:51:28 server1 fenced[32169]: agent "fence_apc" reports: failed: unrecognised menu response
Jul 20 15:51:28 server1 fenced[32169]: fence "server2.my.domain.com" failed

However, when I run this command from a terminal, it runs fine and the
failed node reboots:

fence_apc -a 192.168.1.61 -l ***** -p ***** -n 6 -v
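
For comparison with the command above: as far as I understand it, fenced does not call the agent with command-line flags but writes name=value pairs to the agent's standard input. Below is a minimal sketch of reproducing that by hand. The key names (ipaddr, login, passwd, port, switch) are simply the attribute names from the cluster.conf further down, build_fence_input is a hypothetical helper, and whether this fence_apc version accepts exactly these keys is an assumption to check against its man page. Note that the values follow cluster.conf (device APCMS62 at 192.168.1.62), not the 192.168.1.61 address used in the manual command.

```shell
#!/bin/sh
# Hypothetical helper: print name=value lines in the form an agent
# reads from stdin, built from the cluster.conf attributes.
# Arguments: $1=ipaddr $2=login $3=passwd $4=port $5=switch
build_fence_input() {
    printf 'ipaddr=%s\nlogin=%s\npasswd=%s\nport=%s\nswitch=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Values for server2.my.domain.com as configured (device APCMS62).
# "name" and "pass" are the placeholder credentials from cluster.conf.
build_fence_input 192.168.1.62 name pass 6 0

# To feed this to the agent (assumption: with no arguments it reads
# stdin; this would power-cycle the outlet, so it is left commented out):
# build_fence_input 192.168.1.62 name pass 6 0 | fence_apc
```

Running the script as is only prints the five name=value lines, so it is safe to use for checking what would be sent before piping it to the agent.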


This is on RHEL 5.

APC Firmware:
=======================================================================
Network Management Card AOS      v2.6.4
MSP APP                          v2.6.2
=======================================================================

cluster.conf file:
=======================================================================
<?xml version="1.0"?>
<cluster alias="cluster1" config_version="20" name="cluster1">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="server1.my.domain.com" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="APCMS62" port="7" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="server2.my.domain.com" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="APCMS62" port="6" switch="0"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.61" login="name" name="APCMS61" passwd="pass"/>
                <fencedevice agent="fence_apc" ipaddr="192.168.1.62" login="name" name="APCMS62" passwd="pass"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="main" ordered="0" restricted="0">
                                <failoverdomainnode name="server1.my.domain.com" priority="1"/>
                                <failoverdomainnode name="server2.my.domain.com" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <smb name="smb1" workgroup="WKGRP"/>
                        <ip address="192.168.1.20" monitor_link="1"/>
                </resources>
                <service autostart="1" domain="main" name="samba">
                        <smb ref="smb"/>
                        <ip ref="192.168.1.20"/>
                </service>
        </rm>
</cluster>
=======================================================================

