[Linux-cluster] fence_apc_snmp woes

Brian Sheets bsheets at singlefin.net
Fri Aug 24 11:06:15 UTC 2007


I have a two-node cluster on Debian; my cluster.conf is below. If I take down node1's NIC,
node2 notices it and tries to fence:


Aug 24 10:57:38 oc-index4 fenced[7599]: oc-index3 not a cluster member after 0 sec post_fail_delay
Aug 24 10:57:38 oc-index4 fenced[7599]: fencing node "oc-index3"
Aug 24 10:57:38 oc-index4 fence_manual: Node 172.16.14.100 needs to be reset before recovery can procede.  Waiting for 172.16.14.100 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n 172.16.14.100)
Aug 24 10:59:34 oc-index4 fenced[7599]: fence "oc-index3" success

It claims to be fencing but never actually does; only when I run fence_ack_manual does fence_apc_snmp get invoked, and node1 then gets powered down.

What am I missing?

<?xml version="1.0"?>
<cluster name="index" config_version="2">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="oc-index3" votes="1">
        <fence>
                <method name="single">
                       <device name="oc-cab1-pdu2" port="18" option="off"/>
                </method>
        </fence>
</clusternode>

<clusternode name="oc-index4" votes="1">
        <fence>
                <method name="single">
                       <device name="oc-cab1-pdu1" port="16" option="off"/>
                </method>
        </fence>
</clusternode>

</clusternodes>
<fencedevices>
        <fencedevice name="oc-cab1-pdu2" agent="fence_apc_snmp" ipaddr="172.16.14.9" login="apc" passwd="xxxx"/>
        <fencedevice name="oc-cab1-pdu1" agent="fence_apc_snmp" ipaddr="172.16.14.8" login="apc" passwd="xxxx"/>
</fencedevices>
</cluster>
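
As I understand it (I haven't verified this against the agent source), fenced hands the fencedevice and device attributes to the agent as key=value pairs on stdin, so the agent should be testable by hand with the same values as in the config above, e.g.:

# Feed the agent the same attributes cluster.conf would give it;
# attribute names are taken straight from the config above and may
# differ between agent versions.
fence_apc_snmp <<EOF
ipaddr=172.16.14.9
login=apc
passwd=xxxx
port=18
option=off
EOF

If that turns the outlet off, at least the SNMP side of things is working.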
