[Linux-cluster] fence_sanbox2 configuration?

Tue Jun 28 17:33:38 UTC 2005

I've been trying to configure my GFS cluster to use fence_sanbox2, but I
don't quite have it right. My configuration is pasted below. To test it,
I reboot one of the machines in the cluster, but instead of that node getting
fenced and its journal replayed, the rest of the cluster just hangs (errors
below). In addition, when I telnet to the Fibre Channel switch, the port has
not been disabled as I expected. I'm able to use fence_sanbox2 directly to
disable and enable the port. So, does anybody see what I'm doing wrong? How
can I debug this further? Does anything special need to be done in the init
scripts to disable/re-enable the port, or does fenced take care of that?

I'm using the RHEL4 branch, which I updated from CVS a couple of weeks ago.

<CONFIG FILE>

<?xml version="1.0"?>
<cluster name="blade_cluster" config_version="1">

        <fencedevices>
          <fencedevice name="human" agent="fence_manual"/>

          <fencedevice name="san" agent="fence_sanbox2"
           ipaddr="128.50.18.66" login="gfs"
           passwd="foobar"/>

        </fencedevices>

        <fence_daemon clean_start="0">
        </fence_daemon>

        <cman>
          <multicast addr="224.0.0.18"/>
        </cman>

        <clusternodes>
          <clusternode name="blade01" nodeid="1" votes="1">
          <multicast addr="224.0.0.18" interface="eth0"/>
             <fence>
               <method name="fibre">
                 <device name="san" port="1"/>
               </method>
               <method name="single">
                 <device name="human" ipaddr="128.50.18.1"/>
               </method>
             </fence>
          </clusternode>

          <clusternode name="blade02" nodeid="2" votes="0">
          <multicast addr="224.0.0.18" interface="eth0"/>
             <fence>
               <method name="fibre">
                 <device name="san" port="2"/>
               </method>
               <method name="single">
                 <device name="human" ipaddr="128.50.18.2"/>
               </method>
             </fence>
          </clusternode>

          <clusternode name="blade03" nodeid="3" votes="0">
          <multicast addr="224.0.0.18" interface="eth0"/>
             <fence>
               <method name="fibre">
                 <device name="san" port="3"/>
               </method>
               <method name="single">
                 <device name="human" ipaddr="128.50.18.3"/>
               </method>
             </fence>
          </clusternode>
</cluster>
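One cheap first debugging step is to confirm that the config file itself says what I think it says: that every `<device>` in each node's `<fence>` block refers to a declared `<fencedevice>`, which agent and port each method would use, and how many votes each node contributes. Below is a minimal, hypothetical sketch of such a check using only Python's standard `xml.etree.ElementTree`. It is not part of the cluster suite and only inspects references inside the file; it cannot tell whether fenced actually runs the agent. The name `check_fencing` is made up for illustration, and treating a missing `votes=` as 1 is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the elements this check reads.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="blade_cluster" config_version="1">
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
    <fencedevice name="san" agent="fence_sanbox2"
                 ipaddr="128.50.18.66" login="gfs" passwd="foobar"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="blade01" nodeid="1" votes="1">
      <fence>
        <method name="fibre"><device name="san" port="1"/></method>
        <method name="single"><device name="human" ipaddr="128.50.18.1"/></method>
      </fence>
    </clusternode>
    <clusternode name="blade02" nodeid="2" votes="0">
      <fence>
        <method name="fibre"><device name="san" port="2"/></method>
        <method name="single"><device name="human" ipaddr="128.50.18.2"/></method>
      </fence>
    </clusternode>
    <clusternode name="blade03" nodeid="3" votes="0">
      <fence>
        <method name="fibre"><device name="san" port="3"/></method>
        <method name="single"><device name="human" ipaddr="128.50.18.3"/></method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def check_fencing(conf_text):
    """Hypothetical helper: summarize fencing references in a cluster.conf string.

    Returns (plan, problems, votes):
      plan     -- (node, method, agent, port) tuples, in the order listed in the file
      problems -- messages for <device> entries that name no declared <fencedevice>
      votes    -- dict mapping node name to its votes= value
    """
    root = ET.fromstring(conf_text)
    # Map each declared <fencedevice name=...> to its agent program.
    agents = {d.get("name"): d.get("agent") for d in root.iter("fencedevice")}
    plan, problems, votes = [], [], {}
    for node in root.iter("clusternode"):
        name = node.get("name")
        # Assumption: a missing votes= attribute is treated as 1.
        votes[name] = int(node.get("votes", "1"))
        methods = node.findall("fence/method")
        if not methods:
            problems.append(f"{name}: no fence methods defined")
        for method in methods:
            for dev in method.findall("device"):
                dev_name = dev.get("name")
                if dev_name not in agents:
                    problems.append(
                        f"{name}/{method.get('name')}: device {dev_name!r} "
                        "is not a declared fencedevice"
                    )
                    continue
                plan.append((name, method.get("name"), agents[dev_name], dev.get("port")))
    return plan, problems, votes


if __name__ == "__main__":
    plan, problems, votes = check_fencing(CLUSTER_CONF)
    for node, method, agent, port in plan:
        print(f"{node:8} {method:7} {agent:14} port={port}")
    for p in problems:
        print("PROBLEM:", p)
    print("votes per node:", votes, "total:", sum(votes.values()))
```

Run against the file above, it lists the fibre (fence_sanbox2) and single (fence_manual) methods for each blade and reports a vote total of 1, since blade02 and blade03 are configured with votes="0".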

<ERRORS>

CMAN: node blade04 has been removed from the cluster : Shutdown
CMAN: node blade04 rejoining
CMAN: removing node blade04 from the cluster : Missed too many heartbeats
CMAN: node blade04 rejoining
CMAN: node blade01 has been removed from the cluster : No response to messages
SM: 00000001 process_recovery_barrier status=-104

-dan