[Linux-cluster] fence_sanbox2 configuration?
Dan B. Phung
phung at cs.columbia.edu
Tue Jun 28 17:33:38 UTC 2005
I've been trying to configure my GFS cluster to use fence_sanbox2, and
don't quite have it right. My configuration is pasted below. To test it,
I reboot one of the machines in the cluster, but instead of that node being
fenced and its journal replayed, the rest of the cluster just hangs (errors
below). In addition, when I telnet to the fibre switch, the port is not
disabled as I expected it to be. I'm able to use fence_sanbox2 directly
to disable and enable the port. So... does anybody see what I'm doing
wrong? How can I debug this further? Does anything special need to be
done in the init scripts to disable/re-enable the port, or does fenced
take care of that?
I'm using the RHEL4 branch, which I updated from CVS a couple of weeks ago.
<CONFIG FILE>
<?xml version="1.0"?>
<cluster name="blade_cluster" config_version="1">
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
    <fencedevice name="san" agent="fence_sanbox2"
                 ipaddr="128.50.18.66" login="gfs"
                 passwd="foobar"/>
  </fencedevices>
  <fence_daemon clean_start="0">
  </fence_daemon>
  <cman>
    <multicast addr="224.0.0.18"/>
  </cman>
  <clusternodes>
    <clusternode name="blade01" nodeid="1" votes="1">
      <multicast addr="224.0.0.18" interface="eth0"/>
      <fence>
        <method name="fibre">
          <device name="san" port="1"/>
        </method>
        <method name="single">
          <device name="human" ipaddr="128.50.18.1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blade02" nodeid="2" votes="0">
      <multicast addr="224.0.0.18" interface="eth0"/>
      <fence>
        <method name="fibre">
          <device name="san" port="2"/>
        </method>
        <method name="single">
          <device name="human" ipaddr="128.50.18.2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="blade03" nodeid="3" votes="0">
      <multicast addr="224.0.0.18" interface="eth0"/>
      <fence>
        <method name="fibre">
          <device name="san" port="3"/>
        </method>
        <method name="single">
          <device name="human" ipaddr="128.50.18.3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
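
One way to narrow this down is to reproduce by hand what fenced would pass to the agent. The following is a minimal sketch, assuming that fenced merges the attributes of the matching <fencedevice> with those of the per-node <device> and hands them to the agent as name=value lines on stdin. The helper names (agent_args, as_stdin) and the trimmed sample config are illustrative only and are not part of the cluster tools.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the shape of the config above (hypothetical, one node only).
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster name="blade_cluster" config_version="1">
  <fencedevices>
    <fencedevice name="san" agent="fence_sanbox2"
                 ipaddr="128.50.18.66" login="gfs" passwd="foobar"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="blade01" nodeid="1" votes="1">
      <fence>
        <method name="fibre">
          <device name="san" port="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
"""


def agent_args(conf_text, node_name):
    """Return a (method name, merged args) pair for each <device> of a node."""
    # fromstring raises ParseError if the file is not well-formed XML.
    root = ET.fromstring(conf_text)
    devices = {d.get("name"): dict(d.attrib) for d in root.iter("fencedevice")}
    result = []
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        for method in node.iter("method"):
            for dev in method.iter("device"):
                shared = devices.get(dev.get("name"))
                if shared is None:
                    raise KeyError("no <fencedevice> named %r" % dev.get("name"))
                args = dict(shared)
                args.update(dev.attrib)   # per-node attributes such as port
                args.pop("name", None)    # the device label is only a lookup key
                result.append((method.get("name"), args))
    return result


def as_stdin(args):
    """Format merged attributes as name=value lines (assumed agent input format)."""
    return "".join("%s=%s\n" % (k, v) for k, v in sorted(args.items()))


if __name__ == "__main__":
    for method, args in agent_args(SAMPLE_CONF, "blade01"):
        print("# method:", method)
        print(as_stdin(args), end="")
```

If the printed lines, fed to fence_sanbox2 on stdin, disable the port while a real node failure does not, the problem is more likely in how fenced is being triggered than in the agent or the switch credentials.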
<ERRORS>
CMAN: node blade04 has been removed from the cluster : Shutdown
CMAN: node blade04 rejoining
CMAN: removing node blade04 from the cluster : Missed too many heartbeats
CMAN: node blade04 rejoining
CMAN: node blade01 has been removed from the cluster : No response to messages
SM: 00000001 process_recovery_barrier status=-104
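
The messages above name blade04, while the configuration pasted earlier only has entries for blade01 through blade03. A small sketch of how the log could be cross-checked against the configured node names is shown here. The regular expression is an assumption based only on the CMAN lines quoted in this post, and the helper names are hypothetical.

```python
import re

# Matches "CMAN: node X ..." and "CMAN: removing node X ..." as seen above.
CMAN_NODE = re.compile(r"^CMAN: (?:removing )?node (\S+) ")


def nodes_in_log(lines):
    """Return node names mentioned in CMAN messages, in first-seen order."""
    seen = []
    for line in lines:
        match = CMAN_NODE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def unconfigured(log_nodes, configured):
    """Return nodes that appear in the log but have no <clusternode> entry."""
    return [name for name in log_nodes if name not in configured]


LOG = [
    "CMAN: node blade04 has been removed from the cluster : Shutdown",
    "CMAN: node blade04 rejoining",
    "CMAN: removing node blade04 from the cluster : Missed too many heartbeats",
    "CMAN: node blade04 rejoining",
    "CMAN: node blade01 has been removed from the cluster : No response to messages",
    "SM: 00000001 process_recovery_barrier status=-104",
]
CONFIGURED = {"blade01", "blade02", "blade03"}

if __name__ == "__main__":
    print(unconfigured(nodes_in_log(LOG), CONFIGURED))
```

A node that shows up in the log without a <clusternode> entry has no fence methods defined for it, so it is worth checking whether the pasted configuration matches the file the cluster is actually running with.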
-dan