[Linux-cluster] RHCSv4 2-node cluster hangs while starting fenced

David Teigland teigland at redhat.com
Tue Nov 1 15:37:57 UTC 2005


On Tue, Nov 01, 2005 at 01:46:07PM +0600, Hirantha Wijayawardena wrote:
> Dear All,
> 
> We are setting up a 2-node cluster (node1 and node2) with RHCSv4 on RHELv4
> for one of my clients. The hardware is two HP DL380 servers, each with iLO
> as its fence device. An MSA500 is the shared storage for both nodes.
> 
> The cluster RPMs installed successfully with the latest kernel and RHCS
> updates from RHN. Initially all the services (ccsd, cman, fenced, etc.)
> start smoothly. The problem appears when we unplug the network cable of
> node1: node2 fences node1 and shuts the machine down; then node1
> automatically shuts itself down. Now both nodes are down. So we start one
> node (say node1) and it hangs at the fencing domain stage. When we start
> the other node (say node2), node2 shuts down node1 and then node2 shuts
> itself down as well. It is very difficult to get a clear picture of these
> states, since I have no idea how to configure the iLO fence device on
> both nodes.
> 
> Please advise how to configure the HP iLOs on both nodes and how to fix
> this issue.

In this special two node configuration, if both nodes are still alive they
will each try to fence the other by design.  We expect that A will fence B
before B can fence A -- that's always the case if you have a single
fencing device, but with iLO I believe it's possible for both nodes to
fence each other in parallel.  That would result in both being rebooted
instead of just one as we intend.  In practice, I'd expect that one node
may often be faster than the other by a slight margin resulting in just
one node being rebooted.

Another way to get around this problem is by using the fenced -f option to
specify different post-fail-delay values for the two nodes.  On node1 do
'fenced -f 1' and on node2 do 'fenced -f 6'.  This will give node1 a five
second head-start and it should fence node2 before node2 can fence node1.

> 		<clusternode name="node1" votes="1">
> 			<fence>
> 				<method name="1">
> 					<device name="HPiLO_node2"/>
> 				</method>
> 			</fence>
> 		</clusternode>

> 		<clusternode name="node2" votes="1">
> 			<fence>
> 				<method name="1">
> 					<device name="HPiLO_node1"/>
> 				</method>
> 			</fence>
> 		</clusternode>

> 	<fencedevices>
> 		<fencedevice agent="fence_ilo" hostname="10.10.10.1"
>                login="Administrator" name="HPiLO_node1" passwd="RWE232WE"/>
> 		<fencedevice agent="fence_ilo" hostname="10.10.10.2"
>                 login="Administrator" name="HPiLO_node2" passwd="QWD31D4D"/>
> 	</fencedevices>

I've never configured fence_ilo before, but you may want to check this.
You specify in node A's <fence> section how others will fence node A
(not how node A will fence another node).  So, shouldn't node1 list
HPiLO_node1 as its fence device and node2 list HPiLO_node2?
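
If that is right, the two <clusternode> sections would look roughly like
the sketch below. It is untested with fence_ilo; the only change from the
posted config is that the two device names are swapped, and the
<fencedevices> section stays as it is.

```xml
<!-- Sketch only: each node's <fence> section names the device that
     OTHER nodes use to fence THIS node, i.e. its own iLO. -->
<clusternode name="node1" votes="1">
	<fence>
		<method name="1">
			<!-- node1's own iLO (10.10.10.1 in the posted config) -->
			<device name="HPiLO_node1"/>
		</method>
	</fence>
</clusternode>

<clusternode name="node2" votes="1">
	<fence>
		<method name="1">
			<!-- node2's own iLO (10.10.10.2 in the posted config) -->
			<device name="HPiLO_node2"/>
		</method>
	</fence>
</clusternode>
```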

Dave