[Linux-cluster] fence start-up issue
sghosh at redhat.com
Fri Sep 12 23:39:33 UTC 2008
Eric Ritchie wrote:
> I sometimes run into an issue when a node in my 2-node cluster is
> rebooting and hangs on fenced. It seems it can't communicate with the
> other node and after the post_join_delay, it fences the other node. This
> happened again today, and when the second node rebooted after the fence,
> they were in a split-brain configuration.
> I saw in the cluster faq, in the cman section, question 6 that the
> cluster communication network should be the same network as the fencing
> device. I think this may be my problem but I don't understand why. I'm
> using HP iLO for fencing and I set up cross-connect cables for the
> cluster communication between the 2 nodes. Why would having cluster
> communication and fencing on different networks be an issue?
> Thanks for your time
Having distinct heartbeat and fencing networks creates the possibility of a race
condition, which you seem to be running into.
Cluster communication may not have stabilized within the post_join_delay time
frame for any number of reasons, including a network outage. When that happens,
the starting node tries to fence its peer. If the fence device is reached over
the same path as the other cluster member, that fence attempt fails as well, so
the healthy peer is not shot.
With the two separated, fencing can succeed while cluster communication fails,
so a node that cannot hear its peer may fence a perfectly healthy one.
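To make the race concrete, here is a toy Python model of the startup decision. It assumes a simplified rule (if the peer is not heard from within post_join_delay, try to fence it) and compares what happens when the heartbeat link breaks under the two topologies. The names Topology, startup_outcome and link_failure are hypothetical and exist only for illustration; this is not how fenced is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    """Hypothetical model of a 2-node cluster's network paths.

    heartbeat_up:  can the starting node exchange cluster messages with its peer?
    fence_path_up: can the starting node reach the peer's fence device (e.g. iLO)?
    """
    heartbeat_up: bool
    fence_path_up: bool


def startup_outcome(topo: Topology) -> str:
    """Simplified decision of a node joining the fence domain (illustrative only).

    Assumption: if the peer is not seen over the heartbeat network by the end
    of post_join_delay, the starting node tries to fence it.
    """
    if topo.heartbeat_up:
        return "joined"        # peer seen in time, no fencing needed
    if topo.fence_path_up:
        return "peer fenced"   # fence device reachable: the healthy peer is shot
    return "fence failed"      # fence device unreachable: the peer is left running


def link_failure(shared_nic: bool) -> Topology:
    """Topology after the heartbeat link breaks.

    With a shared NIC, losing the heartbeat path also loses the fence path.
    With separate networks (e.g. cross-connect cable plus iLO LAN), the fence
    path survives.
    """
    return Topology(heartbeat_up=False, fence_path_up=not shared_nic)


if __name__ == "__main__":
    for shared in (False, True):
        label = "shared NIC" if shared else "separate networks"
        print(f"{label}: {startup_outcome(link_failure(shared))}")
```

Running it prints "separate networks: peer fenced" and "shared NIC: fence failed", which mirrors the situation described above: only the separated topology lets a node that merely lost its heartbeat link shoot its peer.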
The recommendation is to route cluster communication and iLO reachability
through the same NIC on the host.