[Linux-cluster] 2-node cluster fence loop
lists at alteeve.ca
Wed Jun 11 15:03:48 UTC 2014
On 11/06/14 10:48 AM, Arun G Nair wrote:
> What are the reasons for fence loops when only cman is started? We
> have an RHEL 6.5 2-node cluster which goes into a fence loop every
> time we start cman on both nodes. Either one fences the other. Multicast
> seems to be working properly. My understanding is that without rgmanager
> running there won't be a multicast group subscription ? I don't see the
> multicast address in 'netstat -g' unless rgmanager is running. I've
> tried to increase the fence post_join_delay but one of the nodes still
> gets fenced.
> The cluster works fine if we use unicast UDP.
When cman starts, it waits post_join_delay seconds for the peer to
connect. If the peer has still not joined when that time expires (6
seconds by default, iirc), it gives up and calls a fence against the
peer to put it into a known state.
Corosync is what determines membership, and it is started by cman.
The rgmanager only handles resource start/stop/relocate/recovery and has
nothing to do with fencing directly. Corosync is what uses multicast.
So as you seem to have already surmised, multicast is probably not
working in your environment. Have you enabled multicast traffic on the
firewall? Do your switches support multicast properly?
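
One rough way to check whether multicast actually passes between the two nodes, independent of the cluster stack, is a small send/listen test. The sketch below is only an illustration: the group address 239.192.0.1 and port 5405 are assumed placeholders (substitute whatever your cluster is actually configured to use), the function names are made up for this example, and it is best run with cman stopped on both nodes so it does not compete with corosync for the port.

```python
import socket
import struct
import sys

# Assumed placeholder values; replace with your cluster's multicast
# address and port.
GROUP = "239.192.0.1"
PORT = 5405


def membership_request(group, local_ip="0.0.0.0"):
    """Pack an ip_mreq struct: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def listen(group=GROUP, port=PORT, local_ip="0.0.0.0"):
    """Join the multicast group and print every datagram that arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what makes the kernel send an IGMP membership
    # report, which the switch needs to see in order to forward the traffic.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, local_ip))
    while True:
        data, addr = sock.recvfrom(1024)
        print("received %r from %s" % (data, addr[0]))


def send(message, group=GROUP, port=PORT, ttl=1):
    """Send one datagram to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the packet on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    sock.sendto(message.encode(), (group, port))
    sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "listen":
        listen()
    elif mode == "send":
        send("hello from " + socket.gethostname())
    else:
        print("usage: listen | send")
```

Run it with "listen" on one node and "send" on the other, then swap roles. If the listener never prints anything in one or both directions, the firewall or the switch is the likely culprit. A single successful packet does not prove the switch keeps forwarding over time, so it is worth leaving the listener running for a few minutes and sending repeatedly.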
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?