[Linux-cluster] 2-node cluster fence loop

Kaloyan Kovachev kkovachev at varna.net
Thu Jun 12 14:43:06 UTC 2014


Do you have a different auth key on each node by any chance?
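Comparing a digest of the key file on each node is quicker and more reliable than eyeballing it. The sketch below is a minimal Python 3 example, not a cman or corosync tool. The script name `keycheck.py` is hypothetical, and no key path is assumed: pass whatever key file your cluster configuration actually uses. If the two nodes print different digests, each node would discard the other's authenticated traffic. That would be consistent with both sides reporting the peer as offline.

```python
import hashlib
import sys


def fingerprint_bytes(data: bytes) -> str:
    """SHA-256 hex digest of raw key material."""
    return hashlib.sha256(data).hexdigest()


def key_fingerprint(path: str) -> str:
    """SHA-256 hex digest of the key file at `path` (read in binary mode)."""
    with open(path, "rb") as fh:
        return fingerprint_bytes(fh.read())


if __name__ == "__main__":
    # Usage, on each node: python3 keycheck.py KEYFILE [KEYFILE ...]
    # KEYFILE is a placeholder for whatever key your cluster config uses.
    for p in sys.argv[1:]:
        print(f"{key_fingerprint(p)}  {p}")
```

Run it on both nodes and compare the output. Only the digest is printed, so it can be pasted to the list without exposing the key itself.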

On 2014-06-12 17:29, Arun G Nair wrote:

> We have multicast enabled on the switch. I've also tried the
> multicast.py tool from RH's knowledge base to test multicast and I see
> the expected output, though the tool uses a different multicast IP (I
> guess that shouldn't matter). I've tried increasing post_join_delay
> to 360 seconds to give me enough time to check everything on both
> nodes. One node still gets fenced. `clustat` output says the other node
> is offline on both servers. So one node can't see the other one? This
> again points to an issue with multicast. Any other clues as to what/where
> to look?
> 
> On Wed, Jun 11, 2014 at 8:33 PM, Digimer <lists at alteeve.ca> wrote:
> 
> On 11/06/14 10:48 AM, Arun G Nair wrote:
>> Hello,
>>
>> What are the reasons for fence loops when only cman is started? We
>> have an RHEL 6.5 2-node cluster that goes into a fence loop every
>> time we start cman on both nodes: either one fences the other.
>> Multicast seems to be working properly. My understanding is that
>> without rgmanager running there won't be a multicast group
>> subscription? I don't see the multicast address in 'netstat -g'
>> unless rgmanager is running. I've tried to increase the fence
>> post_join_delay, but one of the nodes still gets fenced.
>>
>> The cluster works fine if we use unicast UDP.
>>
>> Thanks,
>
> Hi,
> 
> When cman starts, it waits post_join_delay seconds for the peer to
> connect. If the peer still hasn't joined when that time expires (6
> seconds by default, iirc), it gives up and fences the peer to put it
> into a known state.
> 
> Corosync is what determines membership, and it is started by cman.
> rgmanager only handles resource start/stop/relocate/recovery and has
> nothing to do with fencing directly. Corosync is what uses multicast.
> 
> So, as you seem to have already surmised, multicast is probably not
> working in your environment. Have you enabled multicast traffic on the 
> firewall? Do your switches support multicast properly?
> 
> digimer
> 
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/ [1]
> What if the cure for cancer is trapped in the mind of a person without 
> access to education?
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster [2]

-- 
Arun G Nair
Sr. Sysadmin
Dimension Data | Ph: (800) 664-9973
Feedback? We're listening [3]



Links:
------
[1] https://alteeve.ca/w/
[2] https://www.redhat.com/mailman/listinfo/linux-cluster
[3] http://www.surveymonkey.com/s/XRCYXBH
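The open question in this thread is whether multicast really works between the two nodes, and multicast.py was run against a different group than the one corosync uses. Testing the exact group removes that variable. Below is a minimal Python 3 sketch of a multicast receiver/sender using only the standard `socket` module; it is not the RH knowledge-base tool. Assumptions: you supply the group address yourself (on RHEL 6, `cman_tool status` should report the multicast address in use); the script name `mcast_test.py` and `TEST_PORT = 5500` are hypothetical, with the port chosen to avoid corosync's default 5404/5405; `LOCAL_IFACE_IP` is the address of the cluster interface on that node.

```python
import ipaddress
import socket
import struct
import sys
import time

TEST_PORT = 5500  # hypothetical; deliberately not corosync's 5404/5405


def is_multicast(addr: str) -> bool:
    """True if addr is an IPv4 multicast address (224.0.0.0/4)."""
    return ipaddress.IPv4Address(addr).is_multicast


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def receive(group: str, port: int, iface_ip: str = "0.0.0.0") -> None:
    """Join `group` on `iface_ip` and print every datagram that arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining sends an IGMP membership report; while this socket is open the
    # group should be listed by `netstat -g` on this node.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    print(f"listening on {group}:{port}, Ctrl-C to stop")
    try:
        while True:
            data, (src, _) = sock.recvfrom(1024)
            print(f"from {src}: {data.decode(errors='replace')}")
    except KeyboardInterrupt:
        pass
    finally:
        sock.close()


def send(group: str, port: int, iface_ip: str = "0.0.0.0",
         count: int = 10, ttl: int = 1) -> None:
    """Send `count` numbered datagrams to `group`, one per second."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the test packets on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    # Choose the outgoing interface (0.0.0.0 lets the kernel pick).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(iface_ip))
    host = socket.gethostname()
    for i in range(count):
        sock.sendto(f"ping {i} from {host}".encode(), (group, port))
        time.sleep(1)
    sock.close()


if __name__ == "__main__":
    # Usage: python3 mcast_test.py recv|send GROUP [LOCAL_IFACE_IP]
    if len(sys.argv) >= 3:
        role, group = sys.argv[1], sys.argv[2]
        iface = sys.argv[3] if len(sys.argv) > 3 else "0.0.0.0"
        if not is_multicast(group):
            sys.exit(f"{group} is not an IPv4 multicast address")
        if role == "recv":
            receive(group, TEST_PORT, iface)
        else:
            send(group, TEST_PORT, iface)
```

Run `recv` on one node and `send` on the other, then swap roles. If packets arrive in only one direction, or not at all, while unicast UDP works, the firewall and switch multicast handling raised in the thread are the next things to check.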



