[Linux-cluster] Working of a two-node cluster

Mon Apr 27 08:18:57 UTC 2015

On 4/27/2015 1:28 PM, Vasil Valchev wrote:
> Hi,
>
> I would advise you to use a quorum disk _only_ as a last resort - it's 
> better to first get a solid understanding of the clustering solution 
> before adding additional complexity.
> You can find an amazingly thorough and well-described tutorial here: 
> https://alteeve.ca/w/AN!Cluster_Tutorial_2
[Jatin] Thank you very much for sharing this tutorial. I will certainly 
go through it to gain a better understanding.
>
> Especially useful are the first chapters - the theory.
> What I suspect is happening in your case is that your cluster 
> communication and fencing are over the same network, which is not 
> fault tolerant.
[Jatin]
My cluster communication happens over one network, while fencing happens 
over another. I use two separate VLANs for this purpose. Secondly, when 
cluster communication fails due to a network outage, fencing still 
happens over the other VLAN, and both nodes get fenced.
> So what happens if this network fails? Your 2 nodes can't see each 
> other, so they send fence requests, but the fence devices are 
> unreachable too, so those requests fail.
> They are retried a few times, I think, but if all attempts fail, the 
> fence agent returns a failure and your cluster is stuck in a 
> "recovering" or stopped state.
> Other times the network outage is shorter and the fence succeeds, 
> resulting in both nodes going down - this is solved with the delay 
> parameter.
> The first issue is an architectural one: it is the expected behavior 
> of the cluster to stop (or "freeze") all resources if it can't 
> guarantee the state of all members.
>
> Read the article above; it's really very useful.
>
> Cheers!

Thanks
Jatin
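
To make the three outcomes discussed above concrete (fence devices 
unreachable, both nodes fenced, one node surviving thanks to the delay 
parameter), here is a minimal sketch of the fence race as a toy model. 
It is not how the cluster software is implemented. The node names, the 
fence_race function, the fence_time value and the outcome labels are 
all made up for illustration, and it assumes that both nodes notice the 
outage at the same instant and that a delay configured for a node 
postpones the fencing of that node.

```python
def fence_race(delays, fence_time=1.0, fence_reachable=True):
    """Toy model of a two-node fence race (illustrative only).

    delays: hypothetical mapping of node name -> seconds that fencing
            *of that node* is postponed (the "delay" idea).
    fence_time: assumed time a fence action needs once it starts.
    Returns (set of surviving nodes, outcome label).
    """
    nodes = list(delays)

    if not fence_reachable:
        # Fence requests fail, so nobody is powered off, but the
        # cluster cannot guarantee member state and blocks.
        return set(nodes), "blocked"

    # Moment at which each node would be powered off by its peer.
    killed_at = {n: delays[n] + fence_time for n in nodes}

    if len(set(killed_at.values())) == 1:
        # Equal delays: both fence actions land together.
        return set(), "both fenced"

    # The node hit first goes down before its own request completes.
    loser = min(killed_at, key=killed_at.get)
    return {n for n in nodes if n != loser}, "one survivor"


if __name__ == "__main__":
    print(fence_race({"node1": 0, "node2": 0}))
    print(fence_race({"node1": 15, "node2": 0}))
    print(fence_race({"node1": 0, "node2": 0}, fence_reachable=False))
```

In this model, giving node1 a delay of 15 seconds means node2 is 
fenced first and node1 stays up, which is the asymmetry the delay 
parameter is meant to introduce. With equal delays and reachable fence 
devices, both nodes go down, matching what I am seeing.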
