[Linux-cluster] Working of a two-node cluster
jashokda at cisco.com
Mon Apr 27 08:18:57 UTC 2015
On 4/27/2015 1:28 PM, Vasil Valchev wrote:
> I would advise you to use a quorum disk _only_ as a last resort - it's
> better to first get a solid understanding of the clustering solution
> before adding additional complexity.
> You can find an amazingly thorough and well-described tutorial here:
[Jatin] Thank you very much for sharing this tutorial. I will surely go
through it and gain more understanding.
> Especially useful are the first chapters - the theory.
> What I suspect is happening in your case is that your cluster
> communication and fencing are over the same network, which is not
> fault tolerant.
[Jatin] My cluster communication happens over one network, while fencing
happens over another. I use two separate VLANs for this purpose. Also,
when cluster communication fails due to a network outage, fencing still
happens over the other VLAN, and both nodes get fenced.
> So what happens if this network fails? Your 2 nodes can't see each
> other, so they send fence requests, but the fence devices are
> unreachable too, so those requests fail.
> They are retried a few times I think, but if all fail, the fence agent
> returns a failure and your cluster is stuck in a "recovering" or stopped state.
> Other times the network outage is shorter and the fence succeeds,
> resulting in both nodes going down - this is solved with the delay
> The first issue is an architectural one: it is the expected behavior of
> the cluster to stop (or "freeze") all resources if it can't guarantee
> the state of all members.
> Read the article above; it's really very useful.
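
To make the fence race described above concrete, here is a toy model in
Python. It is only a sketch under simplifying assumptions, not how any real
fence agent works: the function name fence_race, the parameters delay_a,
delay_b and fence_time, and the rule that a powered-off node never completes
its own fence request are all made up for illustration.

```python
def fence_race(delay_a, delay_b, fence_time=1.0):
    """Return the set of nodes that survive a two-node fence race.

    Hypothetical model: each node waits its own delay (in seconds), then
    needs `fence_time` seconds to power off its peer. A node that is
    powered off first never completes its own fence request.
    """
    done_a = delay_a + fence_time  # moment node A's fence of B completes
    done_b = delay_b + fence_time  # moment node B's fence of A completes
    if done_a == done_b:
        # Both requests complete at the same time: both nodes go down.
        return set()
    # The node whose request completes first survives.
    return {"A"} if done_a < done_b else {"B"}


if __name__ == "__main__":
    print(fence_race(0, 0))  # no delay on either side
    print(fence_race(0, 5))  # node B waits 5 seconds before fencing
```

In this model, equal delays take both nodes down, while a delay on one side
lets the other node win the race and stay up.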