<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 4/27/2015 1:28 PM, Vasil Valchev wrote:<br>
<blockquote
cite="mid:CAFZxf=L6cbpiC-pw1cKLD=PQ5iraOmR2wq1JCgj9HwaZFtoYxQ@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<div dir="ltr">
<div>Hi,</div>
<div><br>
</div>
I would advise you to use a quorum disk _only_ as a last resort -
it's better to first get a solid understanding of the clustering
solution before adding more complexity.
<div>You can find an amazingly thorough and well-described tutorial
here: <a moz-do-not-send="true"
href="https://alteeve.ca/w/AN%21Cluster_Tutorial_2">https://alteeve.ca/w/AN!Cluster_Tutorial_2</a></div>
</div>
</blockquote>
[Jatin] Thank you very much for sharing this tutorial. I will certainly
go through it to gain a better understanding.<br>
<blockquote
cite="mid:CAFZxf=L6cbpiC-pw1cKLD=PQ5iraOmR2wq1JCgj9HwaZFtoYxQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>Especially useful are the first chapters - the theory.</div>
<div>What I suspect is happening in your case is that your
cluster communication and fencing are over the same network,
which is not fault tolerant.</div>
</div>
</blockquote>
[Jatin] <br>
My cluster communication happens over one network, while fencing
happens over another network. I use two separate VLANs for this
purpose. Secondly, when the cluster communication fails due to a
network outage, fencing happens over the other VLAN and both
nodes get fenced.<br>
<blockquote
cite="mid:CAFZxf=L6cbpiC-pw1cKLD=PQ5iraOmR2wq1JCgj9HwaZFtoYxQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>So what happens if this network fails? Your 2 nodes can't
see each other, so they send fence requests, but the fence
devices are unreachable too, so those requests fail.</div>
<div>They are retried a few times, I think, but if all fail, the
fence agent returns a failure and your cluster is stuck in a
"recovering" or stopped state.</div>
<div>Other times the network outage is shorter and the fence
succeeds, resulting in both nodes going down - this is solved
with the delay parameter.</div>
<div>The first issue is an architectural one: it is the expected
behavior of the cluster to stop (or "freeze") all resources if
it can't guarantee the state of all members.</div>
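<div><br>
</div>
<div>For the second case, here is a rough sketch of how the delay
parameter might look, assuming a cman-style cluster.conf like the
one in the tutorial above and IPMI fencing. The node names, device
names and the 15-second value are made up for illustration, so
adapt them to your setup.</div>
<pre>&lt;clusternode name="node1" nodeid="1"&gt;
  &lt;fence&gt;
    &lt;method name="ipmi"&gt;
      &lt;!-- hypothetical names; delay: a fence call against node1 waits 15 s --&gt;
      &lt;device name="ipmi_node1" action="reboot" delay="15"/&gt;
    &lt;/method&gt;
  &lt;/fence&gt;
&lt;/clusternode&gt;
&lt;clusternode name="node2" nodeid="2"&gt;
  &lt;fence&gt;
    &lt;method name="ipmi"&gt;
      &lt;!-- no delay: node2 is fenced immediately --&gt;
      &lt;device name="ipmi_node2" action="reboot"/&gt;
    &lt;/method&gt;
  &lt;/fence&gt;
&lt;/clusternode&gt;
</pre>
<div>With a setup along these lines, a fence call against node1 is
held back for 15 seconds, so in a short outage node1 should fence
node2 first and only one node goes down instead of both.</div>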
<div><br>
</div>
<div>Read the article above; it's really very useful.</div>
<div><br>
</div>
<div>Cheers!<br>
</div>
</div>
</blockquote>
<br>
Thanks<br>
Jatin<br>
<br>
</body>
</html>