<div dir="ltr"><div>Hi,</div><div><br></div>I would advise you to use a quorum disk _only_ as a last resort - it's better to first get a solid understanding of the clustering solution before adding extra complexity.<div>You can find an amazingly thorough, well-written tutorial here: <a href="https://alteeve.ca/w/AN!Cluster_Tutorial_2">https://alteeve.ca/w/AN!Cluster_Tutorial_2</a></div><div><br></div><div>The first chapters - the theory - are especially useful.</div><div>What I suspect is happening in your case is that your cluster communication and fencing run over the same network, which is not fault tolerant.</div><div>So what happens if this network fails? Your two nodes can't see each other, so each sends a fence request, but the fence devices are unreachable too, so those requests fail.</div><div>The requests are retried a few times, I believe, but if they all fail, the fence agent reports failure and your cluster is stuck in a "recovering" or stopped state.</div><div>Other times the network outage is shorter and the fencing succeeds from both sides, taking both nodes down - that is what the delay parameter solves.</div><div>The first issue is an architectural one: it is the expected behavior of the cluster to stop (or "freeze") all resources when it cannot guarantee the state of all its members.</div><div><br></div><div>Read the article above - it really is very useful.</div><div><br></div><div>Cheers!<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 27, 2015 at 9:44 AM, Vijay Kakkar <span dir="ltr"><<a href="mailto:vijaykakkars@gmail.com" target="_blank">vijaykakkars@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">You should look for qdisk now. I hope this will be helpful.<br></div><div class=""><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 27, 2015 at 11:38 AM, Jatin Davey <span 
dir="ltr"><<a href="mailto:jashokda@cisco.com" target="_blank">jashokda@cisco.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<font face="Times New Roman">Yes, I did restart it.</font><div><div><br>
<br>
<div>On 4/27/2015 11:31 AM, emmanuel segura
wrote:<br>
</div>
<blockquote type="cite">
<pre>did you restart the cluster after adding the delay parameter?
2015-04-27 7:49 GMT+02:00 Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a>:
</pre>
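For reference, a fencing delay in a cman cluster is usually set on the fence `<device>` line of one node only, so that node wins a fence race and only its peer gets powered off. A minimal sketch against the poster's config (the value 15 is illustrative, not from the thread):

```xml
<!-- Sketch only: delay="15" is an illustrative value. Giving ONE node a
     delay means that in a fence race it survives and only the other node
     is powered off. After editing, bump config_version and propagate the
     config (or restart the cluster) so the change actually takes effect. -->
<clusternode name="node-103" nodeid="1">
    <fence>
        <method name="Method01">
            <device name="node-103" delay="15"/>
        </method>
    </fence>
</clusternode>
```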
<blockquote type="cite">
<pre>Ok, I tried with delay but it has not helped. I guess I have to try using
a quorum disk now.
Thanks
Jatin
On 4/24/2015 7:06 PM, Vijay Kakkar wrote:
You may need to delay the fencing (delay=seconds) or use a quorum disk if
delaying the fencing doesn't help.
On Fri, Apr 24, 2015 at 6:23 PM, Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a> wrote:
</pre>
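If the quorum-disk route is taken anyway, a rough sketch of what it involves follows; the device path, label, and heuristic target here are placeholders, not values from this thread:

```xml
<!-- Sketch, not a tested config: assumes a small shared LUN initialized
     first with something like: mkqdisk -c /dev/mapper/qdisk-lun -l myqdisk
     The heuristic pings a placeholder address (e.g. the gateway) so the
     node that still has network connectivity wins the quorum disk vote. -->
<quorumd label="myqdisk" min_score="1">
    <heuristic program="ping -c1 -w1 x.x.x.x" interval="2" score="1" tko="3"/>
</quorumd>
```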
<blockquote type="cite">
<pre>Here is my cluster.conf file
************************
<?xml version="1.0"?>
<cluster config_version="4" name="****">
<clusternodes>
<clusternode name="node-103" nodeid="1">
<fence>
<method name="Method01">
<device name="node-103"/>
</method>
</fence>
</clusternode>
<clusternode name="node-105" nodeid="2">
<fence>
<method name="Method02">
<device name="node-105"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="password"
ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-103" passwd="*****"
privlvl="ADMINISTRATOR"/>
<fencedevice agent="fence_ipmilan" auth="password"
ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-105" passwd="******"
privlvl="ADMINISTRATOR"/>
</fencedevices>
<fence_daemon post_join_delay="120"/>
<rm>
<resources>
<netfs export="/test" force_unmount="1"
fstype="nfs" host="x.x.x.x" mountpoint="/test/test/test" name="test123"/>
<ip address="x.x.x.x" sleeptime="5"/>
<script file="/xxx/xxx/xxx/xxx/xx.sh"
name="xxxx"/>
</resources>
<failoverdomains>
<failoverdomain name="Failover01" nofailback="1"
ordered="1">
<failoverdomainnode name="node-103"
priority="1"/>
<failoverdomainnode name="node-105"
priority="2"/>
</failoverdomain>
</failoverdomains>
<service domain="Failover01" name="Service01"
recovery="relocate">
<ip ref="x.x.x.x"/>
<netfs ref="test123"/>
<script ref="xxxx"/>
</service>
</rm>
</cluster>
On 4/24/2015 6:01 PM, emmanuel segura wrote:
</pre>
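One way to reduce the single-network weakness discussed at the top of the thread is a redundant ring: cman on RHEL 6 supports a second heartbeat network via `<altname>`, I believe from 6.4 onward. A rough sketch against the config above (the "-alt" hostnames are hypothetical names resolving to addresses on a second physical network):

```xml
<!-- Sketch only: each node gets a second hostname on a separate physical
     network, so cluster heartbeat survives the loss of either network.
     The "-alt" names are placeholders and must resolve on both nodes. -->
<clusternode name="node-103" nodeid="1">
    <altname name="node-103-alt"/>
</clusternode>
<clusternode name="node-105" nodeid="2">
    <altname name="node-105-alt"/>
</clusternode>
```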
<blockquote type="cite">
<pre>please share your cluster config; maybe that way someone can help you.
2015-04-24 14:12 GMT+02:00 Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a>:
</pre>
<blockquote type="cite">
<pre>Hi
I am running a two-node cluster on RHEL 6.5 and have a very fundamental
question.
For the two-node cluster to work, is it mandatory that both nodes are
"online" and communicating with each other?
What I can see is that if there is a communication failure between them,
then either both nodes are fenced or the cluster gets into a "stopped"
state (as seen in the output of the clustat command).
Apologies if my questions are naive. I am just starting to work with the
RHEL cluster add-on.
Thanks
Jatin
--
Linux-cluster mailing list
<a href="mailto:Linux-cluster@redhat.com" target="_blank">Linux-cluster@redhat.com</a>
<a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a>
</pre>
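To answer the fundamental question directly: with the two-node special case, either node can keep running alone, but only after it successfully fences its peer. This is the line in the posted config that enables it:

```xml
<!-- From the posted config: two_node="1" with expected_votes="1" lets a
     single node retain quorum, but it must first fence the other node
     before resuming services - which is why unreachable fence devices
     leave the cluster stuck in a recovering/stopped state. -->
<cman expected_votes="1" two_node="1"/>
```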
</blockquote>
<pre></pre>
</blockquote>
<pre>
</pre>
</blockquote>
<pre>--
Cheers
Vijay Kakkar - RHC{E,SS,VA,DS,A,I,X}
Techgrills Systems Pvt. Ltd.
011-46521313 | <a href="tel:%2B919999103657" value="+919999103657" target="_blank">+919999103657</a>
<a href="http://www.techgrills.com" target="_blank">http://www.techgrills.com</a>
<a href="http://lnkd.in/bnj2VUU" target="_blank">http://lnkd.in/bnj2VUU</a>
</pre>
</blockquote>
<pre></pre>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br><br clear="all"><br>-- <br><div><div dir="ltr"><div><div dir="ltr"><div><div>Cheers<br></div><div><br><b>Vijay Kakkar - RHC{E,SS,VA,DS,A,I,X}</b><br><br>Techgrills Systems Pvt. Ltd.<br>011-46521313 | <a href="tel:%2B919999103657" value="+919999103657" target="_blank">+919999103657</a><br></div><div><a href="http://www.techgrills.com" target="_blank">http://www.techgrills.com</a><br></div></div><div><div><a href="http://lnkd.in/bnj2VUU" target="_blank">http://lnkd.in/bnj2VUU</a><br></div></div></div></div></div></div>
</div>
</div></div></blockquote></div><br></div></div></div>