<div dir="ltr"><div>Hi,</div><div><br></div>I would advise you to use a quorum disk _only_ as a last resort; it's better to first get a solid understanding of the clustering solution before adding more complexity.<div>You can find an amazingly thorough and well-described tutorial here: <a href="https://alteeve.ca/w/AN!Cluster_Tutorial_2">https://alteeve.ca/w/AN!Cluster_Tutorial_2</a></div><div><br></div><div>The first chapters, which cover the theory, are especially useful.</div><div>What I suspect is happening in your case is that cluster communication and fencing share the same network, which is not fault tolerant.</div><div>So what happens if this network fails? Your two nodes can't see each other, so each sends a fence request, but the fence devices are unreachable too, so those requests fail.</div><div>I think they are retried a few times, but if every attempt fails, the fence agent reports a failure and your cluster is stuck in the "recovering" or "stopped" state.</div><div>At other times the network outage is shorter and the fence devices are reachable again, so both fence requests succeed and both nodes go down; this is what the delay parameter is meant to solve.</div><div>The first issue is an architectural one: it is expected behavior for the cluster to stop (or "freeze") all resources if it can't guarantee the state of all members.</div><div><br></div><div>Read the tutorial above; it's really very useful.</div><div><br></div><div>Cheers!<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 27, 2015 at 9:44 AM, Vijay Kakkar <span dir="ltr"><<a href="mailto:vijaykakkars@gmail.com" target="_blank">vijaykakkars@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">You should look into qdisk now. I hope this will be helpful.<br></div><div class=""><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 27, 2015 at 11:38 AM, Jatin Davey <span 
dir="ltr"><<a href="mailto:jashokda@cisco.com" target="_blank">jashokda@cisco.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div bgcolor="#FFFFFF" text="#000000"> <font face="Times New Roman">Yes, I did restart it.</font><div><div><br> <br> <div>On 4/27/2015 11:31 AM, emmanuel segura wrote:<br> </div> <blockquote type="cite"> <pre>Did you restart the cluster after adding the delay parameter?

2015-04-27 7:49 GMT+02:00 Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a>:
</pre> <blockquote type="cite"> <pre>OK, I tried the delay but it has not helped. I guess I have to try using a quorum disk now.

Thanks
Jatin

On 4/24/2015 7:06 PM, Vijay Kakkar wrote:

You may need to delay the fencing (delay=seconds) or use a quorum disk if delaying the fencing doesn't help.

On Fri, Apr 24, 2015 at 6:23 PM, Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a> wrote:
</pre> <blockquote type="cite"> <pre>Here is my cluster.conf file

************************
<?xml version="1.0"?>
<cluster config_version="4" name="****">
  <clusternodes>
    <clusternode name="node-103" nodeid="1">
      <fence>
        <method name="Method01">
          <device name="node-103"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-105" nodeid="2">
      <fence>
        <method name="Method02">
          <device name="node-105"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="password" ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-103" passwd="*****" privlvl="ADMINISTRATOR"/>
    <fencedevice agent="fence_ipmilan" auth="password" ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-105" passwd="******" privlvl="ADMINISTRATOR"/>
  </fencedevices>
  <fence_daemon post_join_delay="120"/>
  <rm>
    <resources>
      <netfs export="/test" force_unmount="1" fstype="nfs" host="x.x.x.x" mountpoint="/test/test/test" name="test123"/>
      <ip address="x.x.x.x" sleeptime="5"/>
      <script file="/xxx/xxx/xxx/xxx/xx.sh" name="xxxx"/>
    </resources>
    <failoverdomains>
      <failoverdomain name="Failover01" nofailback="1" ordered="1">
        <failoverdomainnode name="node-103" priority="1"/>
        <failoverdomainnode name="node-105" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <service domain="Failover01" name="Service01" recovery="relocate">
      <ip ref="x.x.x.x"/>
      <netfs ref="test123"/>
      <script ref="xxxx"/>
    </service>
  </rm>
</cluster>

On 4/24/2015 6:01 PM, emmanuel segura wrote:
</pre> <blockquote type="cite"> <pre>Please share your cluster config; that way someone may be able to help you.

2015-04-24 14:12 GMT+02:00 Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a>:
</pre> <blockquote type="cite"> <pre>Hi

I am using a two-node cluster on RHEL 6.5. I have a very fundamental question.

For the two-node cluster to work, is it mandatory that both nodes are "online" and communicating with each other?

What I can see is that if there is a communication failure between them, then either both nodes are fenced or the cluster gets into a "stopped" state (as seen in the output of the clustat command).

Apologies if my questions are naive. I am just starting to work with the RHEL cluster add-on.
Thanks Jatin -- Linux-cluster mailing list <a href="mailto:Linux-cluster@redhat.com" target="_blank">Linux-cluster@redhat.com</a> <a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a> </pre> </blockquote> </blockquote> </blockquote> <pre>-- Cheers Vijay Kakkar - RHC{E,SS,VA,DS,A,I,X} Techgrills Systems Pvt. Ltd. 011-46521313 | <a href="tel:%2B919999103657" value="+919999103657" target="_blank">+919999103657</a> <a href="http://www.techgrills.com" target="_blank">http://www.techgrills.com</a> <a href="http://lnkd.in/bnj2VUU" target="_blank">http://lnkd.in/bnj2VUU</a> </pre> </blockquote> </blockquote> <br> </div></div></div></blockquote></div> </div> </div></div></blockquote></div><br></div></div></div>
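<div><br></div><div>For reference, here is a minimal sketch of the fence delay applied to the cluster.conf quoted in this thread. It assumes node-103 (priority 1 in Failover01) is the node that should survive a fence race, and that your installed fence_ipmilan accepts the delay option (check "man fence_ipmilan"); the value of 15 seconds is only an illustration, not a tested recommendation. Because the delay sits on the device used to fence node-103, a request to fence node-103 waits 15 seconds, which gives node-103 time to fence node-105 first:</div><pre><clusternode name="node-103" nodeid="1">
  <fence>
    <method name="Method01">
      <!-- Assumed example value: wait 15 s before fencing node-103, so node-103 wins a fence race -->
      <device name="node-103" delay="15"/>
    </method>
  </fence>
</clusternode>
<clusternode name="node-105" nodeid="2">
  <fence>
    <method name="Method02">
      <!-- No delay here: a request to fence node-105 runs immediately -->
      <device name="node-105"/>
    </method>
  </fence>
</clusternode></pre><div>After editing, increment config_version (for example from 4 to 5), validate the file with ccs_config_validate and propagate it with "cman_tool version -r". A delay only settles the race when both fence devices are reachable; it does not help when fencing shares the network that failed.</div>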