<div dir="ltr"><div>Hi,</div><div><br></div>I would advise you to use a quorum disk _only_ as a last resort; it's better to get a solid understanding of the clustering solution first, before adding complexity.<div>You can find an amazingly thorough and well-written tutorial here: <a href="https://alteeve.ca/w/AN!Cluster_Tutorial_2">https://alteeve.ca/w/AN!Cluster_Tutorial_2</a></div><div><br></div><div>The first chapters, which cover the theory, are especially useful.</div><div>I suspect that in your case cluster communication and fencing run over the same network, which is not fault tolerant.</div><div>So what happens if this network fails? Your 2 nodes can't see each other, so they send fence requests, but the fence devices are unreachable too, so those requests fail.</div><div>I think they are retried a few times, but if every attempt fails, the fence agent reports failure and your cluster is stuck in a "recovering" or "stopped" state.</div><div>Other times the network outage is shorter and both fence requests succeed, so both nodes go down; the delay parameter solves this by giving one node a head start in the fence race.</div><div>The first issue is an architectural one: the cluster is expected to stop (or "freeze") all resources if it can't guarantee the state of all members.</div><div><br></div><div>Do read the article above; it's really very useful.</div><div><br></div><div>Cheers!<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 27, 2015 at 9:44 AM, Vijay Kakkar <span dir="ltr"><<a href="mailto:vijaykakkars@gmail.com" target="_blank">vijaykakkars@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">You should look into qdisk now. I hope this will be helpful.<br></div><div class=""><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 27, 2015 at 11:38 AM, Jatin Davey <span 
dir="ltr"><<a href="mailto:jashokda@cisco.com" target="_blank">jashokda@cisco.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <font face="Times New Roman">Yes, I did restart it.</font><div><div><br>
    <br>
    <div>On 4/27/2015 11:31 AM, emmanuel segura
      wrote:<br>
    </div>
    <blockquote type="cite">
      <pre>Did you restart the cluster after adding the delay parameter?

2015-04-27 7:49 GMT+02:00 Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a>:
</pre>
      <blockquote type="cite">
        <pre>OK, I tried with delay but it has not helped. I guess I have to try using
a quorum disk now.

Thanks
Jatin

On 4/24/2015 7:06 PM, Vijay Kakkar wrote:

You may need to delay the fencing (delay=seconds), or use a quorum disk if
delaying the fencing doesn't help.
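
A minimal sketch of the delay option, assuming a 15-second value (the
number is illustrative and not taken from this thread). Node and method
names follow the cluster.conf quoted below. The delay goes on the device
entry of the node that should survive, so the peer waits before fencing
it; config_version would also need to be incremented:

<clusternode name="node-103" nodeid="1">
        <fence>
                <method name="Method01">
                        <!-- assumed value: the peer waits 15 s before fencing node-103 -->
                        <device name="node-103" delay="15"/>
                </method>
        </fence>
</clusternode>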

On Fri, Apr 24, 2015 at 6:23 PM, Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a> wrote:
</pre>
        <blockquote type="cite">
          <pre>Here is my cluster.conf file

************************
<?xml version="1.0"?>
<cluster config_version="4" name="****">
        <clusternodes>
                <clusternode name="node-103" nodeid="1">
                        <fence>
                                <method name="Method01">
                                        <device name="node-103"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node-105" nodeid="2">
                        <fence>
                                <method name="Method02">
                                        <device name="node-105"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_ipmilan" auth="password"
ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-103" passwd="*****"
privlvl="ADMINISTRATOR"/>
                <fencedevice agent="fence_ipmilan" auth="password"
ipaddr="x.x.x.x" lanplus="on" login="admin" name="node-105" passwd="******"
privlvl="ADMINISTRATOR"/>
        </fencedevices>
        <fence_daemon post_join_delay="120"/>
        <rm>
                <resources>
                        <netfs export="/test" force_unmount="1"
fstype="nfs" host="x.x.x.x" mountpoint="/test/test/test" name="test123"/>
                        <ip address="x.x.x.x" sleeptime="5"/>
                        <script file="/xxx/xxx/xxx/xxx/xx.sh"
name="xxxx"/>
                </resources>
                <failoverdomains>
                        <failoverdomain name="Failover01" nofailback="1"
ordered="1">
                                <failoverdomainnode name="node-103"
priority="1"/>
                                <failoverdomainnode name="node-105"
priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <service domain="Failover01" name="Service01"
recovery="relocate">
                        <ip ref="x.x.x.x"/>
                        <netfs ref="test123"/>
                        <script ref="xxxx"/>
                </service>
        </rm>
</cluster>


On 4/24/2015 6:01 PM, emmanuel segura wrote:
</pre>
          <blockquote type="cite">
            <pre>Please share your cluster config; that way someone may be able to help you.

2015-04-24 14:12 GMT+02:00 Jatin Davey <a href="mailto:jashokda@cisco.com" target="_blank"><jashokda@cisco.com></a>:
</pre>
            <blockquote type="cite">
              <pre>Hi

I am running a two-node cluster on RHEL 6.5 and have a very fundamental
question.

For the two-node cluster to work, is it mandatory that both nodes are
"online" and communicating with each other?

What I see is that if communication between them fails, either both
nodes are fenced or the cluster gets into a "stopped" state (as shown
in the output of the clustat command).

Apologies if my questions are naive. I am just starting to work with the
RHEL cluster add-on.

Thanks
Jatin

--
Linux-cluster mailing list
<a href="mailto:Linux-cluster@redhat.com" target="_blank">Linux-cluster@redhat.com</a>
<a href="https://www.redhat.com/mailman/listinfo/linux-cluster" target="_blank">https://www.redhat.com/mailman/listinfo/linux-cluster</a>
</pre>
            </blockquote>
            <pre></pre>
          </blockquote>
        </blockquote>
        <pre>--
Cheers

Vijay Kakkar - RHC{E,SS,VA,DS,A,I,X}

Techgrills Systems Pvt. Ltd.
011-46521313 | <a href="tel:%2B919999103657" value="+919999103657" target="_blank">+919999103657</a>
<a href="http://www.techgrills.com" target="_blank">http://www.techgrills.com</a>
<a href="http://lnkd.in/bnj2VUU" target="_blank">http://lnkd.in/bnj2VUU</a>




</pre>
      </blockquote>
      <pre></pre>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div>
</div>
</div></div></blockquote></div><br></div></div></div>