I decided this morning to start checking packages/versions first. Here are some details about the system thus far:<br>
<br>
CONF:<br>
<?xml version="1.0" ?><br>
<cluster alias="mrcluster" config_version="2" name="mrcluster"><br>
    <fence_daemon post_fail_delay="0" post_join_delay="30"/><br>
    <clusternodes><br>
        <clusternode name="clxmrcati12.xxxxxx.com" nodeid="1" votes="1"><br>
            <fence><br>
                <method name="1"><br>
                    <device name="apcps05" option="off" port="3" switch="3"/><br>
                    <device name="apcps06" option="off" port="3" switch="3"/><br>
                    <device name="apcps05" option="on" port="3" switch="3"/><br>
                    <device name="apcps06" option="on" port="3" switch="3"/><br>
                </method><br>
            </fence><br>
        </clusternode><br>
        <clusternode name="clxmrcati11.xxxxxx.com" nodeid="2" votes="1"><br>
            <fence><br>
                <method name="1"><br>
                    <device name="apcps05" option="off" port="4" switch="4"/><br>
                    <device name="apcps06" option="off" port="4" switch="4"/><br>
                    <device name="apcps05" option="on" port="4" switch="4"/><br>
                    <device name="apcps06" option="on" port="4" switch="4"/><br>
                </method><br>
            </fence><br>
        </clusternode><br>
        <clusternode name="clxmrweb20.xxxxxx.com" nodeid="3" votes="1"><br>
            <fence><br>
                <method name="1"><br>
                    <device name="apcps05" option="off" port="2" switch="2"/><br>
                    <device name="apcps06" option="off" port="2" switch="2"/><br>
                    <device name="apcps05" option="on" port="2" switch="2"/><br>
                    <device name="apcps06" option="on" port="2" switch="2"/><br>
                </method><br>
            </fence><br>
        </clusternode><br>
    </clusternodes><br>
    <cman/><br>
    <fencedevices><br>
        <fencedevice agent="fence_apc" ipaddr="172.XX.XX.27" login="apc" name="apcps05" passwd="xxx"/><br>
        <fencedevice agent="fence_apc" ipaddr="172.XX.XX.28" login="apc" name="apcps06" passwd="xxx"/><br>
    </fencedevices><br>
    <rm><br>
        <failoverdomains/><br>
        <resources/><br>
    </rm><br>
</cluster><br>
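For what it's worth, here is a minimal sketch of how a config like the one above could be sanity-checked. It assumes the usual simple-majority rule (quorum = expected votes / 2 + 1), assumes a node's votes default to 1 when the attribute is omitted, and ignores any override that might be set on the cman element. The function name and the shortened sample config are hypothetical, only for illustration.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return (expected_votes, quorum, undefined_fence_devices)."""
    root = ET.fromstring(xml_text)
    # Sum the votes of every clusternode (assumed default: 1 vote).
    expected_votes = sum(int(n.get("votes", "1")) for n in root.iter("clusternode"))
    # Simple majority: strictly more than half of the expected votes.
    quorum = expected_votes // 2 + 1
    # Every <device> used under <fence> should match a <fencedevice> name.
    defined = {d.get("name") for d in root.iter("fencedevice")}
    referenced = {d.get("name") for d in root.iter("device")}
    return expected_votes, quorum, sorted(referenced - defined)


# Shortened, hypothetical config with the same shape as the one above.
SAMPLE = """<cluster alias="mrcluster" config_version="2" name="mrcluster">
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1">
      <fence><method name="1"><device name="apcps05" port="3"/></method></fence>
    </clusternode>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence><method name="1"><device name="apcps06" port="4"/></method></fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence><method name="1"><device name="apcps05" port="2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_apc" name="apcps05"/>
    <fencedevice agent="fence_apc" name="apcps06"/>
  </fencedevices>
</cluster>"""

if __name__ == "__main__":
    votes, quorum, missing = check_cluster_conf(SAMPLE)
    print(f"expected votes: {votes}, quorum: {quorum}, undefined devices: {missing}")
```

Under those assumptions, three nodes with one vote each need two votes for quorum, so a node that can only see itself stays inquorate, which would be consistent with the "Cluster is not quorate" messages.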
-------------------------------------------------------------------------------------------<br>
Host Files:<br>
From luci node clxmrcati11:<br>
127.0.0.1    localhost.localdomain    localhost<br>
172.XX.XX.18    clxmrcati11.xxxxxx.com       clxmrcati11<br>
172.XX.XX.19    clxmrcati12.xxxxxx.com       clxmrcati12<br>
172.XX.XX.20    clxmrrpt10.xxxxxx.com         clxmrrpt10<br>
172.XX.XX.21    clxmrweb20.xxxxxx.com      clxmrweb20<br>
<br>
From ricci node clxmrcati12:<br>
127.0.0.1    localhost.localdomain    localhost<br>
172.XX.XX.19    clxmrcati12.xxxxxx.com               clxmrcati12<br>
172.XX.XX.21    clxmrweb20.xxxxxx.com              clxmrweb20<br>
172.XX.XX.20    clxmrrpt10.xxxxxx.com                 clxmrrpt10<br>
172.XX.XX.18    clxmrcati11.xxxxxx.com               clxmrcati11<br>
<br>
From ricci node clxmrweb20:<br>
127.0.0.1    localhost.localdomain    localhost<br>
172.XX.XX.21    clxmrweb20.xxxxxx.com             clxmrweb20<br>
172.XX.XX.20    clxmrrpt10.xxxxxx.com                clxmrrpt10<br>
172.XX.XX.18    clxmrcati11.xxxxxx.com              clxmrcati11<br>
172.XX.XX.19    clxmrcati12.xxxxxx.com              clxmrcati12<br>
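Comparing the host files above by eye is error-prone, so here is a rough sketch of how the comparison could be automated. It only checks that every hostname and alias resolves to the same address on every node. The function names are hypothetical, and the addresses in the example are placeholders from the documentation range, not the real ones.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def hosts_mismatches(files):
    """files maps node -> hosts-file text.

    Returns {name: {node: ip_or_None}} for every name that is missing on
    some node or resolves to different addresses on different nodes.
    """
    parsed = {node: parse_hosts(text) for node, text in files.items()}
    all_names = set().union(*(m.keys() for m in parsed.values()))
    problems = {}
    for name in sorted(all_names):
        seen = {node: m.get(name) for node, m in parsed.items()}
        if len(set(seen.values())) > 1:
            problems[name] = seen
    return problems


if __name__ == "__main__":
    # Placeholder data; in practice, read /etc/hosts collected from each node.
    example = {
        "nodeA": "127.0.0.1 localhost\n192.0.2.18 a.example.com a\n192.0.2.19 b.example.com b\n",
        "nodeB": "127.0.0.1 localhost\n192.0.2.18 a.example.com a\n192.0.2.29 b.example.com b\n",
    }
    for name, seen in hosts_mismatches(example).items():
        print(name, seen)
```

An empty result only means the files agree with each other; it says nothing about whether the addresses themselves are right.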
<br>Mostly this in /var/log/messages:<br>Aug 25 09:36:12 fenclxmrcati11 dlm_controld[2267]: connect to ccs error -111, check ccsd or cluster status<br>Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>
Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection refused <br>Aug 25 09:36:12 fenclxmrcati11 gfs_controld[2273]: connect to ccs error -111, check ccsd or cluster status<br>Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>
Aug 25 09:36:12 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection refused <br>Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection refused <br>
Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection refused <br>Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>
Aug 25 09:36:13 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection refused <br>Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection refused <br>
Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Cluster is not quorate.  Refusing connection. <br>Aug 25 09:36:14 fenclxmrcati11 ccsd[3758]: Error while processing connect: Connection re<br><br>
<br><br><br><br><br><div class="gmail_quote">On Thu, Aug 27, 2009 at 3:27 AM, Jakov Sosic <span dir="ltr"><<a href="mailto:jakov.sosic@srce.hr">jakov.sosic@srce.hr</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Wed, 26 Aug 2009 18:36:26 -0500<br>
<div class="im">Alan A <<a href="mailto:alan.zg@gmail.com">alan.zg@gmail.com</a>> wrote:<br>
<br>
</div><div><div></div><div class="h5">> I have tried almost everything at this point to try and troubleshoot<br>
> this further. I can't create new cluster with luci.<br>
><br>
> I broke and tried to reconfigure 3 node cluster at least 6 times.<br>
><br>
> I have noticed nodes taking exceptionally long to initialize<br>
> fencing upon cman start. I tried with fencing both defined and undefined;<br>
> the amount of time needed is the same. Even after fencing<br>
> completes, /var/log/messages shows nodes refusing to join the cluster due to<br>
> the 'not in quorum' state during the join process. I upped the<br>
> post_join_delay to as much as 150, but the result is the same.<br>
><br>
> Fencing - I use APC power switches - I can log in to the APC switch from the<br>
> node, and I can even fence the other node, but when cman is started it<br>
> looks like it is almost timing out on starting fencing.<br>
><br>
> If I issue cman_tool nodes it gives me the local node name as the<br>
> member of the cluster and the other two with state 'X'. If I try<br>
> cman_tool join clustername - it tells me the nodes are already in<br>
> that cluster but cluster as the whole does not register. Each node<br>
> thinks it's the only working member of the cluster.<br>
><br>
><br>
> Any pointers?<br>
<br>
</div></div>Looks like network issue to me.<br>
<br>
Are you sure your network is operational with respect to multicast /<br>
IGMP? Try forcing IGMP v1 in sysctl.conf, and if you have Cisco<br>
equipment take a look at the openais FAQ (sparse-dense mode).<br>
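One way to check the multicast question independently of the cluster stack is a small send/receive test between two nodes. The following is only a sketch using plain UDP sockets. The function names are hypothetical, and the group and port in the usage comment are placeholders rather than the values the cluster actually uses, so substitute whatever your setup is configured with.

```python
import ipaddress
import socket


def build_mreq(group, iface="0.0.0.0"):
    """Pack the ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    # 4 bytes of group address followed by 4 bytes of local interface address.
    return socket.inet_aton(group) + socket.inet_aton(iface)


def listen(group, port, timeout=10.0):
    """Join the group; return (payload, sender) of the first datagram, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None


def send(group, port, payload=b"ping", ttl=1):
    """Send one datagram to the multicast group."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))


# Usage sketch (placeholder group and port):
#   on node A:  print(listen("239.192.0.1", 5405))
#   on node B:  send("239.192.0.1", 5405)
```

If the listener times out while ordinary unicast traffic between the same two nodes works, that would point toward the switch or IGMP configuration rather than the cluster software.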
<font color="#888888"><br>
<br>
--<br>
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |<br>
=================================================================<br>
| start fighting cancer -> <a href="http://www.worldcommunitygrid.org/" target="_blank">http://www.worldcommunitygrid.org/</a>   |<br>
</font><div><div></div><div class="h5"><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Alan A.<br>