[Linux-cluster] Trouble adding back in an old node

Stewart Walters stewart at epits.com.au
Tue Jan 27 10:18:19 UTC 2009


Vernard C. Martin wrote:
> I'm running CentOS 5.2 and using the cluster suite + GFS1. I have an 
> EMC CX600 providing shared storage to some LUNs. I'm using Brocade 
> port fencing.
>
> I'm experiencing a problem trying to add a previously removed node 
> back into the cluster. The node was having hardware RAM issues, so it 
> was removed from the cluster completely (i.e. removed from the 
> cluster.conf and removed from the storage zoning as well).  I then 
> added 3 more nodes to the cluster. Now that the bad RAM has been 
> identified and removed, I wanted to add the node back in. I followed 
> the instructions that I had used on the previous 3 nodes (i.e. used 
> system-config-cluster to configure the node, save and propagate the 
> cluster.conf, manually copy the cluster.conf to the newly added 
> node, and then start up cman and clvmd). However, when I try to start 
> cman with "service cman start", the process hangs while actually 
> starting cman. I did some digging, and in /var/log/messages on the 
> node I'm attempting to add, I see the following:
>
> Jan 23 15:41:39 node004 ccsd[9342]: Initial status:: Inquorate
> Jan 23 15:41:40 node004 ccsd[9342]: Cluster is not quorate.  Refusing 
> connection.
> Jan 23 15:41:40 node004 ccsd[9342]: Error while processing connect: 
> Connection refused
> Jan 23 15:41:45 node004 ccsd[9342]: Cluster is not quorate.  Refusing 
> connection.
> Jan 23 15:41:45 node004 ccsd[9342]: Error while processing connect: 
> Connection refused
> Jan 23 15:41:50 node004 ccsd[9342]: Cluster is not quorate.  Refusing 
> connection.
> Jan 23 15:41:50 node004 ccsd[9342]: Error while processing connect: 
> Connection refused
>
> I suspect that this is at least part of the problem. However, I'm a 
> bit confused because the cluster it's attempting to join is most 
> definitely quorate, at least according to clustat -f:
>
> Cluster Status for rsph_centos_5 @ Fri Jan 23 17:00:45 2009
> Member Status: Quorate
>
> Member Name                                                  ID   Status
> ------ ----                                                  ---- ------
> head1.clus.sph.emory.edu                                         1 Online, Local
> node002.clus.sph.emory.edu                                       2 Online
> node003.clus.sph.emory.edu                                       3 Online
> node004.clus.sph.emory.edu                                       4 Offline
> node005.clus.sph.emory.edu                                       5 Online
> node006.clus.sph.emory.edu                                       6 Online
> node007.clus.sph.emory.edu                                       7 Online
>
>
> I'm thinking that there is something subtle that I am missing that I 
> can change to make this work. I really don't want to have to 
> re-install and reconfigure the machine to get this to work. That is 
> something that you do in the Windows world :-)
>
>
> So here is my cluster.conf file. Passwords changed to protect the guilty.
>
> <?xml version="2.0"?>
> <cluster alias="rsph_centos_5" config_version="41" name="rsph_centos_5">
>        <fence_daemon clean_start="1" post_fail_delay="30" post_join_delay="90"/>
>        <clusternodes>
>                <clusternode name="head1.clus.sph.emory.edu" nodeid="1" votes="7">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="1"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="1"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="node002.clus.sph.emory.edu" nodeid="2" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="2"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="2"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="node003.clus.sph.emory.edu" nodeid="3" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="3"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="3"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="node005.clus.sph.emory.edu" nodeid="5" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="5"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="5"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="node006.clus.sph.emory.edu" nodeid="6" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="6"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="6"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="node007.clus.sph.emory.edu" nodeid="7" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="7"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="7"/>
>                                </method>
>                        </fence>
>                </clusternode>
>                <clusternode name="node004.clus.sph.emory.edu" nodeid="4" votes="1">
>                        <fence>
>                                <method name="1">
>                                        <device name="sanclusa1.sph.emory.edu" port="4"/>
>                                        <device name="sanclusb1.sph.emory.edu" port="4"/>
>                                </method>
>                        </fence>
>                </clusternode>
>        </clusternodes>
>        <cman/>
>        <fencedevices>
>                <fencedevice agent="fence_brocade" ipaddr="170.140.183.87" login="admin" name="sanclusa1.sph.emory.edu" passwd="mypasshere"/>
>                <fencedevice agent="fence_brocade" ipaddr="170.140.183.88" login="admin" name="sanclusb1.sph.emory.edu" passwd="mypasshere"/>
>        </fencedevices>
>        <rm>
>                <failoverdomains/>
>                <resources/>
>        </rm>
> </cluster>
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster


You have a bare, self-closing <cman/> element in cluster.conf, but no 
<cman parameter1="1" parameter2="2"> stanza setting any actual 
parameters.  Is that intentional?
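
For comparison, a cman stanza that actually carries parameters would 
look something along these lines. The expected_votes value below is 
only an illustration worked out from the votes in your config (7 for 
head1 plus 1 each for the six other nodes, 13 in total); use whatever 
value matches your intended quorum policy:

        <cman expected_votes="13"/>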

The cman stanza is where you would define expected_votes for the 
cluster, so could its absence be the reason why ccsd on the joining 
node believes the cluster is inquorate?
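
Either way, it may be worth comparing what cman itself reports on a 
quorate member versus on node004 (assuming cman gets far enough there 
to answer at all). Something along these lines, with grep just picking 
out the vote-related fields:

        # run on head1 (or any online member), then again on node004
        cman_tool status | grep -iE 'expected|total|quorum'

If the expected/total vote figures node004 reports differ from what the 
rest of the cluster reports, that would line up with the missing cman 
parameters.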

Regards,

Stewart



