[Linux-cluster] Trouble adding back in an old node

Vernard C. Martin vcmarti at sph.emory.edu
Fri Jan 23 22:00:57 UTC 2009


I'm running CentOS 5.2 with the cluster suite + GFS1. I have an EMC CX600 
providing the shared storage (some LUNs), and I'm using Brocade port 
fencing.

I'm having trouble adding a previously removed node back into the 
cluster. The node was having hardware RAM issues, so it was removed from 
the cluster completely (i.e. removed from cluster.conf and removed from 
the storage zoning as well). I then added 3 more nodes to the cluster. 
Now that the bad RAM has been identified and removed, I want to add the 
node back in. I followed the same procedure I had used on the previous 3 
nodes: use system-config-cluster to configure the node, save and 
propagate the cluster.conf, manually copy the cluster.conf to the newly 
added node, and then start cman and clvmd (the exact command sequence is 
spelled out further down, just before my cluster.conf). However, when I 
try to start cman with "service cman start", the startup hangs at the 
point where cman itself starts. Digging into /var/log/messages on the 
node I'm attempting to add, I see the following:

Jan 23 15:41:39 node004 ccsd[9342]: Initial status:: Inquorate
Jan 23 15:41:40 node004 ccsd[9342]: Cluster is not quorate.  Refusing connection.
Jan 23 15:41:40 node004 ccsd[9342]: Error while processing connect: Connection refused
Jan 23 15:41:45 node004 ccsd[9342]: Cluster is not quorate.  Refusing connection.
Jan 23 15:41:45 node004 ccsd[9342]: Error while processing connect: Connection refused
Jan 23 15:41:50 node004 ccsd[9342]: Cluster is not quorate.  Refusing connection.
Jan 23 15:41:50 node004 ccsd[9342]: Error while processing connect: Connection refused

I suspect that this is at least part of the problem. However, I'm a bit 
confused, because the cluster it's attempting to join is most definitely 
quorate, at least according to clustat -f:

Cluster Status for rsph_centos_5 @ Fri Jan 23 17:00:45 2009
Member Status: Quorate

 Member Name                                                  ID   Status
 ------ ----                                                  ---- ------
 head1.clus.sph.emory.edu                                         1 Online, Local
 node002.clus.sph.emory.edu                                       2 Online
 node003.clus.sph.emory.edu                                       3 Online
 node004.clus.sph.emory.edu                                       4 Offline
 node005.clus.sph.emory.edu                                       5 Online
 node006.clus.sph.emory.edu                                       6 Online
 node007.clus.sph.emory.edu                                       7 Online

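For what it's worth, the cman command-line tools can be used to 
cross-check this too; something along these lines (assuming the standard 
CentOS 5.2 cman package; flags may differ on other releases):

  # on a node that is already a member (e.g. head1):
  cman_tool status     # should report expected/total votes and whether the cluster has quorum
  cman_tool nodes      # should list all seven nodes and their join state

  # on node004 itself, while "service cman start" is hung:
  cman_tool status     # to see whether it thinks it is sitting in its own, inquorate cluster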

I'm thinking there is something subtle that I'm missing and can change 
to make this work. I'd really rather not re-install and reconfigure the 
machine just to get it back in; that is something you do in the Windows 
world :-)
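
To be concrete, the sequence I described above was roughly the following 
(hostnames and the config_version are from my setup; the propagation step 
is my by-hand equivalent of system-config-cluster's "send to cluster", so 
treat it as a sketch rather than gospel):

  # copy the updated config to the node being re-added
  scp /etc/cluster/cluster.conf node004.clus.sph.emory.edu:/etc/cluster/cluster.conf

  # on an existing member, push the new config version out to the running cluster
  ccs_tool update /etc/cluster/cluster.conf
  cman_tool version -r 41

  # then on node004
  service cman start     # <-- this is the step that hangs
  service clvmd start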


So here is my cluster.conf file. Passwords changed to protect the guilty.

<?xml version="2.0"?>
<cluster alias="rsph_centos_5" config_version="41" name="rsph_centos_5">
        <fence_daemon clean_start="1" post_fail_delay="30" post_join_delay="90"/>
        <clusternodes>
                <clusternode name="head1.clus.sph.emory.edu" nodeid="1" votes="7">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="1"/>
                                        <device name="sanclusb1.sph.emory.edu" port="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node002.clus.sph.emory.edu" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="2"/>
                                        <device name="sanclusb1.sph.emory.edu" port="2"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node003.clus.sph.emory.edu" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="3"/>
                                        <device name="sanclusb1.sph.emory.edu" port="3"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node005.clus.sph.emory.edu" nodeid="5" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="5"/>
                                        <device name="sanclusb1.sph.emory.edu" port="5"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node006.clus.sph.emory.edu" nodeid="6" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="6"/>
                                        <device name="sanclusb1.sph.emory.edu" port="6"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node007.clus.sph.emory.edu" nodeid="7" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="7"/>
                                        <device name="sanclusb1.sph.emory.edu" port="7"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node004.clus.sph.emory.edu" nodeid="4" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="sanclusa1.sph.emory.edu" port="4"/>
                                        <device name="sanclusb1.sph.emory.edu" port="4"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_brocade" ipaddr="170.140.183.87" login="admin" name="sanclusa1.sph.emory.edu" passwd="mypasshere"/>
                <fencedevice agent="fence_brocade" ipaddr="170.140.183.88" login="admin" name="sanclusb1.sph.emory.edu" passwd="mypasshere"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
</cluster>
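
And in case the propagation itself is the culprit: the simplest sanity 
check I know of is to confirm every node really is holding an identical 
cluster.conf, e.g. with a quick checksum loop (assumes passwordless ssh 
between the nodes; adjust hostnames as needed):

  # every line should show the same md5sum, and config_version should be 41 everywhere
  for h in head1 node002 node003 node004 node005 node006 node007; do
          ssh $h.clus.sph.emory.edu md5sum /etc/cluster/cluster.conf
  done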



