[Linux-cluster] Trouble adding back in an old node
Stewart Walters
stewart at epits.com.au
Tue Jan 27 10:18:19 UTC 2009
Vernard C. Martin wrote:
> I'm running CentOS 5.2 and using the cluster suite + GFS1. I have
> an EMC CX600 providing shared storage to some LUNs. I'm using Brocade
> port fencing.
>
> I'm experiencing a problem trying to add a previously removed node
> back into the cluster. The node was having hardware RAM issues so it
> was removed from the cluster completely (i.e. removed from the
> cluster.conf and removed from the storage zoning as well). I then
> added 3 more nodes to the cluster. Now that the bad RAM has been
> identified and removed, I wanted to add the node back in. I followed
> the instructions that I had used on the previous 3 nodes (i.e. used
> system-config-cluster to configure the node, save and propagate the
> cluster.conf, manually propagate the cluster.conf to the newly added
> node, and then start up cman and clvmd). However, when I try to start
> cman with "service cman start", the process hangs while starting cman.
> I did some digging, and in /var/log/messages on the node I'm attempting
> to add, I see the following:
>
> Jan 23 15:41:39 node004 ccsd[9342]: Initial status:: Inquorate
> Jan 23 15:41:40 node004 ccsd[9342]: Cluster is not quorate. Refusing
> connection.
> Jan 23 15:41:40 node004 ccsd[9342]: Error while processing connect:
> Connection refused
> Jan 23 15:41:45 node004 ccsd[9342]: Cluster is not quorate. Refusing
> connection.
> Jan 23 15:41:45 node004 ccsd[9342]: Error while processing connect:
> Connection refused
> Jan 23 15:41:50 node004 ccsd[9342]: Cluster is not quorate. Refusing
> connection.
> Jan 23 15:41:50 node004 ccsd[9342]: Error while processing connect:
> Connection refused
>
> I suspect that this is at least part of the problem. However, I'm a
> bit confused, because the cluster it's attempting to join is most
> definitely quorate, at least according to clustat -f:
>
> Cluster Status for rsph_centos_5 @ Fri Jan 23 17:00:45 2009
> Member Status: Quorate
>
> Member Name                         ID   Status
> ------ ----                         ---- ------
> head1.clus.sph.emory.edu            1    Online, Local
> node002.clus.sph.emory.edu          2    Online
> node003.clus.sph.emory.edu          3    Online
> node004.clus.sph.emory.edu          4    Offline
> node005.clus.sph.emory.edu          5    Online
> node006.clus.sph.emory.edu          6    Online
> node007.clus.sph.emory.edu          7    Online
>
>
> I'm thinking there is something subtle that I'm missing that I can
> change to make this work. I really don't want to have to re-install
> and reconfigure the machine to get this working. That is something
> you do in the Windows world :-)
>
>
> So here is my cluster.conf file. Passwords changed to protect the guilty.
>
> <?xml version="2.0"?>
> <cluster alias="rsph_centos_5" config_version="41" name="rsph_centos_5">
> <fence_daemon clean_start="1" post_fail_delay="30"
> post_join_delay="90"/>
> <clusternodes>
> <clusternode name="head1.clus.sph.emory.edu" nodeid="1"
> votes="7">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="1"/>
> <device
> name="sanclusb1.sph.emory.edu" port="1"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node002.clus.sph.emory.edu"
> nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="2"/>
> <device
> name="sanclusb1.sph.emory.edu" port="2"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node003.clus.sph.emory.edu"
> nodeid="3" votes="1">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="3"/>
> <device
> name="sanclusb1.sph.emory.edu" port="3"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node005.clus.sph.emory.edu"
> nodeid="5" votes="1">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="5"/>
> <device
> name="sanclusb1.sph.emory.edu" port="5"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node006.clus.sph.emory.edu"
> nodeid="6" votes="1">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="6"/>
> <device
> name="sanclusb1.sph.emory.edu" port="6"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node007.clus.sph.emory.edu"
> nodeid="7" votes="1">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="7"/>
> <device
> name="sanclusb1.sph.emory.edu" port="7"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node004.clus.sph.emory.edu"
> nodeid="4" votes="1">
> <fence>
> <method name="1">
> <device
> name="sanclusa1.sph.emory.edu" port="4"/>
> <device
> name="sanclusb1.sph.emory.edu" port="4"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman/>
> <fencedevices>
> <fencedevice agent="fence_brocade"
> ipaddr="170.140.183.87" login="admin" name="sanclusa1.sph.emory.edu"
> passwd="mypasshere"/>
> <fencedevice agent="fence_brocade"
> ipaddr="170.140.183.88" login="admin" name="sanclusb1.sph.emory.edu"
> passwd="mypasshere"/>
> </fencedevices>
> <rm>
> <failoverdomains/>
> <resources/>
> </rm>
> </cluster>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
You have an empty <cman/> stanza in cluster.conf, with no attributes
set on it. Is this deliberate? The cman stanza is where you would
define expected_votes for the cluster, so not having this present is
perhaps the reason why ccsd believes the cluster is inquorate?
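If it helps, a cman stanza with expected_votes set explicitly might look
like the sketch below. The value 13 is just my reading of the posted
cluster.conf (7 votes for head1 plus 1 each for the six compute nodes);
adjust it to whatever total you actually intend:

```xml
<!-- Hypothetical sketch only: declare the expected vote count explicitly.
     cman normally derives expected_votes from the sum of the clusternode
     votes, so this is mainly useful when that default is not what you want. -->
<cman expected_votes="13"/>
```

You could also run "cman_tool status" on one of the online nodes and
compare the Expected votes / Quorum lines it reports against what the
joining node computes.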
Regards,
Stewart