[Linux-cluster] cannot add 3rd node to running cluster

Terry td3201 at gmail.com
Fri Jan 22 14:45:29 UTC 2010

On Mon, Jan 4, 2010 at 1:34 PM, Abraham Alawi <a.alawi at auckland.ac.nz> wrote:
> On 1/01/2010, at 5:13 AM, Terry wrote:
>> On Wed, Dec 30, 2009 at 10:13 AM, Terry <td3201 at gmail.com> wrote:
>>> On Tue, Dec 29, 2009 at 5:20 PM, Jason W. <jwellband at gmail.com> wrote:
>>>> On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201 at gmail.com> wrote:
>>>>> Hello,
>>>>> I have a working 2 node cluster that I am trying to add a third node
>>>>> to.   I am trying to use Red Hat's conga (luci) to add the node in but
>>>> If you have a two-node cluster with two_node=1 in cluster.conf - such as
>>>> two nodes with no quorum device to break a tie - you'll need to bring
>>>> the cluster down, change two_node to 0 on both nodes (and bump
>>>> config_version at the top of cluster.conf), bring the cluster up and
>>>> then add the third node.
>>>> For troubleshooting any cluster issue, take a look at syslog
>>>> (/var/log/messages by default). It can help to watch it on a
>>>> centralized syslog server that all of your nodes forward logs to.
>>> Thank you for the response.  /var/log/messages doesn't have any
>>> errors.  It says cman started then says can't connect to cluster
>>> infrastructure after a few seconds.  My cluster does not have the
>>> two_node=1 config now.  Conga took that out for me.  That bit me last
>>> night because I needed to put that back in.
>> CMAN still will not start and gives no debug information.  Anyone know
>> why cman_tool -d join would not print any output at all?
>> Troubleshooting this is kind of a nightmare.  I verified that two_node
>> is not in play.
> Try this line in your cluster.conf file:
> <logging debug="on" logfile="/var/log/rhcs.log" to_file="yes"/>
> Also, if you are sure your cluster.conf is correct, then:
>   - copy it manually to all the nodes,
>   - add clean_start="1" to the fence_daemon line in cluster.conf, and
>   - run 'service cman start' simultaneously on all the nodes.
> It is probably a good idea to do that from runlevel 1, but make sure
> you have the network up first.
I am still battling this.  I stopped the cluster completely, modified
the config and then started it, but that didn't work either.  Same
issue.  I noticed clurgmgrd wasn't staying up, so I then tried
running it by hand:

[root@omadvnfs01c ~]# clurgmgrd -d -f
[7014] notice: Waiting for CMAN to start

Then in another window I issued:
[root@omadvnfs01c ~]# cman_tool join

Then back in the other window below "[7014] notice: Waiting for CMAN
to start", I got:
failed acquiring lockspace: Transport endpoint is not connected
Locks not working!

Anyone know what could be going on?
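
For reference, here is a minimal sketch (Python, standard library only) of
the kind of sanity check on cluster.conf that the two_node and clean_start
suggestions earlier in the thread imply.  The sample config, the node names
and the function name check_cluster_conf are made up for illustration; only
the two_node, config_version and clean_start attributes come from the
discussion above.  It does not explain the lockspace error, it only rules
out one class of config mistake.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf for illustration only; node names are invented.
SAMPLE_CONF = """<cluster name="example" config_version="7">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1"/>
    <clusternode name="node2" nodeid="2" votes="1"/>
    <clusternode name="node3" nodeid="3" votes="1"/>
  </clusternodes>
  <fence_daemon clean_start="0"/>
</cluster>
"""


def check_cluster_conf(xml_text):
    """Summarise the settings discussed in the thread and flag mismatches."""
    root = ET.fromstring(xml_text)

    # Collect the node names listed under <clusternodes>.
    nodes = [n.get("name") for n in root.findall("./clusternodes/clusternode")]

    # two_node lives on the <cman> element; treat a missing element as "off".
    cman = root.find("cman")
    two_node = cman is not None and cman.get("two_node") == "1"

    # clean_start lives on the <fence_daemon> element.
    fence = root.find("fence_daemon")
    clean_start = fence is not None and fence.get("clean_start") == "1"

    warnings = []
    if two_node and len(nodes) != 2:
        warnings.append(
            "two_node=1 is set but %d nodes are defined" % len(nodes)
        )

    return {
        "config_version": root.get("config_version"),
        "nodes": nodes,
        "two_node": two_node,
        "clean_start": clean_start,
        "warnings": warnings,
    }


if __name__ == "__main__":
    result = check_cluster_conf(SAMPLE_CONF)
    print("config_version:", result["config_version"])
    print("nodes:", ", ".join(result["nodes"]))
    for w in result["warnings"]:
        print("WARNING:", w)
```

Running the same check against the cluster.conf from each node and comparing
the reported config_version would also show whether a manually copied file
made it everywhere.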

