[Linux-cluster] cannot add 3rd node to running cluster

Terry td3201 at gmail.com
Fri Jan 22 15:19:45 UTC 2010


On Fri, Jan 22, 2010 at 9:00 AM, King, Adam <adam.king at intechnology.com> wrote:
> I'm assuming you have read this? http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_2to3
>
> Adam King
> Systems Administrator
> adam.king at intechnology.com
>
>
> InTechnology plc
> Support 0845 120 7070
> Telephone 01423 850000
> Facsimile 01423 858866
> www.intechnology.com
>
>
> -----Original Message-----
>
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Terry
> Sent: 22 January 2010 14:45
> To: linux clustering
> Subject: Re: [Linux-cluster] cannot add 3rd node to running cluster
>
> On Mon, Jan 4, 2010 at 1:34 PM, Abraham Alawi <a.alawi at auckland.ac.nz> wrote:
>>
>> On 1/01/2010, at 5:13 AM, Terry wrote:
>>
>>> On Wed, Dec 30, 2009 at 10:13 AM, Terry <td3201 at gmail.com> wrote:
>>>> On Tue, Dec 29, 2009 at 5:20 PM, Jason W. <jwellband at gmail.com> wrote:
>>>>> On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201 at gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have a working 2 node cluster that I am trying to add a third node
>>>>>> to.   I am trying to use Red Hat's conga (luci) to add the node in but
>>>>>
>>>>> If you have a two-node cluster with two_node=1 in cluster.conf - such as
>>>>> two nodes with no quorum device to break a tie - you'll need to bring
>>>>> the cluster down, change two_node to 0 on both nodes (and bump the
>>>>> config_version at the top of cluster.conf), bring the cluster back up,
>>>>> and then add the third node.
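>>>>>
>>>>> For example (an illustrative sketch only, with placeholder cluster
>>>>> name and version numbers), the relevant lines would change roughly
>>>>> from:
>>>>>
>>>>>   <cluster alias="mycluster" config_version="52" name="mycluster">
>>>>>     <cman expected_votes="1" two_node="1"/>
>>>>>
>>>>> to:
>>>>>
>>>>>   <cluster alias="mycluster" config_version="53" name="mycluster">
>>>>>     <cman expected_votes="3" two_node="0"/>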
>>>>>
>>>>> For troubleshooting any cluster issue, take a look at syslog
>>>>> (/var/log/messages by default). It can help to watch it on a
>>>>> centralized syslog server that all of your nodes forward logs to.
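>>>>>
>>>>> To forward to a central box, a line like this in each node's
>>>>> /etc/syslog.conf is enough ('loghost' being a placeholder for your
>>>>> syslog server's hostname), followed by a syslogd restart:
>>>>>
>>>>>   *.info @loghost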
>>>>>
>>>>> --
>>>>> HTH, YMMV, HANW :)
>>>>>
>>>>> Jason
>>>>>
>>>>> The path to enlightenment is /usr/bin/enlightenment.
>>>>
>>>> Thank you for the response.  /var/log/messages doesn't have any
>>>> errors.  It says cman started, then a few seconds later says it can't
>>>> connect to the cluster infrastructure.  My cluster does not have the
>>>> two_node=1 setting now; Conga took that out for me.  That bit me last
>>>> night because I needed to put it back in.
>>>>
>>>
>>> CMAN still will not start and gives no debug information.  Does anyone
>>> know why 'cman_tool -d join' would not print any output at all?
>>> Troubleshooting this is kind of a nightmare.  I verified that two_node
>>> is not in play.
>>>
>>
>>
>> Try this line in your cluster.conf file:
>> <logging debug="on" logfile="/var/log/rhcs.log" to_file="yes"/>
>>
>> Also, if you are sure your cluster.conf is correct, copy it manually to
>> all the nodes, add clean_start="1" to the fence_daemon line in
>> cluster.conf, and run 'service cman start' simultaneously on all the
>> nodes (probably a good idea to do that from runlevel 1, but make sure
>> you have the network up first).
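>>
>> For example, a fence_daemon line with that flag set would look
>> something like this (a sketch; the other attributes stay whatever you
>> already have):
>>
>>   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>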
>>
>> Cheers,
>>
>>  -- Abraham
>>
>> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>> Abraham Alawi
>>
>> Unix/Linux Systems Administrator
>> Science IT
>> University of Auckland
>> e: a.alawi at auckland.ac.nz
>> p: +64-9-373 7599, ext#: 87572
>>
>> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>>
>>
>
> I am still battling this.  I stopped the cluster completely, modified
> the config, and then started it, but that didn't work either.  Same
> issue.  I noticed clurgmgrd wasn't staying up, so I then tried this:
>
> [root at omadvnfs01c ~]# clurgmgrd -d -f
> [7014] notice: Waiting for CMAN to start
>
> Then in another window I issued:
> [root at omadvnfs01c ~]# cman_tool join
>
>
> Then back in the other window below "[7014] notice: Waiting for CMAN
> to start", I got:
> failed acquiring lockspace: Transport endpoint is not connected
> Locks not working!
>
> Anyone know what could be going on?
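>
> (If it helps, I can post the output of the following from the node
> that fails to join; 'group_tool ls' should show whether the fence and
> dlm groups ever formed:
>
>   cman_tool status
>   cman_tool nodes
>   group_tool ls
> )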
>

I hadn't, but I performed those steps anyway.  As it sits, I have a
three-node cluster with only two nodes in it.  That is bad too, but it
is what it is until I figure this out.  Here's my cluster.conf, just
for completeness:

<cluster alias="omadvnfs01" config_version="53" name="omadvnfs01">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="omadvnfs01a.sec.jel.lc" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="omadvnfs01a-drac"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="omadvnfs01b.sec.jel.lc" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="omadvnfs01b-drac"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="omadvnfs01c.sec.jel.lc" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="omadvnfs01c-drac"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.211" login="root" name="omadvnfs01a-drac" passwd="foo"/>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.212" login="root" name="omadvnfs01b-drac" passwd="foo"/>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.213" login="root" name="omadvnfs01c-drac" passwd="foo"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="fd_omadvnfs01a-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01a.sec.jel.lc" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="fd_omadvnfs01b-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01b.sec.jel.lc" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="fd_omadvnfs01c-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01c.sec.jel.lc" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
        </rm>
</cluster>
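
For reference, once I edit the file by hand on a running cluster, my
understanding is that the change gets pushed out and the version bumped
with something like the following (the number has to match the new
config_version in the file):

    ccs_tool update /etc/cluster/cluster.conf
    cman_tool version -r 54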

I am not sure whether I did a restart after I made the change, though.
When it says "shut down the cluster software," that is simply a
'service cman stop' on Red Hat, right?  I want to make sure I don't
need to kill any other components before updating the configuration
manually.  I appreciate the help.  I am probably going to try it again
this afternoon to double-check my work.
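
In case the order matters, here is roughly the stop sequence I plan to
use on each node, assuming rgmanager, gfs, and clvmd are all in play
(the start sequence being the reverse):

    service rgmanager stop
    service gfs stop
    service clvmd stop
    service cman stop

    service cman start
    service clvmd start
    service gfs start
    service rgmanager start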



