[Linux-cluster] cannot add 3rd node to running cluster

King, Adam adam.king at intechnology.com
Fri Jan 22 15:44:10 UTC 2010


This is the way I would do it:

1. Start from the two-node cluster.conf you have now.
2. On both nodes, stop the cluster stack:
   /etc/init.d/cman stop
   /etc/init.d/modclusterd stop  (not sure whether this one is necessary)
3. Edit cluster.conf on the first node to include the third node, delete
   the two_node="1" attribute, and bump config_version.
4. scp the cluster.conf from the first node to the second and third nodes,
   to cut out the risk of any editing mistakes.
5. Make sure chkconfig has the relevant daemons starting at boot time on
   all three nodes.
6. Reboot all three servers using /sbin/reboot -f.
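
In commands, steps 4-6 would look something like this (node2 and node3 are
placeholders for your real hostnames; adjust the daemon list to whatever
you actually run):

scp /etc/cluster/cluster.conf node2:/etc/cluster/
scp /etc/cluster/cluster.conf node3:/etc/cluster/

# on all three nodes
chkconfig cman on
chkconfig rgmanager on
chkconfig modclusterd on

/sbin/reboot -f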



 

Adam King
Systems Administrator
adam.king at intechnology.com


InTechnology plc
Support 0845 120 7070
Telephone 01423 850000
Facsimile 01423 858866
www.intechnology.com

 
-----Original Message-----

From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Terry
Sent: 22 January 2010 15:20
To: linux clustering
Subject: Re: [Linux-cluster] cannot add 3rd node to running cluster

On Fri, Jan 22, 2010 at 9:00 AM, King, Adam <adam.king at intechnology.com> wrote:
> I'm assuming you have read this? http://sources.redhat.com/cluster/wiki/FAQ/CMAN#cman_2to3
>
>
>
>
> Adam King
> Systems Administrator
> adam.king at intechnology.com
>
>
> InTechnology plc
> Support 0845 120 7070
> Telephone 01423 850000
> Facsimile 01423 858866
> www.intechnology.com
>
>
> -----Original Message-----
>
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Terry
> Sent: 22 January 2010 14:45
> To: linux clustering
> Subject: Re: [Linux-cluster] cannot add 3rd node to running cluster
>
> On Mon, Jan 4, 2010 at 1:34 PM, Abraham Alawi <a.alawi at auckland.ac.nz> wrote:
>>
>> On 1/01/2010, at 5:13 AM, Terry wrote:
>>
>>> On Wed, Dec 30, 2009 at 10:13 AM, Terry <td3201 at gmail.com> wrote:
>>>> On Tue, Dec 29, 2009 at 5:20 PM, Jason W. <jwellband at gmail.com> wrote:
>>>>> On Tue, Dec 29, 2009 at 2:30 PM, Terry <td3201 at gmail.com> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have a working two-node cluster that I am trying to add a third
>>>>>> node to.  I am trying to use Red Hat's Conga (luci) to add the node in but
>>>>>
>>>>> If you have a two-node cluster with two_node=1 in cluster.conf - such as
>>>>> two nodes with no quorum device to break a tie - you'll need to bring
>>>>> the cluster down, change two_node to 0 on both nodes (and rev the
>>>>> config_version at the top of cluster.conf), bring the cluster up and
>>>>> then add the third node.
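>>>>>
>>>>> As a sketch of the edit (the expected_votes values here are assumed),
>>>>> the <cman> line goes from
>>>>>
>>>>>   <cman two_node="1" expected_votes="1"/>
>>>>>
>>>>> to
>>>>>
>>>>>   <cman two_node="0" expected_votes="3"/>
>>>>>
>>>>> with config_version bumped on the <cluster> line at the same time.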
>>>>>
>>>>> For troubleshooting any cluster issue, take a look at syslog
>>>>> (/var/log/messages by default). It can help to watch it on a
>>>>> centralized syslog server that all of your nodes forward logs to.
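>>>>>
>>>>> For example, a minimal forwarding rule in each node's /etc/syslog.conf
>>>>> (loghost here stands in for your central syslog server):
>>>>>
>>>>>   *.info          @loghost
>>>>>
>>>>> followed by a 'service syslog restart' on each node.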
>>>>>
>>>>> --
>>>>> HTH, YMMV, HANW :)
>>>>>
>>>>> Jason
>>>>>
>>>>> The path to enlightenment is /usr/bin/enlightenment.
>>>>
>>>> Thank you for the response.  /var/log/messages doesn't show any
>>>> errors.  It says cman started, then a few seconds later says it can't
>>>> connect to the cluster infrastructure.  My cluster does not have the
>>>> two_node=1 setting now; Conga took that out for me.  That bit me last
>>>> night because I needed to put it back in.
>>>>
>>>
>>> CMAN still will not start, and gives no debug information.  Does anyone
>>> know why 'cman_tool -d join' would not print any output at all?
>>> Troubleshooting this is kind of a nightmare.  I verified that two_node
>>> is not in play.
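>>>
>>> The other things I know to try (a sketch, assuming the stock RHEL 5
>>> cman/openais tooling):
>>>
>>>   cman_tool status            # membership/quorum state, if cman got anywhere
>>>   cman_tool nodes             # which nodes cman can see
>>>   group_tool dump             # dump groupd's internal debug log
>>>   tail -f /var/log/messages   # watch openais/cman messages live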
>>>
>>
>>
>> Try this line in your cluster.conf file:
>> <logging debug="on" logfile="/var/log/rhcs.log" to_file="yes"/>
>>
>> Also, if you are sure your cluster.conf is correct, copy it manually to
>> all the nodes, add clean_start="1" to the fence_daemon line in
>> cluster.conf, and run 'service cman start' simultaneously on all the
>> nodes (probably a good idea to do that from runlevel 1, but make sure
>> the network is up first).
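>>
>> For example (post_fail_delay/post_join_delay shown at their usual
>> defaults):
>>
>>   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="3"/>
>>
>> and then, on every node at roughly the same time:
>>
>>   service cman start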
>>
>> Cheers,
>>
>>  -- Abraham
>>
>> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>> Abraham Alawi
>>
>> Unix/Linux Systems Administrator
>> Science IT
>> University of Auckland
>> e: a.alawi at auckland.ac.nz
>> p: +64-9-373 7599, ext#: 87572
>>
>> ''''''''''''''''''''''''''''''''''''''''''''''''''''''
>>
>>
>
> I am still battling this.  I stopped the cluster completely, modified
> the config, and then started it, but that didn't work either.  Same
> issue.  I noticed clurgmgrd wasn't staying running, so I then tried
> this:
>
> [root at omadvnfs01c ~]# clurgmgrd -d -f
> [7014] notice: Waiting for CMAN to start
>
> Then in another window I issued:
> [root at omadvnfs01c ~]# cman_tool join
>
>
> Then back in the other window below "[7014] notice: Waiting for CMAN
> to start", I got:
> failed acquiring lockspace: Transport endpoint is not connected
> Locks not working!
>
> Anyone know what could be going on?
>

I hadn't, but I performed those steps anyway.  As it sits, I have a
three-node cluster with only two nodes in it.  That's bad too, but it
is what it is until I figure this out.  Here's my cluster.conf just
for completeness:

<cluster alias="omadvnfs01" config_version="53" name="omadvnfs01">
        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="omadvnfs01a.sec.jel.lc" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="omadvnfs01a-drac"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="omadvnfs01b.sec.jel.lc" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="omadvnfs01b-drac"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="omadvnfs01c.sec.jel.lc" nodeid="3" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="omadvnfs01c-drac"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.211" login="root" name="omadvnfs01a-drac" passwd="foo"/>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.212" login="root" name="omadvnfs01b-drac" passwd="foo"/>
                <fencedevice agent="fence_drac" ipaddr="10.98.1.213" login="root" name="omadvnfs01c-drac" passwd="foo"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="fd_omadvnfs01a-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01a.sec.jel.lc" priority="1"/>
                        </failoverdomain>
                        <failoverdomain name="fd_omadvnfs01b-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01b.sec.jel.lc" priority="2"/>
                        </failoverdomain>
                        <failoverdomain name="fd_omadvnfs01c-nfs" nofailback="1" ordered="1" restricted="0">
                                <failoverdomainnode name="omadvnfs01c.sec.jel.lc" priority="1"/>
                        </failoverdomain>
                </failoverdomains>

I am not sure whether I did a restart after I did the work, though.  When
the FAQ says "shut down cluster software", that is simply a 'service cman
stop' on Red Hat, right?  I want to make sure I don't need to kill any
other components before updating the configuration manually.  I appreciate
the help.  I am probably going to try it again this afternoon to
double check my work.
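
In case it matters, here is the stop/start order I plan to use (a sketch;
it assumes rgmanager, clvmd, and GFS are the only other pieces in play on
these nodes):

# stop, leaf services first
service rgmanager stop
service gfs stop        # only if GFS mounts are handled by the init script
service clvmd stop      # only if clustered LVM is in use
service cman stop       # cman goes down last

# start in the reverse order
service cman start
service clvmd start
service gfs start
service rgmanager start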

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



