[Linux-cluster] 3 node cluster problems

Dalton, Maurice bobby.m.dalton at nasa.gov
Tue Mar 25 14:23:10 UTC 2008


Same problem.

I now have qdiskd running.

 

I have run diffs on all three cluster.conf files; all are the same.

 

 

[root at csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>

<cluster config_version="6" name="csarcsys5">

        <fence_daemon post_fail_delay="0" post_join_delay="3"/>

        <clusternodes>

                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">

                        <fence/>

                </clusternode>

                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">

                        <fence/>

                </clusternode>

                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">

                        <fence/>

                </clusternode>

        </clusternodes>

        <cman/>

        <fencedevices/>

        <rm>

                <failoverdomains>

                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">

                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>

                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>

                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>

                        </failoverdomain>

                </failoverdomains>

                <resources>

                        <ip address="172.24.86.177" monitor_link="1"/>

                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>

                </resources>

        </rm>

        <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>

</cluster>
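A side note on the vote arithmetic in the config above (a sketch; the explicit `expected_votes` setting is an assumption, not something the thread confirms is required): three 1-vote nodes plus the 2-vote quorum disk give 5 total votes, so quorum is 3, meaning one surviving node holding the quorum disk can stay quorate.

```xml
<!-- Sketch, assuming the votes above: 3 nodes x 1 vote + qdisk votes="2" = 5 total -->
<cman expected_votes="5"/>
<!-- quorum = floor(5/2) + 1 = 3, so a single node plus the 2-vote qdisk remains quorate -->
```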

 

 

More info from csarcsys3

 

[root at csarcsys3-eth0 cluster]# clustat

msg_open: No such file or directory

Member Status: Inquorate

 

  Member Name                        ID   Status

  ------ ----                        ---- ------

  csarcsys1-eth0                        1 Offline

  csarcsys2-eth0                        2 Offline

  csarcsys3-eth0                        3 Online, Local

  /dev/sdd1                             0 Offline

 

[root at csarcsys3-eth0 cluster]# mkqdisk -L

mkqdisk v0.5.1

/dev/sdd1:

        Magic:   eb7a62c2

        Label:   csarcsysQ

        Created: Wed Feb 13 13:44:35 2008

        Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

 

 

 

[root at csarcsys3-eth0 cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

 

 

 

clustat from csarcsys1

msg_open: No such file or directory

Member Status: Quorate

 

  Member Name                        ID   Status

  ------ ----                        ---- ------

  csarcsys1-eth0                        1 Online, Local

  csarcsys2-eth0                        2 Online

  csarcsys3-eth0                        3 Offline

  /dev/sdd1                             0 Offline, Quorum Disk

 

 

 

[root at csarcsys1-eth0 cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

 

 

mkqdisk v0.5.1

/dev/sdd1:

        Magic:   eb7a62c2

        Label:   csarcsysQ

        Created: Wed Feb 13 13:44:35 2008

        Host:    csarcsys1-eth0.xxx.xxx.nasa.gov

 

 

 

 

Info from csarcsys2

 

[root at csarcsys2-eth0 cluster]# clustat

msg_open: No such file or directory

Member Status: Quorate

 

  Member Name                        ID   Status

  ------ ----                        ---- ------

  csarcsys1-eth0                        1 Offline

  csarcsys2-eth0                        2 Online, Local

  csarcsys3-eth0                        3 Offline

  /dev/sdd1                             0 Online, Quorum Disk
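The three clustat views above disagree about who is online, which is the split-brain signature. As a quick illustration (a hypothetical helper, not part of RHCS tooling), the member tables can be parsed and compared mechanically:

```python
# Sketch: parse `clustat` member tables and compare each node's view.
# A healthy cluster has identical "online" sets on every node; disjoint
# views (as in the outputs above) indicate a partition / split brain.

def parse_members(clustat_output):
    """Return {member_name: 'Online'/'Offline'} from a clustat member table."""
    members = {}
    for line in clustat_output.splitlines():
        parts = line.split()
        # Member rows look like: "<name>  <id>  Online[, Local]"
        if len(parts) >= 3 and parts[1].isdigit():
            members[parts[0]] = parts[2].rstrip(",")
    return members

def online_view(clustat_output):
    """Set of members this node believes are online."""
    return {name for name, status in parse_members(clustat_output).items()
            if status == "Online"}

if __name__ == "__main__":
    sys3_view = """  csarcsys1-eth0   1 Offline
  csarcsys2-eth0   2 Offline
  csarcsys3-eth0   3 Online, Local"""
    sys1_view = """  csarcsys1-eth0   1 Online, Local
  csarcsys2-eth0   2 Online
  csarcsys3-eth0   3 Offline"""
    # Disjoint online sets => the nodes are in separate partitions.
    print(online_view(sys3_view))  # {'csarcsys3-eth0'}
    print(online_view(sys1_view) == {"csarcsys1-eth0", "csarcsys2-eth0"})  # True
```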

 

 

 

 

 

 

 

 

 

 

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Panigrahi,
Santosh Kumar
Sent: Tuesday, March 25, 2008 7:33 AM
To: linux clustering
Subject: RE: [Linux-cluster] 3 node cluster problems

 

If you are configuring your cluster with system-config-cluster, there is no
need to run ricci/luci; they are only needed when configuring the cluster
through Conga. You can configure it either way.

Looking at your clustat outputs, it seems the cluster is partitioned (split
brain) into 2 sub-clusters [Sub1: (csarcsys1-eth0, csarcsys2-eth0); Sub2:
csarcsys3-eth0]. Without a quorum device you can face this situation more
often. To avoid it, you can configure a quorum device with a heuristic such
as a ping check. See
http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/
for configuring a quorum disk in RHCS.
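Following that advice, a quorumd stanza with a ping heuristic would look roughly like this (a sketch; the gateway address, interval, and scores are assumptions, not values from this thread):

```xml
<!-- Sketch only: 172.24.86.1 is a hypothetical gateway to ping -->
<quorumd interval="2" tko="10" votes="2" label="csarcsysQ" min_score="1">
        <heuristic program="ping -c1 -t1 172.24.86.1" score="1" interval="2"/>
</quorumd>
```

A node that can still reach the gateway keeps its heuristic score and stays eligible for the qdisk votes, which breaks ties like the 2-vs-1 partition shown above.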

Thanks,

S

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Dalton, Maurice
Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: [Linux-cluster] 3 node cluster problems

Still no change. Same as below. 

I completely rebuilt the cluster using system-config-cluster

The Cluster software was installed from rhn, luci and ricci are running.

This is the new config file, and it has been copied to the two other systems.

 

[root at csarcsys1-eth0 cluster]# more cluster.conf

<?xml version="1.0"?>

<cluster config_version="5" name="csarcsys5">

        <fence_daemon post_fail_delay="0" post_join_delay="3"/>

        <clusternodes>

                <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">

                        <fence/>

                </clusternode>

                <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">

                        <fence/>

                </clusternode>

                <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">

                        <fence/>

                </clusternode>

        </clusternodes>

        <cman/>

        <fencedevices/>

        <rm>

                <failoverdomains>

                        <failoverdomain name="csarcsysfo" ordered="0" restricted="1">

                                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>

                                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>

                                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>

                        </failoverdomain>

                </failoverdomains>

                <resources>

                        <ip address="172.xx.xx.xxx" monitor_link="1"/>

                        <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>

                </resources>

        </rm>

</cluster>

-----Original Message-----

From: linux-cluster-bounces at redhat.com

[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas

Sent: Monday, March 24, 2008 4:17 PM

To: linux clustering

Subject: Re: [Linux-cluster] 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have had
to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the
other two nodes. Make sure you can ping the private interface to/from all
nodes, then reboot. If this does not work, post your
/etc/cluster/cluster.conf file again.

 

Dalton, Maurice wrote:

> Yes

> I also rebooted again just now to be sure.

> 

> 

> -----Original Message-----

> From: linux-cluster-bounces at redhat.com

> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas

> Sent: Monday, March 24, 2008 3:33 PM

> To: linux clustering

> Subject: Re: [Linux-cluster] 3 node cluster problems

> 

> When you changed the nodenames in /etc/cluster/cluster.conf and made sure
> the /etc/hosts file had the correct nodenames (i.e. 10.0.0.100
> csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the
> nodes at the same time?

> 

> Dalton, Maurice wrote:

>   

>> No luck. It seems as if csarcsys3 thinks it's in its own cluster.

>> I renamed all the config files and rebuilt with system-config-cluster.

>> 

>> Clustat command from csarcsys3

>> 

>> 

>> [root at csarcsys3-eth0 cluster]# clustat

>> msg_open: No such file or directory

>> Member Status: Inquorate

>> 

>>   Member Name                        ID   Status

>>   ------ ----                        ---- ------

>>   csarcsys1-eth0                        1 Offline

>>   csarcsys2-eth0                        2 Offline

>>   csarcsys3-eth0                        3 Online, Local

>> 

>> clustat command from csarcsys2 

>> 

>> [root at csarcsys2-eth0 cluster]# clustat

>> msg_open: No such file or directory

>> Member Status: Quorate

>> 

>>   Member Name                        ID   Status

>>   ------ ----                        ---- ------

>>   csarcsys1-eth0                        1 Online

>>   csarcsys2-eth0                        2 Online, Local

>>   csarcsys3-eth0                        3 Offline

>> 

>> 

>> -----Original Message-----

>> From: linux-cluster-bounces at redhat.com

>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas

>> Sent: Monday, March 24, 2008 2:25 PM

>> To: linux clustering

>> Subject: Re: [Linux-cluster] 3 node cluster problems

>> 

>> You will also need to make sure the cluster nodenames are in your
>> /etc/hosts file. Also, make sure your cluster network interface is up on
>> all nodes and that /etc/cluster/cluster.conf is the same on all nodes.

>> 

>> 

>> 

>> Dalton, Maurice wrote:

>>   

>>     

>>> The last post is incorrect.

>>> 

>>> Fence is still hanging at start up.

>>> 

>>> Here are more log messages.

>>> 

>>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing 

>>> connect: Connection refused

>>> 

>>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs 

>>> error -111, check ccsd or cluster status

>>> 

>>> *From:* linux-cluster-bounces at redhat.com

>>> [mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Bennie Thomas

>>> *Sent:* Monday, March 24, 2008 11:22 AM

>>> *To:* linux clustering

>>> *Subject:* Re: [Linux-cluster] 3 node cluster problems

>>> 

>>> Try removing the fully qualified hostname from the cluster.conf file.

>>> 

>>> 

>>> Dalton, Maurice wrote:

>>> 

>>> I have NO fencing equipment.

>>> 

>>> I have been tasked with setting up a 3-node cluster.

>>> 

>>> I am currently having problems getting cman (fence) to start.

>>> 

>>> Fence tries to start during cman startup but fails.

>>> 

>>> I tried running /sbin/fenced -D and got the following:

>>> 

>>> 1206373475 cman_init error 0 111

>>> 

>>> Here's my cluster.conf file

>>> 

>>> <?xml version="1.0"?>

>>> 

>>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">

>>> 

>>> <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>

>>> <clusternodes>

>>> 

>>> <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">

>>> <fence/>

>>> 

>>> </clusternode>

>>> 

>>> <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">

>>> <fence/>

>>> 

>>> </clusternode>

>>> 

>>> <clusternode name="csarcsys3-eth0.xxx.xxxx.nasa.gov" nodeid="3" votes="1">

>>> <fence/>

>>> 

>>> </clusternode>

>>> 

>>> </clusternodes>

>>> 

>>> <cman/>

>>> 

>>> <fencedevices/>

>>> 

>>> <rm>

>>> 

>>> <failoverdomains>

>>> 

>>> <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">

>>> 

>>> <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>

>>> <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>

>>> <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>

>>> </failoverdomain>

>>> 

>>> </failoverdomains>

>>> 

>>> <resources>

>>> 

>>> <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>

>>> 

>>> <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>

>>> 

>>> <nfsexport name="csarcsys-export"/>

>>> 

>>> <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>

>>> 

>>> </resources>

>>> 

>>> </rm>

>>> 

>>> </cluster>
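Since no fencing hardware is available (as stated above), the usual stopgap in this era of RHCS was manual fencing; a sketch of the stanzas follows (the device name is hypothetical, not from the thread, and note that fence_manual is not supported for production use):

```xml
<!-- Sketch: fence_manual requires an operator to confirm with fence_ack_manual -->
<fencedevices>
        <fencedevice agent="fence_manual" name="manual-fence"/>
</fencedevices>

<!-- and inside each clusternode stanza: -->
<fence>
        <method name="1">
                <device name="manual-fence" nodename="csarcsys1-eth0"/>
        </method>
</fence>
```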

>>> 

>>> Messages from the logs

>>> 

>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 

>>> Refusing connection.

>>> 

>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing 

>>> connect: Connection refused

>>> 

>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 

>>> Refusing connection.

>>> 

>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing 

>>> connect: Connection refused

>>> 

>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 

>>> Refusing connection.

>>> 

>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing 

>>> connect: Connection refused

>>> 

>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 

>>> Refusing connection.

>>> 

>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing 

>>> connect: Connection refused

>>> 

>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 

>>> Refusing connection.

>>> 

>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing 

>>> connect: Connection refused

>>> 


>>> --

>>> Linux-cluster mailing list

>>> Linux-cluster at redhat.com <mailto:Linux-cluster at redhat.com>

>>> https://www.redhat.com/mailman/listinfo/linux-cluster

