[Linux-cluster] 3 node cluster problems

Dalton, Maurice bobby.m.dalton at nasa.gov
Tue Mar 25 16:07:20 UTC 2008


I have just begun testing GFS.

Sadly, I rebooted csarcsys1 and now I am back in the same situation.

This is weird.


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have
now piqued my curiosity, and I will have to try building one. So were
you also using GFS?

Dalton, Maurice wrote:
> Sorry but security here will not allow me to send host files
>
> BUT.
>
>
> I was getting this in /var/log/messages on csarcsys3
>
> Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
> Refusing connection.
> Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
> connect: Connection refused
> Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs
> error -111, check ccsd or cluster status
> Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
> Refusing connection.
> Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
> connect: Connection refused
>
>
> I had /dev/vg0/gfsvol on these systems.
>
> I did an lvremove,
>
> restarted cman on all systems, and for some strange reason my clusters
> are working.
>
> It doesn't make any sense.
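>
> For anyone hitting this later, the sequence that got things going was
> roughly the following (volume name as above; exact init script names
> may vary by release):
>
>   lvremove /dev/vg0/gfsvol      # remove the stale LVM volume
>   service cman restart          # then restart cman on every node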
>
> I can't thank you enough for your help!
>
>
> Thanks.
>
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
> Sent: Tuesday, March 25, 2008 10:27 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] 3 node cluster problems
>
> I am currently running several 3-node clusters without a quorum disk.
> However, if you want your cluster to run when only one node is up, you
> will need a quorum disk. Can you send your /etc/hosts file for all
> systems? Also, could there be another node named csarcsys3-eth0 in
> your NIS or DNS?
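>
> To spell out the vote arithmetic: three nodes at one vote each gives
> expected_votes=3 and a quorum of 3/2 + 1 = 2 (integer division), so a
> lone surviving node is inquorate. A quorum disk worth two votes raises
> expected_votes to 5 and quorum to 3, so one node plus the disk
> (1 + 2 = 3) stays quorate.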
>
> I configured some using Conga and some with system-config-cluster.
> When using system-config-cluster, I basically run the config on all
> nodes, adding just the node names and cluster name. I reboot all nodes
> to make sure they see each other, then go back and modify the config
> files.
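>
> A quick way to get the same file onto all nodes by hand (host names as
> used in this thread) is:
>
>   scp /etc/cluster/cluster.conf root@csarcsys2-eth0:/etc/cluster/
>   scp /etc/cluster/cluster.conf root@csarcsys3-eth0:/etc/cluster/
>
> On a quorate, running cluster you can instead bump config_version and
> run "ccs_tool update /etc/cluster/cluster.conf" to push it out, but
> while things are broken a plain copy plus reboot is simpler.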
>
> The file /var/log/messages should also shed some light on the problem.
>
> Dalton, Maurice wrote:
>   
>> Same problem.
>>
>> I now have qdiskd running.
>>
>> I have run diffs on all three cluster.conf files; all are the same.
>>
>> [root at csarcsys1-eth0 cluster]# more cluster.conf
>>
>> <?xml version="1.0"?>
>>
>> <cluster config_version="6" name="csarcsys5">
>>
>> <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>
>> <clusternodes>
>>
>> <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>>
>> <fence/>
>>
>> </clusternode>
>>
>> <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
>>
>> <fence/>
>>
>> </clusternode>
>>
>> <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
>>
>> <fence/>
>>
>> </clusternode>
>>
>> </clusternodes>
>>
>> <cman/>
>>
>> <fencedevices/>
>>
>> <rm>
>>
>> <failoverdomains>
>>
>> <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
>>
>> <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
>>
>> <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
>>
>> <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
>>
>> </failoverdomain>
>>
>> </failoverdomains>
>>
>> <resources>
>>
>> <ip address="172.24.86.177" monitor_link="1"/>
>>
>> <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739"
>> fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs"
>> options="rw" self_fence="0"/>
>>
>> </resources>
>>
>> </rm>
>>
>> <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30"
>> votes="2"/>
>>
>> </cluster>
>>
>> More info from csarcsys3
>>
>> [root at csarcsys3-eth0 cluster]# clustat
>>
>> msg_open: No such file or directory
>>
>> Member Status: Inquorate
>>
>> Member Name ID Status
>>
>> ------ ---- ---- ------
>>
>> csarcsys1-eth0 1 Offline
>>
>> csarcsys2-eth0 2 Offline
>>
>> csarcsys3-eth0 3 Online, Local
>>
>> /dev/sdd1 0 Offline
>>
>> [root at csarcsys3-eth0 cluster]# mkqdisk -L
>>
>> mkqdisk v0.5.1
>>
>> /dev/sdd1:
>>
>> Magic: eb7a62c2
>>
>> Label: csarcsysQ
>>
>> Created: Wed Feb 13 13:44:35 2008
>>
>> Host: csarcsys1-eth0.xxx.xxx.nasa.gov
>>
>> [root at csarcsys3-eth0 cluster]# ls -l /dev/sdd1
>>
>> brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1
>>
>> clustat from csarcsys1
>>
>> msg_open: No such file or directory
>>
>> Member Status: Quorate
>>
>> Member Name ID Status
>>
>> ------ ---- ---- ------
>>
>> csarcsys1-eth0 1 Online, Local
>>
>> csarcsys2-eth0 2 Online
>>
>> csarcsys3-eth0 3 Offline
>>
>> /dev/sdd1 0 Offline, Quorum Disk
>>
>> [root at csarcsys1-eth0 cluster]# ls -l /dev/sdd1
>>
>> brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1
>>
>> mkqdisk v0.5.1
>>
>> /dev/sdd1:
>>
>> Magic: eb7a62c2
>>
>> Label: csarcsysQ
>>
>> Created: Wed Feb 13 13:44:35 2008
>>
>> Host: csarcsys1-eth0.xxx.xxx.nasa.gov
>>
>> Info from csarcsys2
>>
>> [root at csarcsys2-eth0 cluster]# clustat
>>
>> msg_open: No such file or directory
>>
>> Member Status: Quorate
>>
>> Member Name ID Status
>>
>> ------ ---- ---- ------
>>
>> csarcsys1-eth0 1 Offline
>>
>> csarcsys2-eth0 2 Online, Local
>>
>> csarcsys3-eth0 3 Offline
>>
>> /dev/sdd1 0 Online, Quorum Disk
>>
>> From: linux-cluster-bounces at redhat.com
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Panigrahi,
>> Santosh Kumar
>> Sent: Tuesday, March 25, 2008 7:33 AM
>> To: linux clustering
>> Subject: RE: [Linux-cluster] 3 node cluster problems
>>
>> If you are configuring your cluster with system-config-cluster, there
>> is no need to run ricci/luci; they are only needed when configuring
>> the cluster with Conga. You can configure it either way.
>>
>> Looking at your clustat outputs, it seems the cluster is partitioned
>> (split brain) into two sub-clusters: [1. (csarcsys1-eth0,
>> csarcsys2-eth0); 2. csarcsys3-eth0]. Without a quorum device you will
>> face this situation more often. To avoid it, you can configure a
>> quorum device with a heuristic such as a ping test. See
>> http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/
>> for configuring a quorum disk in RHCS.
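>>
>> A minimal quorumd stanza with a ping heuristic might look something
>> like this (a sketch; the gateway address below is a placeholder, pick
>> one that every node should always be able to reach):
>>
>>   <quorumd interval="1" tko="10" votes="2" label="csarcsysQ">
>>       <!-- the node keeps its score only while the gateway answers -->
>>       <heuristic program="ping -c1 -w1 172.24.86.1" score="1"
>>        interval="2"/>
>>   </quorumd>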
>>
>> Thanks,
>>
>> S
>>
>> -----Original Message-----
>> From: linux-cluster-bounces at redhat.com 
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Dalton,
>> Maurice
>> Sent: Tuesday, March 25, 2008 5:18 PM
>> To: linux clustering
>> Subject: RE: [Linux-cluster] 3 node cluster problems
>>
>> Still no change. Same as below.
>>
>> I completely rebuilt the cluster using system-config-cluster
>>
>> The cluster software was installed from RHN; luci and ricci are
>> running.
>> This is the new config file, and it has been copied to the two other
>> systems.
>>
>> [root at csarcsys1-eth0 cluster]# more cluster.conf
>>
>> <?xml version="1.0"?>
>>
>> <cluster config_version="5" name="csarcsys5">
>>
>> <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>
>> <clusternodes>
>>
>> <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>>
>> <fence/>
>>
>> </clusternode>
>>
>> <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
>>
>> <fence/>
>>
>> </clusternode>
>>
>> <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
>>
>> <fence/>
>>
>> </clusternode>
>>
>> </clusternodes>
>>
>> <cman/>
>>
>> <fencedevices/>
>>
>> <rm>
>>
>> <failoverdomains>
>>
>> <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
>>
>> <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
>>
>> <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
>>
>> <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
>>
>> </failoverdomain>
>>
>> </failoverdomains>
>>
>> <resources>
>>
>> <ip address="172.xx.xx.xxx" monitor_link="1"/>
>>
>> <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739"
>> fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs"
>> options="rw" self_fence="0"/>
>>
>> </resources>
>>
>> </rm>
>>
>> </cluster>
>>
>> -----Original Message-----
>>
>> From: linux-cluster-bounces at redhat.com
>>
>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
>>
>> Sent: Monday, March 24, 2008 4:17 PM
>>
>> To: linux clustering
>>
>> Subject: Re: [Linux-cluster] 3 node cluster problems
>>
>> Did you load the cluster software via Conga or manually? You would
>> have had to load luci on one node and ricci on all three.
>>
>> Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to
>> the other two nodes.
>>
>> Make sure you can ping the private interface to/from all nodes and
>> reboot. If this does not work,
>> post your /etc/cluster/cluster.conf file again.
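>>
>> For the ping check, something like this from each node (node names as
>> used in this thread) should get answers on the private addresses:
>>
>>   for h in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do
>>       ping -c1 $h
>>   done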
>>
>> Dalton, Maurice wrote:
>>
>>     
>>> Yes
>>>       
>>> I also rebooted again just now to be sure.
>>>       
>>> -----Original Message-----
>>>       
>>> From: linux-cluster-bounces at redhat.com
>>>       
>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
>>>       
>>> Sent: Monday, March 24, 2008 3:33 PM
>>>       
>>> To: linux clustering
>>>       
>>> Subject: Re: [Linux-cluster] 3 node cluster problems
>>>       
>>> When you changed the node names in /etc/cluster/cluster.conf and
>>> made sure the /etc/hosts file had the correct node names (i.e.
>>> 10.0.0.100 csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you
>>> reboot all the nodes at the same time?
>>>       
>>> Dalton, Maurice wrote:
>>>       
>>>> No luck. It seems as if csarcsys3 thinks it's in its own cluster.
>>>>
>>>> I renamed all the config files and rebuilt from system-config-cluster.
>>>>         
>>>> Clustat command from csarcsys3
>>>>         
>>>> [root at csarcsys3-eth0 cluster]# clustat
>>>>         
>>>> msg_open: No such file or directory
>>>>         
>>>> Member Status: Inquorate
>>>>         
>>>> Member Name ID Status
>>>>         
>>>> ------ ---- ---- ------
>>>>         
>>>> csarcsys1-eth0 1 Offline
>>>>         
>>>> csarcsys2-eth0 2 Offline
>>>>         
>>>> csarcsys3-eth0 3 Online, Local
>>>>         
>>>> clustat command from csarcsys2
>>>>         
>>>> [root at csarcsys2-eth0 cluster]# clustat
>>>>         
>>>> msg_open: No such file or directory
>>>>         
>>>> Member Status: Quorate
>>>>         
>>>> Member Name ID Status
>>>>         
>>>> ------ ---- ---- ------
>>>>         
>>>> csarcsys1-eth0 1 Online
>>>>         
>>>> csarcsys2-eth0 2 Online, Local
>>>>         
>>>> csarcsys3-eth0 3 Offline
>>>>         
>>>> -----Original Message-----
>>>>         
>>>> From: linux-cluster-bounces at redhat.com
>>>>         
>>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie
>>>> Thomas
>>>> Sent: Monday, March 24, 2008 2:25 PM
>>>>         
>>>> To: linux clustering
>>>>         
>>>> Subject: Re: [Linux-cluster] 3 node cluster problems
>>>>         
>>>> You will also need to make sure the cluster node names are in your
>>>> /etc/hosts file.
>>>>
>>>> Also, make sure your cluster network interface is up on all nodes
>>>> and that /etc/cluster/cluster.conf is the same on all nodes.
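>>>>
>>>> Something like this in /etc/hosts on every node (addresses here are
>>>> placeholders for your private network):
>>>>
>>>>   10.0.0.100  csarcsys1-eth0
>>>>   10.0.0.101  csarcsys2-eth0
>>>>   10.0.0.102  csarcsys3-eth0
>>>>
>>>> Comparing "md5sum /etc/cluster/cluster.conf" output across the
>>>> nodes is a quick way to confirm the files really are identical.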
>>>>         
>>>> Dalton, Maurice wrote:
>>>>         
>>>>> The last post is incorrect.
>>>>>           
>>>>> Fence is still hanging at startup.
>>>>>           
>>>>> Here's another log message.
>>>>>           
>>>>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing
>>>>>           
>>>>> connect: Connection refused
>>>>>           
>>>>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs
>>>>>           
>>>>> error -111, check ccsd or cluster status
>>>>>           
>>>>> From: linux-cluster-bounces at redhat.com
>>>>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie
>>>>> Thomas
>>>>>
>>>>> Sent: Monday, March 24, 2008 11:22 AM
>>>>>
>>>>> To: linux clustering
>>>>>
>>>>> Subject: Re: [Linux-cluster] 3 node cluster problems
>>>>>           
>>>>> Try removing the fully qualified hostname from the cluster.conf
>>>>> file.
>>>>> Dalton, Maurice wrote:
>>>>>           
>>>>> I have NO fencing equipment.
>>>>>
>>>>> I have been tasked to set up a 3-node cluster.
>>>>>
>>>>> Currently I am having problems getting cman (fence) to start.
>>>>>
>>>>> Fence will try to start during cman startup but will fail.
>>>>>
>>>>> I tried to run /sbin/fenced -D and I get the following:
>>>>>           
>>>>> 1206373475 cman_init error 0 111
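>>>>>
>>>>> (The 111 here is almost certainly errno 111, ECONNREFUSED, which
>>>>> lines up with the "Connection refused" messages from ccsd below:
>>>>>
>>>>>   grep -w 111 /usr/include/asm-generic/errno.h
>>>>>   #define ECONNREFUSED    111     /* Connection refused */
>>>>>
>>>>> i.e. fenced cannot talk to ccsd while the cluster is inquorate.)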
>>>>>           
>>>>> Here's my cluster.conf file
>>>>>           
>>>>> <?xml version="1.0"?>
>>>>>           
>>>>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
>>>>>           
>>>>> <fence_daemon clean_start="0" post_fail_delay="0"
>>>>> post_join_delay="3"/>
>>>>>
>>>>> <clusternodes>
>>>>>           
>>>>> <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1"
>>>>>           
>>>> votes="1">
>>>>         
>>>>> <fence/>
>>>>>           
>>>>> </clusternode>
>>>>>           
>>>>> <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2"
>>>>>           
>>>> votes="1">
>>>>         
>>>>> <fence/>
>>>>>           
>>>>> </clusternode>
>>>>>           
>>>>> <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3"
>>>>>           
>>>> votes="1">
>>>>         
>>>>> <fence/>
>>>>>           
>>>>> </clusternode>
>>>>>           
>>>>> </clusternodes>
>>>>>           
>>>>> <cman/>
>>>>>           
>>>>> <fencedevices/>
>>>>>           
>>>>> <rm>
>>>>>           
>>>>> <failoverdomains>
>>>>>           
>>>>> <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
>>>>>           
>>>>> <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov"
>>>>> priority="1"/>
>>>>>
>>>>> <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov"
>>>>> priority="1"/>
>>>>>
>>>>> <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov"
>>>>> priority="1"/>
>>>>>
>>>>> </failoverdomain>
>>>>>           
>>>>> </failoverdomains>
>>>>>           
>>>>> <resources>
>>>>>           
>>>>> <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
>>>>>           
>>>>> <fs device="/dev/sdc1" force_fsck="0" force_unmount="1"
>>>>> fsid="57739" fstype="ext3" mountpoint="/csarc-test"
>>>>> name="csarcsys-fs" options="rw" self_fence="0"/>
>>>>>           
>>>>> <nfsexport name="csarcsys-export"/>
>>>>>           
>>>>> <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw"
>>>>> path="/csarc-test" target="xxx.xxx.xxx.*"/>
>>>>>           
>>>>> </resources>
>>>>>           
>>>>> </rm>
>>>>>           
>>>>> </cluster>
>>>>>           
>>>>> Messages from the logs
>>>>>           
>>>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>>>> Refusing connection.
>>>>>
>>>>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing
>>>>> connect: Connection refused
>>>>>
>>>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>>>> Refusing connection.
>>>>>
>>>>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing
>>>>> connect: Connection refused
>>>>>
>>>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>>>> Refusing connection.
>>>>>
>>>>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing
>>>>> connect: Connection refused
>>>>>
>>>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>>>> Refusing connection.
>>>>>
>>>>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing
>>>>> connect: Connection refused
>>>>>
>>>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate.
>>>>> Refusing connection.
>>>>>
>>>>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing
>>>>> connect: Connection refused


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



