[Linux-cluster] 3 node cluster problems

Bennie Thomas Bennie_R_Thomas at raytheon.com
Tue Mar 25 15:26:41 UTC 2008


I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to keep running when only one node
is up, then you will need a quorum disk. Can you send your /etc/hosts
file for all systems? Also, could there be another node name called
csarcsys3-eth0 in your NIS or DNS?
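
For reference, the /etc/hosts entries on each node would normally look
something like this (the addresses and masked domain here are only
placeholders, not your real ones):

    10.0.0.101   csarcsys1-eth0   csarcsys1-eth0.xxx.xxx.nasa.gov
    10.0.0.102   csarcsys2-eth0   csarcsys2-eth0.xxx.xxx.nasa.gov
    10.0.0.103   csarcsys3-eth0   csarcsys3-eth0.xxx.xxx.nasa.gov

Each short name should resolve to the interface the cluster actually
uses, with no conflicting csarcsys3-eth0 record coming back from NIS
or DNS.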

I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes,
adding just the node names and the cluster name. I reboot all nodes to
make sure they see each other, then go back and modify the config files.
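
As a rough sketch of that last step (hostnames as above; ccs_tool update
only helps once the cluster is quorate, otherwise copy the file and
reboot), I bump config_version in cluster.conf and push it out by hand:

    scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
    scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
    ccs_tool update /etc/cluster/cluster.conf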

The file /var/log/messages should also shed some light on the problem.
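
Something like this on each node usually narrows it down (adjust the
daemon names to whatever you are running):

    grep -E 'ccsd|cman|fenced|qdiskd|dlm_controld' /var/log/messages | tail -50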

Dalton, Maurice wrote:
>
> Same problem.
>
> I now have qdiskd running.
>
> I have run diffs on all three cluster.conf files; all are the same.
>
> [root at csarcsys1-eth0 cluster]# more cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="6" name="csarcsys5">
>     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>     <clusternodes>
>         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>             <fence/>
>         </clusternode>
>         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
>             <fence/>
>         </clusternode>
>         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
>             <fence/>
>         </clusternode>
>     </clusternodes>
>     <cman/>
>     <fencedevices/>
>     <rm>
>         <failoverdomains>
>             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
>                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
>                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
>                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
>             </failoverdomain>
>         </failoverdomains>
>         <resources>
>             <ip address="172.24.86.177" monitor_link="1"/>
>             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>         </resources>
>     </rm>
>     <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
> </cluster>
>
> More info from csarcsys3
>
> [root at csarcsys3-eth0 cluster]# clustat
> msg_open: No such file or directory
> Member Status: Inquorate
>
>   Member Name                       ID   Status
>   ------ ----                       ---- ------
>   csarcsys1-eth0                       1 Offline
>   csarcsys2-eth0                       2 Offline
>   csarcsys3-eth0                       3 Online, Local
>   /dev/sdd1                            0 Offline
>
> [root at csarcsys3-eth0 cluster]# mkqdisk -L
> mkqdisk v0.5.1
> /dev/sdd1:
>     Magic:   eb7a62c2
>     Label:   csarcsysQ
>     Created: Wed Feb 13 13:44:35 2008
>     Host:    csarcsys1-eth0.xxx.xxx.nasa.gov
>
> [root at csarcsys3-eth0 cluster]# ls -l /dev/sdd1
> brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1
>
> clustat from csarcsys1
>
> msg_open: No such file or directory
> Member Status: Quorate
>
>   Member Name                       ID   Status
>   ------ ----                       ---- ------
>   csarcsys1-eth0                       1 Online, Local
>   csarcsys2-eth0                       2 Online
>   csarcsys3-eth0                       3 Offline
>   /dev/sdd1                            0 Offline, Quorum Disk
>
> [root at csarcsys1-eth0 cluster]# ls -l /dev/sdd1
> brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1
>
> mkqdisk v0.5.1
> /dev/sdd1:
>     Magic:   eb7a62c2
>     Label:   csarcsysQ
>     Created: Wed Feb 13 13:44:35 2008
>     Host:    csarcsys1-eth0.xxx.xxx.nasa.gov
>
> Info from csarcsys2
>
> [root at csarcsys2-eth0 cluster]# clustat
> msg_open: No such file or directory
> Member Status: Quorate
>
>   Member Name                       ID   Status
>   ------ ----                       ---- ------
>   csarcsys1-eth0                       1 Offline
>   csarcsys2-eth0                       2 Online, Local
>   csarcsys3-eth0                       3 Offline
>   /dev/sdd1                            0 Online, Quorum Disk
>
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Panigrahi, Santosh Kumar
> Sent: Tuesday, March 25, 2008 7:33 AM
> To: linux clustering
> Subject: RE: [Linux-cluster] 3 node cluster problems
>
> If you are configuring your cluster with system-config-cluster then
> there is no need to run ricci/luci; ricci/luci are only needed when
> configuring the cluster with Conga. You can configure it either way.
>
> Looking at your clustat outputs, it seems the cluster is partitioned
> (split brain) into two sub-clusters [Sub1: (csarcsys1-eth0,
> csarcsys2-eth0); Sub2: csarcsys3-eth0]. Without a quorum device you
> are more likely to face this situation. To avoid it, you can configure
> a quorum device with a heuristic such as a ping check. See
> http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/
> for configuring a quorum disk in RHCS.
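>
> A minimal sketch of such a quorumd stanza (the label matches yours, but
> the ping target, scores and timings are placeholders you would adapt):
>
> <quorumd interval="3" tko="23" votes="2" min_score="1" label="csarcsysQ">
>     <heuristic program="ping -c1 -w1 172.24.86.1" score="1" interval="2"/>
> </quorumd>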
>
> Thanks,
>
> S
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Dalton, Maurice
> Sent: Tuesday, March 25, 2008 5:18 PM
> To: linux clustering
> Subject: RE: [Linux-cluster] 3 node cluster problems
>
> Still no change. Same as below.
>
> I completely rebuilt the cluster using system-config-cluster
>
> The cluster software was installed from RHN; luci and ricci are running.
>
> This is the new config file, and it has been copied to the other two
> systems.
>
> [root at csarcsys1-eth0 cluster]# more cluster.conf
> <?xml version="1.0"?>
> <cluster config_version="5" name="csarcsys5">
>     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>     <clusternodes>
>         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
>             <fence/>
>         </clusternode>
>         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
>             <fence/>
>         </clusternode>
>         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
>             <fence/>
>         </clusternode>
>     </clusternodes>
>     <cman/>
>     <fencedevices/>
>     <rm>
>         <failoverdomains>
>             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
>                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
>                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
>                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
>             </failoverdomain>
>         </failoverdomains>
>         <resources>
>             <ip address="172.xx.xx.xxx" monitor_link="1"/>
>             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
>         </resources>
>     </rm>
> </cluster>
>
> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
> Sent: Monday, March 24, 2008 4:17 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] 3 node cluster problems
>
> Did you load the cluster software via Conga or manually? You would have
> had to load luci on one node and ricci on all three.
>
> Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the
> other two nodes. Make sure you can ping the private interface to/from
> all nodes, then reboot. If this does not work, post your
> /etc/cluster/cluster.conf file again.
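>
> A quick way to do that (sketch only; adjust the hostnames to whatever
> resolves to your private interfaces):
>
> scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
> scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
> for h in csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0; do ping -c1 $h; done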
>
> Dalton, Maurice wrote:
>
> > Yes
>
> > I also rebooted again just now to be sure.
>
> >
>
> >
>
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
> > Sent: Monday, March 24, 2008 3:33 PM
> > To: linux clustering
> > Subject: Re: [Linux-cluster] 3 node cluster problems
>
> >
>
> > When you changed the nodenames in /etc/cluster/cluster.conf and made
> > sure the /etc/hosts file had the correct nodenames (i.e. 10.0.0.100
> > csarcsys1-eth0 csarcsys1-eth0.xxxx.xxxx.xxx), did you reboot all the
> > nodes at the same time?
>
> >
>
> > Dalton, Maurice wrote:
>
> >
>
> >> No luck. It seems as if csarcsys3 thinks it is in its own cluster.
> >> I renamed all the config files and rebuilt from system-config-cluster.
>
> >>
>
> >> Clustat command from csarcsys3
> >>
> >> [root at csarcsys3-eth0 cluster]# clustat
> >> msg_open: No such file or directory
> >> Member Status: Inquorate
> >>
> >>   Member Name                       ID   Status
> >>   ------ ----                       ---- ------
> >>   csarcsys1-eth0                       1 Offline
> >>   csarcsys2-eth0                       2 Offline
> >>   csarcsys3-eth0                       3 Online, Local
> >>
> >> clustat command from csarcsys2
> >>
> >> [root at csarcsys2-eth0 cluster]# clustat
> >> msg_open: No such file or directory
> >> Member Status: Quorate
> >>
> >>   Member Name                       ID   Status
> >>   ------ ----                       ---- ------
> >>   csarcsys1-eth0                       1 Online
> >>   csarcsys2-eth0                       2 Online, Local
> >>   csarcsys3-eth0                       3 Offline
>
> >>
>
> >>
>
> >> -----Original Message-----
> >> From: linux-cluster-bounces at redhat.com
> >> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
> >> Sent: Monday, March 24, 2008 2:25 PM
> >> To: linux clustering
> >> Subject: Re: [Linux-cluster] 3 node cluster problems
>
> >>
>
> >> You will also need to make sure the cluster node names are in your
> >> /etc/hosts file. Also, make sure your cluster network interface is
> >> up on all nodes and that /etc/cluster/cluster.conf is the same on
> >> all nodes.
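> >>
> >> For example (sketch only, assuming eth0 is the cluster interface):
> >>
> >> getent hosts csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0
> >> ip addr show eth0
> >> md5sum /etc/cluster/cluster.conf     # should match on all three nodes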
>
> >>
>
> >>
>
> >>
>
> >> Dalton, Maurice wrote:
>
> >>
>
> >>
>
> >>> The last post is incorrect.
> >>>
> >>> Fence is still hanging at start up.
> >>>
> >>> Here's another log message:
> >>>
> >>> Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing
> >>> connect: Connection refused
> >>> Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs
> >>> error -111, check ccsd or cluster status
>
> >>>
>
> >>> From: linux-cluster-bounces at redhat.com
> >>> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Bennie Thomas
> >>> Sent: Monday, March 24, 2008 11:22 AM
> >>> To: linux clustering
> >>> Subject: Re: [Linux-cluster] 3 node cluster problems
>
> >>>
>
> >>> Try removing the fully qualified hostname from the cluster.conf file.
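>
> >>> That is, something like this (sketch only), with /etc/hosts able to
> >>> resolve the short names:
> >>>
> >>> <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
> >>> becomes
> >>> <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">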
>
> >>>
>
> >>>
>
> >>> Dalton, Maurice wrote:
>
> >>>
>
> >>> I have NO fencing equipment.
> >>>
> >>> I have been tasked to set up a 3-node cluster.
> >>>
> >>> Currently I am having problems getting cman (fence) to start.
> >>> Fence will try to start during cman startup but will fail.
> >>>
> >>> I tried to run /sbin/fenced -D; I get the following:
> >>>
> >>> 1206373475 cman_init error 0 111
> >>>
> >>> Here's my cluster.conf file:
>
> >>>
>
> >>> <?xml version="1.0"?>
> >>> <cluster alias="csarcsys51" config_version="26" name="csarcsys51">
> >>>     <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
> >>>     <clusternodes>
> >>>         <clusternode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" nodeid="1" votes="1">
> >>>             <fence/>
> >>>         </clusternode>
> >>>         <clusternode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" nodeid="2" votes="1">
> >>>             <fence/>
> >>>         </clusternode>
> >>>         <clusternode name="csarcsys3-eth0.xxx.xxxxnasa.gov" nodeid="3" votes="1">
> >>>             <fence/>
> >>>         </clusternode>
> >>>     </clusternodes>
> >>>     <cman/>
> >>>     <fencedevices/>
> >>>     <rm>
> >>>         <failoverdomains>
> >>>             <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
> >>>                 <failoverdomainnode name="csarcsys1-eth0.xxx.xxxx.nasa.gov" priority="1"/>
> >>>                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
> >>>                 <failoverdomainnode name="csarcsys2-eth0.xxx.xxxx.nasa.gov" priority="1"/>
> >>>             </failoverdomain>
> >>>         </failoverdomains>
> >>>         <resources>
> >>>             <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
> >>>             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
> >>>             <nfsexport name="csarcsys-export"/>
> >>>             <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
> >>>         </resources>
> >>>     </rm>
> >>> </cluster>
>
> >>>
>
> >>> Messages from the logs:
> >>>
> >>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> >>> Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> >>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> >>> Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> >>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> >>> Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> >>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> >>> Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
> >>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. Refusing connection.
> >>> Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing connect: Connection refused
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster





