[Linux-cluster] really reliable?

Steven Dake sdake at redhat.com
Wed Apr 15 01:19:59 UTC 2009


On Tue, 2009-04-14 at 20:16 -0500, Vu Pham wrote:
> Ryan Golhar wrote:
> > I'm running RHEL 5.3 64-bit.  So far, I only want to see that the
> > cluster can run.  I'll worry about getting GFS after I'm confident this
> > works.
> > 
> > I've got three nodes: pico, vail, and whistler.  They each have two NIC
> > cards, one that provides a public IP address, and another that provides
> > private communications.  All cluster traffic will go over the private
> > network, 192.168.20.0.
> > 
> > I've installed only the following components:
> > system-config-cluster-1.0.52-1.1, cman-2.0.98-1, and rgmanager-2.0.38-2.
> > 
> > I've created my cluster.conf file to include these three nodes and
> > fence them using a brocade fibre switch (for GFS).
> > 
> > When I start the cluster services on all 3 nodes using the manual
> > method of:
> > 
> > /sbin/ccsd; /usr/sbin/cman_tool join
> > 
> > The nodes successfully form a cluster.  I am able to leave the cluster 
> > and kill ccsd as well.
> > 
> > If I try to start the cman service I see:
> > 
> > [root@pico cluster]# /sbin/service cman start
> > Starting cluster:
> >    Loading modules... done
> >    Mounting configfs... done
> >    Starting ccsd... done
> >    Starting cman... done
> >    Starting daemons... done
> >    Starting fencing...
> > 
> > 
> > And it just hangs.  I know my fencing is set up correctly because I've 
> > had nodes fence other nodes before (when I was trying with 6 members). 
> > If I let it sit for long enough, sometimes it finishes successfully.  I'm 
> > not sure what it's doing because fence_tool is called and it's a binary...
> > 
> 
> Ryan,
> 
> Is there anything suspicious in the log when it hangs at fencing?
> Could you show your cluster.conf?
> 

A hang in fencing may indicate that the cluster does not have quorum.
Run cman_tool nodes to list the nodes and check that a majority (half + 1)
have joined the cluster.
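
To make the half+1 rule concrete, here is a minimal sketch of the
arithmetic. It assumes every node carries a single vote and that no quorum
disk is configured; the function names are made up for illustration and are
not part of cman.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority (half + 1)."""
    # Integer division rounds down, so 3 votes -> 2 and 10 votes -> 6.
    return expected_votes // 2 + 1


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True when the votes currently present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Assumption: one vote per node, three nodes (pico, vail, whistler).
    expected = 3
    for joined in range(expected + 1):
        state = "quorate" if has_quorum(joined, expected) else "no quorum"
        print(f"{joined} of {expected} nodes joined: {state}")
```

Under those assumptions, a three-node cluster needs two members before it
is quorate, so a single node that has started cman on its own would be
expected to wait at the fencing step until a second node joins.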

Regards
-steve

> Vu
> 
> > Ryan
> > 
> > 
> > Gordan Bobic wrote:
> >> What distro are you using? I've found that:
> >>
> >> 1) Distros other than RHEL/CentOS can be quirky when it comes to using
> >> RHCS. I've even run into problems on Fedora more than once (not to
> >> mention that FC hasn't shipped GFS1 since FC5 and GFS2 hasn't been
> >> deemed production stable until last month - and we're up to FC10 now).
> >>
> >> 2) Starting RHCS components using anything except the intended init
> >> scripts tends to cause problems.
> >>
> >> 3) Source of 99% of problems in the rest of the cases (i.e. not
> >> covered by 1) and 2) above) is incorrectly configured fencing.
> >>
> >> Does your setup fall under either of the first two categories?
> >> Have you verified beyond doubt that your fencing is configured correctly
> >> and that the fencing script gets verification upon success?
> >>
> >> Gordan
> >>
> >> On Tue, 14 Apr 2009 12:17:44 -0400, Ryan Golhar <golharam at umdnj.edu> 
> >> wrote:
> >>> Hi all,
> >>>
> >>> Is redhat cluster suite really reliable?  I've been having so much 
> >>> trouble getting a cluster up and running, I'm beginning to second 
> >>> guess my decision to use this software stack.
> >>>
> >>> I have 3 nodes (eventually 10) running and set up.  The fencing 
> >>> method is by a brocade fibre switch.  The ultimate goal of this 
> >>> cluster is to share a SAN connected by fibre.
> >>>
> >>> I've installed just the bare minimum (before even getting to GFS) to 
> >>> test the cluster software.  Just starting cman cluster services fails 
> >>> on two of the nodes.
> >>>
> >>> Even when I try to reboot the nodes, I can't because the whole system 
> >>> hangs on various processes that don't ever shut down.  I have to 
> >>> physically reboot these boxes.
> >>>
> >>> The logs fill up with errors about not being able to connect to cman,
> >>> etc.
> >>> I've been at it for a while now and am not sure this is the best route 
> >>> anymore.
> >>>
> >>> Ryan
> >>
> >> -- 
> >> Linux-cluster mailing list
> >> Linux-cluster at redhat.com
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> > 
> 
