[Linux-cluster] really reliable?

Vu Pham vu at sivell.com
Wed Apr 15 01:16:43 UTC 2009


Ryan Golhar wrote:
> I'm running RHEL 5.3 64-bit.  So far, I only want to see that the
> cluster can run.  I'll worry about getting GFS after I'm confident this
> works.
> 
> I've got three nodes: pico, vail, and whistler.  They each have two NIC
> cards, one that provides a public IP address, and another that provides
> private communications.  All cluster traffic will go over the private
> network, 192.168.20.0.
> 
> I've installed only the following components:
> system-config-cluster-1.0.52-1.1, cman-2.0.98-1, and rgmanager-2.0.38-2.
> 
> I've created my cluster.conf file to include these three nodes and
> fence them using a Brocade fibre switch (for GFS).
> 
> When I start the cluster services on all 3 nodes manually with:
> 
> /sbin/ccsd; /usr/sbin/cman_tool join
> 
> The nodes successfully form a cluster.  I am able to leave the cluster 
> and kill ccsd as well.
> 
> If I try to start the cman service I see:
> 
> [root at pico cluster]# /sbin/service cman start
> Starting cluster:
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting daemons... done
>    Starting fencing...
> 
> 
> And it just hangs.  I know my fencing is set up correctly because I've 
> had nodes fence other nodes before (when I was trying with 6 members). 
> If I let it sit for long enough, sometimes it finishes successfully.  I'm
> not sure what it's doing because fence_tool is called and it's a binary...
> 

Ryan,

Is there anything suspicious in the log when it hangs at fencing?
Could you show your cluster.conf?
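
To pull the relevant log lines while the start is hanging, something like
the sketch below might help. It assumes syslog goes to /var/log/messages
(the usual RHEL default); the function name and the keyword list are only
illustrative guesses, so adjust them to whatever your daemons actually log.

```shell
# Hypothetical helper: print the most recent cluster-related lines
# from a syslog file. The keyword list is an assumption, not an
# exhaustive set of what the cluster daemons may log.
fence_log_lines() {
    # Default log path is an assumption (RHEL-style syslog).
    logfile="${1:-/var/log/messages}"
    # Case-insensitive match on fencing/cluster keywords, last 50 hits.
    grep -iE 'fenc|cman|ccsd|openais' "$logfile" | tail -n 50
}
```

Run `fence_log_lines` in a second terminal on the node where
"Starting fencing..." is stuck, and post the output together with
cluster.conf.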

Vu

> Ryan
> 
> 
> Gordan Bobic wrote:
>> What distro are you using? I've found that:
>>
>> 1) Distros other than RHEL/CentOS can be quirky when it comes to using
>> RHCS. I've even run into problems on Fedora more than once (not to
>> mention that FC hasn't shipped GFS1 since FC5, and GFS2 wasn't deemed
>> production stable until last month, and we're up to FC10 now).
>>
>> 2) Starting RHCS components using anything except the intended init
>> scripts tends to cause problems.
>>
>> 3) The source of 99% of problems in the rest of the cases (i.e. not
>> covered by 1) and 2) above) is incorrectly configured fencing.
>>
>> Does your setup fall under either of the first two categories?
>> Have you verified beyond doubt that your fencing is configured correctly
>> and that the fencing script gets verification upon success?
>>
>> Gordan
>>
>> On Tue, 14 Apr 2009 12:17:44 -0400, Ryan Golhar <golharam at umdnj.edu> wrote:
>>> Hi all,
>>>
>>> Is Red Hat Cluster Suite really reliable?  I've been having so much
>>> trouble getting a cluster up and running, I'm beginning to
>>> second-guess my decision to use this software stack.
>>>
>>> I have 3 nodes (eventually 10) running and set up.  The fencing
>>> method is by a Brocade fibre switch.  The ultimate goal of this
>>> cluster is to share a SAN connected by fibre.
>>>
>>> I've installed just the bare minimum (before even getting to GFS) to 
>>> test the cluster software.  Just starting cman cluster services fails 
>>> on two of the nodes.
>>>
>>> Even when I try to reboot the nodes, I can't because the whole system 
>>> hangs on various processes that don't ever shut down.  I have to 
>>> physically reboot these boxes.
>>>
>>> The logs fill up with errors about not being able to connect to cman,
>>> etc.  I've been at it for a while now and am not sure this is the best
>>> route anymore.
>>>
>>> Ryan
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>



