[Linux-cluster] Odd cluster problems
Lon Hohberger
lhh at redhat.com
Thu Aug 2 20:19:17 UTC 2007
On Thu, Aug 02, 2007 at 02:00:13PM -0500, Jay Leafey wrote:
> Lon Hohberger wrote:
> >On Tue, Jul 31, 2007 at 10:48:44AM -0500, Jay Leafey wrote:
> >>I've got a 3-node cluster running CentOS 4.5 and I cannot communicate
> >>with the resource group manager. When I use the clustat command I get a
> >>timeout:
> >>
> >>>[root@rapier ~]# clustat
> >>>Timed out waiting for a response from Resource Group Manager
> >>>Member Status: Quorate
> >>>
> >>>  Member Name          Status
> >>>  ------ ----          ------
> >>>  rapier.utmem.edu     Online, Local, rgmanager
> >>>  thorax.utmem.edu     Offline
> >>>  cyclops.utmem.edu    Online, rgmanager
> >
> >>>Fence Domain: "default" 2 2 recover 4 -
> >>>[1 2]
> >
> >Until fencing completes, rgmanager won't respond.
> >
> >fence_ack_manual needs to be run.
> >
> >>><SNIP>
> >>>
> >>>User: "usrm::manager" 10 10 recover 2 -
> >>>[1 2]
> >>>
> >
>
> Your reply was a bit confusing at first, but looking deeper showed you
> were right on the mark. The systems (using HP ILO fencing) were unable
> to communicate with each other very well or with the ILO ports at all.
> Turns out some of the ports they were configured on had been moved to a
> different VLAN, so the network was split between the ILOs and the host
> ports.

Sorry, I just assumed you were using manual fencing rather than iLO,
since manual fencing accounts for roughly 90% of the cases where fencing
gets stuck in the 'recover' state.

I guess we all know what happens when you assume... :) Or maybe, when I
assume?
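
For anyone hitting the same symptom, here is a minimal sketch (Python) of
scanning a service listing like the one quoted above for groups that are
still in 'recover'. The column layout (type, quoted name, two numeric ids,
state) is inferred only from the two sample lines in this thread, and
stuck_groups is a made-up helper name, not part of any cluster tool.

```python
import re

# Matches lines such as:  Fence Domain: "default" 2 2 recover 4 -
# ASSUMPTION: columns are <type>: "<name>" <id> <id> <state> ..., as inferred
# from the sample output quoted in this thread.
GROUP_RE = re.compile(
    r'^(?P<type>[A-Za-z ]+):\s+"(?P<name>[^"]+)"\s+\d+\s+\d+\s+(?P<state>\S+)'
)

def stuck_groups(listing, bad_state="recover"):
    """Return (type, name) for each service group whose state equals bad_state."""
    stuck = []
    for line in listing.splitlines():
        m = GROUP_RE.match(line.strip())
        if m and m.group("state") == bad_state:
            stuck.append((m.group("type").strip(), m.group("name")))
    return stuck

# Sample copied from the quoted output; member lists like "[1 2]" are skipped.
sample = '''Fence Domain: "default" 2 2 recover 4 -
[1 2]
User: "usrm::manager" 10 10 recover 2 -
[1 2]
'''

if __name__ == "__main__":
    for kind, name in stuck_groups(sample):
        print(f"{kind} {name!r} is stuck in recover")
```

If the fence domain shows up in that list, rgmanager will not answer
clustat until fencing finishes, so the thing to check next is whether the
fence device (iLO here) is actually reachable from the nodes.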
-- Lon
--
Lon Hohberger - Software Engineer - Red Hat, Inc.