[Linux-cluster] RE: Fencing quandry

Thu Oct 16 15:05:13 UTC 2008

As mentioned, the version of the iLO firmware caused some issues for cluster admins because additional features and commands were incorporated. This topic was discussed at the Red Hat Summit, where a single "COLD_BOOT_SERVER" command was described as performing a power off, a 4-second wait, and then a cold boot of the server. This directive was suggested as a replacement for the "HOLD_PWR_BTN" directive in the scripts.
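To illustrate the substitution, here is a rough Python sketch that builds the XML request a script might send to iLO. Only the two directive names come from the discussion above. The surrounding RIBCL/LOGIN/SERVER_INFO envelope is my assumption of the usual script layout, and the build_ribcl helper and the credentials are purely hypothetical, so please check it against the scripting guide for your firmware rev before relying on it.

```python
import xml.etree.ElementTree as ET


def build_ribcl(command: str, user: str, password: str) -> str:
    """Build a RIBCL request containing a single server command.

    The envelope layout is an assumption, not taken from this thread.
    """
    ribcl = ET.Element("RIBCL", {"VERSION": "2.0"})
    login = ET.SubElement(ribcl, "LOGIN", {"USER_LOGIN": user, "PASSWORD": password})
    server_info = ET.SubElement(login, "SERVER_INFO", {"MODE": "write"})
    # The directive itself is an empty element, e.g. COLD_BOOT_SERVER.
    ET.SubElement(server_info, command)
    return ET.tostring(ribcl, encoding="unicode")


# Hypothetical credentials, for illustration only.
old_request = build_ribcl("HOLD_PWR_BTN", "fenceuser", "secret")
new_request = build_ribcl("COLD_BOOT_SERVER", "fenceuser", "secret")
print(new_request)
```

The only difference between the two requests is the directive element, which is why swapping it in the existing scripts was suggested as a small change.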

Greg Caetano
Hewlett-Packard Company
ESS Software
Platform & Business Enablement Solutions Engineering
Chicago, IL
greg.caetano at hp.com
Red Hat Certified Engineer
RHCE#805007310328754

-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kevin Anderson
Sent: Wednesday, October 15, 2008 4:42 PM
To: linux clustering
Subject: RE: [Linux-cluster] RE: Fencing quandry

On Wed, 2008-10-15 at 17:38 -0400, jim parsons wrote:
> On Wed, 2008-10-15 at 20:45 +0000, Hofmeister, James (WTEC Linux) wrote:
> > Hello Jeff,
> >
> > RE: [Linux-cluster] RE: Fencing quandary
> >
> > The root issue is the ILO scripts are not up to date with the current firmware rev in the c-class and p-class blades.
> >
> > The method of '<device name="ilo01"/>' for a "reboot" is not working with this ILO firmware rev and the workaround is to send 2 commands to ILO under a single method... 'action="off"/' and 'action="on"/'.
> >
> > I tested this with my p-class blades and it was successful.  I am still waiting for my customer's test results on their c-class blades.
> >
> > ...yes, this is the root issue of the ILO problem, but it does not completely address your concern.  I believe you are saying that RHCS does not accept a "power off" alone as a fence, but requires a "power off" followed by a "power on".
> Right. It is failing because the 'power on' portion is not completing
> because the fence agent is unable to send the correct power on command.
>

But the point is, even if the power on command fails, the fencing agent
should report success, since the real need is to ensure the machine is
no longer participating in the cluster, not to bring it back up.

So, is it proper to report success if part of the request fails as long
as the critical part succeeds?
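To make the question concrete, here is a minimal Python sketch of the behaviour being proposed. It is not the code of any actual fence agent: fence_reboot and the two callables are hypothetical names, and the exit codes (0 for fenced, 1 for failed) are an assumed convention.

```python
import sys
from typing import Callable


def fence_reboot(power_off: Callable[[], bool], power_on: Callable[[], bool]) -> int:
    """Return 0 if the node was fenced, 1 otherwise.

    Only the power-off step decides the result. A failed power-on is
    logged but does not turn a successful fence into a failure.
    """
    if not power_off():
        # The node may still be running, so fencing has failed.
        return 1
    if not power_on():
        # The node is down, which is the critical part. Warn and carry on.
        print("warning: power on failed, node left powered off", file=sys.stderr)
    return 0


# Example: off succeeds, on fails. The agent still reports success.
result = fence_reboot(lambda: True, lambda: False)
print(result)
```

With this logic the cluster can proceed with recovery as soon as the node is confirmed off, and bringing the node back becomes an administrative follow-up.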

Kevin

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster