[Linux-cluster] RE: Fencing quandary

Wed Oct 15 21:42:23 UTC 2008

On Wed, 2008-10-15 at 17:38 -0400, jim parsons wrote:
> On Wed, 2008-10-15 at 20:45 +0000, Hofmeister, James (WTEC Linux) wrote:
> > Hello Jeff,
> > 
> > RE: [Linux-cluster] RE: Fencing quandary
> > 
> > The root issue is that the ILO fence scripts are not up to date with the current firmware rev on the c-class and p-class blades.
> > 
> > Using '<device name="ilo01"/>' on its own for a "reboot" does not work with this ILO firmware rev. The workaround is to send 2 commands to ILO under a single method: 'action="off"/' followed by 'action="on"/'.
> > 
> > I tested this with my p-class blades and it was successful.  I am still waiting for my customer's test results on their c-class blades.
> > 
> > ...yes, this is the root cause of the ILO problem, but it does not completely address your concern.  I believe you are saying that RHCS does not accept a "power off" alone as a fence, but requires a "power off" followed by a "power on".
> Right. It is failing because the 'power on' portion does not complete:
> the fence agent is unable to send the correct power on command.
> 

But the point is, even if the power on command fails, the fencing agent
should report success, since the real need is to ensure the machine is
no longer participating in the cluster, not to bring it back up.

So, is it proper to report success if part of the request fails as long
as the critical part succeeds?
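
To make the question concrete, here is a rough Python sketch of the
behaviour I am asking about. It is not how the shipped ILO agent works.
The names fence_reboot, power_off, power_on, is_powered_off, FENCE_OK and
FENCE_FAILED are all hypothetical, and the three callables stand in for
whatever the agent actually sends to the ILO. The only assumption is the
usual convention that an exit status of 0 means the fence succeeded.

```python
from typing import Callable

# Hypothetical exit codes: 0 is treated as "fence succeeded".
FENCE_OK = 0
FENCE_FAILED = 1


def fence_reboot(power_off: Callable[[], None],
                 power_on: Callable[[], None],
                 is_powered_off: Callable[[], bool],
                 log: Callable[[str], None] = print) -> int:
    """Power-cycle a node, treating only the 'off' half as critical."""
    # Critical part: the node must stop participating in the cluster.
    try:
        power_off()
    except Exception as exc:
        log(f"power off failed: {exc}")
        return FENCE_FAILED

    # Do not trust the command alone; confirm the node is really off.
    if not is_powered_off():
        log("node still reports power on after the off request")
        return FENCE_FAILED

    # Best-effort part: a failed power on is logged but does not turn
    # a successful fence into a failure.
    try:
        power_on()
    except Exception as exc:
        log(f"warning: node is fenced but power on failed: {exc}")

    return FENCE_OK
```

With this shape, a power on that cannot be sent still leaves the agent
returning FENCE_OK, while a power off that fails or cannot be confirmed
returns FENCE_FAILED.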

Kevin