[Linux-cluster] Is IPMI fencing considered certified by Red Hat?

Lon Hohberger lhh at redhat.com
Thu Sep 7 21:50:14 UTC 2006


On Thu, 2006-09-07 at 00:59 -0300, Celso K. Webber wrote:
> Hello friends,
> 
> Regarding Red Hat Cluster Suite and/or GFS, could someone from Red Hat 
> please tell me if the use of IPMI embedded devices from the servers' 
> motherboards is officially certified by Red Hat?
> 
> I'd like to have this information so that we can recommend (or not) to 
> customers the use of IPMI as a secure form of fencing.
> 
> We had some bad experiences recently on some servers where only one of 
> the onboard NICs listened to the IPMI over LAN packets, so it appeared 
> to us that sometimes IPMI is not that safe as a fence device. Of course 
> the Cluster software will assume nothing when the fencing fails, but the 
> bad thing is that there is no automatic failover on this situation.

It's supported, but there are a couple of caveats that you should be
aware of:

(a) You should, if possible, use the IPMI-enabled NIC only for IPMI
traffic.  At the very least, do not use it for cluster communication
traffic, though it is fine for service-related traffic (e.g. services
managed by rgmanager) and other traffic.  That way, the IPMI-enabled
port can't become a single point of failure.

Here's why: If IPMI and cluster traffic are using the same NIC, then
that NIC failing (or becoming disconnected) will cause the node to be
evicted -- but prevent fencing, because the IPMI host will be
unreachable.
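
To make the reasoning concrete, here is a rough sketch in Python.  It is
only a toy model, not anything from the cluster software: the function
name and the NIC names (eth0, eth1) are made up for illustration.  It
lists the NICs whose failure alone would both evict the node and make
its IPMI host unreachable.

```python
def unfenceable_nic_failures(cluster_nic, ipmi_nic, nics):
    """Return the NICs whose single failure evicts the node AND blocks IPMI fencing.

    Toy model with hypothetical names; it only encodes the reasoning above.
    """
    problem_nics = []
    for nic in nics:
        node_evicted = (nic == cluster_nic)      # cluster traffic is lost
        ipmi_unreachable = (nic == ipmi_nic)     # fence request cannot reach IPMI
        if node_evicted and ipmi_unreachable:
            problem_nics.append(nic)
    return problem_nics


# IPMI and cluster traffic share eth0: losing eth0 leaves the node unfenceable.
shared = unfenceable_nic_failures("eth0", "eth0", ["eth0", "eth1"])

# Cluster traffic moved to eth1: no single NIC failure causes both problems.
split = unfenceable_nic_failures("eth1", "eth0", ["eth0", "eth1"])
```

With the shared layout the list contains eth0; with the split layout it
is empty, which is the point of keeping cluster traffic off the
IPMI-enabled port.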

Similarly, on a machine with a single power supply and IPMI fencing in
a cluster, the power cord becomes a single point of failure (SPF): if
you pull the power, the host is dead and fencing cannot complete
(because IPMI does not have power either!), which leads to...


(b) If you do not have *both* dual power supplies and dual NICs, you
need something else (in addition to IPMI) if no single point of failure
(NSPF) is a requirement for your particular installation.  For example,
one linux-cluster user added their Fibre Channel switch as a secondary
fence device (in its own fence level).  Their cluster first tries to
fence using IPMI.  Failing that, it falls back to fencing via the Fibre
Channel switch.
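
The fallback between fence levels can be sketched roughly like this.
This is a simplified illustration in Python, not the actual fencing
code: fence_node, ipmi_power_off and fc_switch_disable_port are
hypothetical stand-ins, and the IPMI one simply simulates an
unreachable IPMI host.

```python
def fence_node(node, levels):
    """Try each fence level in order; return the name of the first that succeeds.

    `levels` is a list of (name, callable) pairs.  Returns None if every
    level fails, in which case the node is NOT fenced.
    """
    for name, method in levels:
        try:
            if method(node):
                return name
        except Exception:
            # This level failed (e.g. device unreachable); try the next one.
            continue
    return None


def ipmi_power_off(node):
    # Hypothetical stand-in: simulate the IPMI host being unreachable.
    raise ConnectionError("IPMI host unreachable for " + node)


def fc_switch_disable_port(node):
    # Hypothetical stand-in: pretend the switch port was disabled.
    return True


result = fence_node(
    "node1",
    [("ipmi", ipmi_power_off), ("fc-switch", fc_switch_disable_port)],
)
```

Here result ends up as "fc-switch" because the IPMI level fails first.
With IPMI as the only level, fence_node would return None, which is the
situation you are trying to avoid.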


(c) You often need to disable ACPI on hardware which has IPMI if you
intend to use IPMI for fencing.  This can vary on a per-machine basis,
so you should check first.  If a host does a "graceful shutdown" when
you fence it via IPMI, you need to disable ACPI on that host (e.g. boot
with acpi=off).  The server should turn off immediately (or within 4-5
seconds, like when holding an ATX power button in to force a machine
off).


Hope that helps!

-- Lon



