[Linux-cluster] 2-node fencing question (IPMI/ACPI question)

danwest danwest at comcast.net
Tue Sep 5 11:43:42 UTC 2006


What happens if the servers you are using require ACPI=on in order to
boot?  For instance, IBM X366 servers need ACPI set in order to boot.
With ACPI=on, both nodes reboot when a fence occurs (see the "both nodes
off problem" in the thread below).  This is not desirable, especially
with active/active clusters.

Thanks,
 dan

> Sorry I didn't see this earlier!
> 
> On Wed, 2006-08-02 at 15:50 +0000, danwest at comcast.net wrote:
> > It seems like a significant problem to have fence_ipmilan issue a power-off followed by a power-on with a 2 node cluster.
> 
> Generally, the chances of this occurring are very, very small, though
> not impossible.
> 
> However, it could very well be that IPMI hardware modules are slow
> enough at processing requests that this could pose a problem.  What
> hardware has this happened on?  Was ACPI disabled on boot in the host OS
> (it should be; see below)?
> 
> 
> > This seems to make a 2-node cluster with ipmi fencing pointless.
> 
> I'm pretty sure that 'both-nodes-off problem' can only occur if all of
> the following criteria are met:
> 
> (a) while using separate NICs for IPMI and cluster traffic (the
> recommended configuration),
> 
> (b) in the event of a network partition, such that both nodes cannot
> see each other but can see each other's IPMI port, and
> 
> (c) if both nodes send their power-off packets at or near the exact same
> time.
> 
> The time window for (c) increases significantly (5+ seconds) if the
> cluster nodes are enabling ACPI power events on boot.  This is one of
> the reasons why booting with acpi=off is required when using IPMI, iLO,
> or other integrated power management solutions.
> 
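
To make condition (c) concrete, here is a rough timing sketch.  It is a
toy model, not anything taken from fence_ipmilan: the function name, the
single `off_delay` parameter, and the 0.2 s figure are all assumptions
made for illustration.  Only the 5 second figure comes from the
discussion above.

```python
def both_nodes_off(send_a: float, send_b: float, off_delay: float) -> bool:
    """Return True if both nodes manage to power each other off.

    send_a, send_b: times (seconds) at which each node sends its power-off.
    off_delay: assumed time between a power-off request being sent and the
               target actually losing power (a modelling assumption).
    """
    # A node can still send its own request only if the peer's request
    # has not yet taken effect on it.
    a_still_up = send_a < send_b + off_delay
    b_still_up = send_b < send_a + off_delay
    return a_still_up and b_still_up


# Requests sent 1 s apart with a fast hard power-off: only one node dies.
print(both_nodes_off(0.0, 1.0, off_delay=0.2))  # False
# Same requests with a 5 s ACPI soft-off window: both requests get through.
print(both_nodes_off(0.0, 1.0, off_delay=5.0))  # True
```

In this model both nodes end up off whenever the two requests are sent
less than `off_delay` apart, which is why a longer power-off delay
widens the window.
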
> If booting with acpi=off, does the problem persist?
> 
> > It looks like fence_ipmilan needs to support sending a cycle instead of a poweroff then a poweron?
> 
> The reason fence_ipmilan functions this way (off, status, on) is that
> we require confirmation that the node has lost power.  I am not
> sure that it is possible to confirm the node has rebooted using IPMI.
> 
> Arguably, it also might not be necessary to make such a confirmation in
> this particular case.  
> 
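
The (off, status, on) ordering described above could be sketched roughly
as follows.  This is not the code in fence_ipmilan.c: `power` is a
hypothetical callable standing in for whatever talks to the BMC, the
`FakeBMC` class exists only so the sketch runs on its own, and the
"ends with off" check on the status reply is an assumption about the
reply text.

```python
import time
from typing import Callable


def fence_off_status_on(power: Callable[[str], str],
                        max_checks: int = 10,
                        wait: float = 1.0) -> bool:
    """Power a node off, confirm it is off, then power it back on.

    `power` is a hypothetical callable: it takes "off", "on" or "status"
    and returns the BMC's reply as text.
    """
    power("off")
    for _ in range(max_checks):
        # Only treat the fence as successful once the BMC reports "off".
        if power("status").strip().lower().endswith("off"):
            power("on")
            return True
        time.sleep(wait)
    return False  # never confirmed; the caller should treat fencing as failed


class FakeBMC:
    """Stand-in for a real BMC so the sketch can be run without hardware."""

    def __init__(self) -> None:
        self.state = "on"
        self.log: list[str] = []

    def __call__(self, action: str) -> str:
        self.log.append(action)
        if action in ("on", "off"):
            self.state = action
        return f"Chassis Power is {self.state}"


bmc = FakeBMC()
print(fence_off_status_on(bmc, wait=0))  # True
print(bmc.log)                           # ['off', 'status', 'on']
```

A single "cycle" request has no equivalent of the status step in the
middle, which is the confirmation that would be given up.
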
> > According to fence_ipmilan.c it looks like cycle is not an option although it is an option for ipmitool.  (ipmitool -H <ipaddr> -U <userid> -P <password> chassis power cycle)
> 
> Looks like you're on the right track.
> 
> -- Lon



