[Linux-cluster] 2-node fencing question (IPMI/ACPI question)
danwest
danwest at comcast.net
Tue Sep 5 11:43:42 UTC 2006
What happens if the servers you are using require ACPI=on in order to
boot? For instance, IBM X366 servers need ACPI set in order to boot.
With ACPI=on, both nodes reboot when a fence occurs (see "both nodes off
problem" in thread below). This is not desirable, especially with
active/active clusters.
Thanks,
dan
> Sorry I didn't see this earlier!
>
> On Wed, 2006-08-02 at 15:50 +0000, danwest at comcast.net wrote:
> > It seems like a significant problem to have fence_ipmilan issue a power-off followed by a power-on with a 2 node cluster.
>
> Generally, the chances of this occurring are very, very small, though
> not impossible.
>
> However, it could very well be that IPMI hardware modules are slow
> enough at processing requests that this could pose a problem. What
> hardware has this happened on? Was ACPI disabled on boot in the host OS
> (it should be; see below)?
>
>
> > This seems to make a 2-node cluster with ipmi fencing pointless.
>
> I'm pretty sure that the 'both-nodes-off problem' can only occur if all
> of the following criteria are met:
>
> (a) while using separate NICs for IPMI and cluster traffic (the
> recommended configuration),
>
> (b) in the event of a network partition, such that both nodes cannot
> see each other but can see each other's IPMI port, and
>
> (c) if both nodes send their power-off packets at or near the exact same
> time.
>
> The time window for (c) increases significantly (5+ seconds) if the
> cluster nodes are enabling ACPI power events on boot. This is one of
> the reasons why booting with acpi=off is required when using IPMI, iLO,
> or other integrated power management solutions.
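To make the time window in (c) concrete, here is a rough sketch of the race. It is a toy model, not anything taken from the fence agents: the function name and the 0.1 second figure for acpi=off are assumptions for illustration, and only the 5 second figure comes from the message above. Network latency and IPMI processing time are ignored.

```python
def both_nodes_off(t_a: float, t_b: float, shutdown_delay: float) -> bool:
    """Return True if both nodes end up powered off.

    t_a, t_b: times (seconds) at which node A and node B would send
    their power-off request to the other node's IPMI port.
    shutdown_delay: how long a node keeps running, and can still send
    its own request, after a power-off request has reached it.
    """
    # A is still alive to send at t_a only if B's request has not
    # taken effect yet, i.e. t_a falls before t_b + shutdown_delay.
    a_sends = t_a < t_b + shutdown_delay
    # Same reasoning for B.
    b_sends = t_b < t_a + shutdown_delay
    # Both nodes go down only if each one managed to fence the other.
    return a_sends and b_sends


# ACPI enabled: window of roughly 5 seconds (figure from the thread).
acpi_on = both_nodes_off(0.0, 2.0, shutdown_delay=5.0)
# acpi=off: assumed near-immediate power-off (0.1 s is hypothetical).
acpi_off = both_nodes_off(0.0, 2.0, shutdown_delay=0.1)
```

With requests sent two seconds apart, the 5 second window lets both nodes fence each other, while the short window leaves only the first sender running.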
>
> If booting with acpi=off, does the problem persist?
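One quick way to confirm how a node was actually booted is to look at the kernel command line. A minimal sketch, assuming a Linux host where /proc/cmdline is readable; the helper names are made up for this example.

```python
from pathlib import Path


def acpi_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains acpi=off."""
    # Kernel parameters are whitespace-separated and case-sensitive.
    return "acpi=off" in cmdline.split()


def running_kernel_acpi_disabled(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel."""
    return acpi_disabled(Path(path).read_text())
```

Calling running_kernel_acpi_disabled() on each cluster node shows whether the acpi=off setting really took effect at boot.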
>
> > It looks like fence_ipmilan needs to support sending a cycle instead of a poweroff then a poweron?
>
> The reason fence_ipmilan functions this way (off, status, on) is that
> we require a confirmation that the node has lost power. I am not
> sure that it is possible to confirm the node has rebooted using IPMI.
>
> Arguably, it also might not be necessary to make such a confirmation in
> this particular case.
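For reference, the off, status, on sequence can be sketched with ipmitool as below. This is not how fence_ipmilan.c is written; it is only an illustration of the idea. The function names, the retry count, and the check for "is off" in the status output are assumptions, and the ipmitool options are the same ones used in the command quoted in this thread.

```python
import subprocess
import time
from typing import Callable


def ipmitool_runner(host: str, user: str, password: str) -> Callable[[str], str]:
    """Build a function that runs 'ipmitool ... chassis power <action>'."""
    def run(action: str) -> str:
        cmd = ["ipmitool", "-H", host, "-U", user, "-P", password,
               "chassis", "power", action]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
    return run


def fence_off_status_on(run: Callable[[str], str], retries: int = 5,
                        delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Power off, confirm the node is off, then power it back on.

    Returns True only if the off state was confirmed.
    """
    run("off")
    for _ in range(retries):
        # Assumption: the status output contains "is off" once the
        # chassis has lost power.
        if "is off" in run("status").lower():
            run("on")
            return True
        sleep(delay)
    # Power-off never confirmed: report failure and do not power on.
    return False
```

Usage would look like fence_off_status_on(ipmitool_runner("<ipaddr>", "<userid>", "<password>")). The runner is passed in as a parameter so the sequence can be exercised with a stub instead of real hardware.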
>
> > According to fence_ipmilan.c it looks like cycle is not an option although it is an option for ipmitool. (ipmitool -H <ipaddr> -U <userid> -P <password> chassis power cycle)
>
> Looks like you're on the right track.
>
> -- Lon