[Linux-cluster] 2-node fencing question

Lon Hohberger lhh at redhat.com
Thu Aug 3 19:25:46 UTC 2006


Sorry I didn't see this earlier!

On Wed, 2006-08-02 at 15:50 +0000, danwest at comcast.net wrote:
> It seems like a significant problem to have fence_ipmilan issue a power-off followed by a power-on with a 2 node cluster.

Generally, the chances of this occurring are very, very small, though
not impossible.

However, it could very well be that IPMI hardware modules are slow
enough at processing requests that this could pose a problem.  What
hardware has this happened on?  Was ACPI disabled on boot in the host OS
(it should be; see below)?


> This seems to make a 2-node cluster with ipmi fencing pointless.

I'm pretty sure that the 'both-nodes-off' problem can only occur if all
of the following criteria are met:

(a) separate NICs are used for IPMI and cluster traffic (the
recommended configuration),

(b) a network partition occurs such that the nodes cannot see each
other but can still reach each other's IPMI port, and

(c) both nodes send their power-off packets at or near the exact same
time.

The time window for (c) increases significantly (5+ seconds) if the
cluster nodes are enabling ACPI power events on boot.  This is one of
the reasons why booting with acpi=off is required when using IPMI, iLO,
or other integrated power management solutions.
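
To make the timing argument concrete, here is a toy model in Python.  It
is only an illustration, not anything taken from the cluster code:
both_nodes_off and off_latency are names I made up, and the 0.1 second
figure in the usage note is an assumed value for the acpi=off case, not
a measurement.  The only number from above is the 5 second window.

```python
def both_nodes_off(t_a, t_b, off_latency):
    """Return True if both nodes end up powered off.

    t_a, t_b    -- times (seconds) at which node A and node B send
                   their power-off request to the other node's IPMI port
    off_latency -- assumed delay between a request being sent and the
                   target actually losing power (hypothetical parameter)
    """
    # A node can only send its request while it still has power.
    # Node A loses power at t_b + off_latency, node B at t_a + off_latency.
    a_sends = t_a < t_b + off_latency
    b_sends = t_b < t_a + off_latency
    # Both requests go out only if the send times are closer together
    # than the latency, i.e. abs(t_a - t_b) < off_latency.
    return a_sends and b_sends
```

With requests 0.2 seconds apart, both_nodes_off(0.0, 0.2, 0.1) is False
(node B is already dead when it would have sent), while
both_nodes_off(0.0, 0.2, 5.0) is True.  That is why a longer power-off
delay makes (c) so much easier to hit.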

If booting with acpi=off, does the problem persist?
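
A quick way to check what a node was actually booted with is to look at
the kernel command line.  A minimal sketch in Python, assuming the usual
Linux /proc/cmdline file is available; the function names are my own,
and this only tells you what was passed at boot, nothing more:

```python
def acpi_disabled(cmdline):
    """Return True if the kernel command line contains acpi=off."""
    # Parameters are whitespace-separated; match the whole token so that
    # something like "noacpi=offx" is not mistaken for acpi=off.
    return "acpi=off" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a cluster node, acpi_disabled(read_cmdline()) should come back True
on both machines before retesting.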

> It looks like fence_ipmilan needs to support sending a cycle instead of a poweroff than a poweron?

The reason fence_ipmilan functions this way (off, status, on) is that
we require confirmation that the node has lost power.  I am not sure
that it is possible to confirm the node has rebooted using IPMI.

Arguably, it also might not be necessary to make such a confirmation in
this particular case.  

> According to fence_ipmilan.c it looks like cycle is not an option although it is an option for ipmitool.  (ipmitool -H <ipaddr> -U <userid> -P <password> chassis power cycle)

Looks like you're on the right track.

-- Lon



