[Linux-cluster] Diskless Quorum Disk
Lon Hohberger
lhh at redhat.com
Fri Jun 22 20:36:04 UTC 2007
On Wed, Jun 20, 2007 at 05:57:05PM -0500, Chris Harms wrote:
> My nodes were set to "quorum=1 two_node=1" and fenced by DRAC cards
> using telnet over their NICs. The same NICs used in my bonded config on
> the OS so I assumed it was on the same network path. Perhaps I assume
> incorrectly.
That sounds mostly right. The point is that a node disconnected from
the cluster must not be able to fence a node which is supposedly still
connected.
That is: 'A' must not be able to fence 'B' if 'A' becomes disconnected
from the cluster. However, 'A' must be able to be fenced if 'A' becomes
disconnected.
Why was DRAC unreachable; was it unplugged too? (Is DRAC like IPMI - in
that it shares a NIC with the host machine?)
> Desired effect would be survivor claims service(s) running on
> unreachable node and attempts to fence unreachable node or bring it back
> online without fencing should it establish contact. Actual result was
> survivor spun its wheels trying to fence unreachable node and did not
> assume services.
Yes, this is an unfortunate limitation of using (most) integrated power
management systems. Basically, some BMCs share a NIC with the host
(IPMI), and some run off of the machine's power supply (IPMI, iLO,
DRAC). When the fence device becomes unreachable, we don't know whether
it's a total network outage or a "power disconnected" state.
* If the power to a node has been disconnected, it's safe to recover.
* If the node just lost all of its network connectivity, it's *NOT* safe
to recover.
* In both cases, we cannot confirm that the node is dead... which is
why we don't recover.
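The three cases above boil down to one rule, which can be sketched as a tiny decision function (hypothetical names only, not actual fenced/rgmanager code):

```python
# Hypothetical sketch of the recovery rule described above -- not
# actual fenced/rgmanager code.
def may_recover_services(fence_succeeded: bool) -> bool:
    if fence_succeeded:
        # Node confirmed powered off: safe to take over its services.
        return True
    # Fence device unreachable: this could mean "power pulled" (safe)
    # or "network outage, node still running" (unsafe).  We cannot
    # tell the difference, so we must not recover -- keep retrying
    # the fence instead.
    return False
```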
> Restoring network connectivity induced the previously
> unreachable node to reboot and the surviving node experienced some kind
> of weird power off and then powered back on (???).
That doesn't sound right; the surviving node should have stayed put (not
rebooted).
> Ergo I figured I must need quorum disk so I can use something like a
> ping node. My present plan is to use a loop device for the quorum disk
> device and then setup ping heuristics. Will this even work, i.e. do the
> nodes both need to see the same qdisk or can I fool the service with a
> loop device?
I don't believe the effects of tricking qdiskd in this way have been
explored; I don't see why it wouldn't work in theory.  But qdiskd, with
or without a real disk, won't fix the behavior you experienced (uncertain
state due to failure to fence -> retry / wait for the node to come back).
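For reference, a qdiskd ping heuristic is configured in cluster.conf
roughly like this (a sketch only -- the label, timings, and ping target
10.0.0.254 are placeholder values, and this has not been tested against
a loop device):

```xml
<!-- Sketch only: label, timings, and target are placeholders. -->
<quorumd interval="1" tko="10" votes="1" label="myqdisk">
    <!-- The node keeps its qdisk vote only while the ping succeeds. -->
    <heuristic program="ping -c1 -w1 10.0.0.254"
               score="1" interval="2" tko="3"/>
</quorumd>
```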
> I am not deploying GFS or GNDB and I have no SAN. My only
> option would be to add another DRBD partition for this purpose which may
> or may not work.
> What is the proper setup option, two_node=1 or qdisk?
In your case, I'd say two_node="1".
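For a two-node cluster without qdisk, that's the following line in
cluster.conf (standard two-node settings; your file may differ):

```xml
<!-- Two-node mode: the cluster stays quorate with a single vote. -->
<cman two_node="1" expected_votes="1"/>
```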
--
Lon Hohberger - Software Engineer - Red Hat, Inc.