[Linux-cluster] qdisk WITHOUT fencing

Volker Dormeyer volker at ixolution.de
Fri Jun 18 08:09:41 UTC 2010


Hi,

On Thu, Jun 17, 2010 at 05:58:44PM +0200,
jimbob palmer <jimbobpalmer at gmail.com> wrote:
> I have two data centers linked by physical fibre. Everything goes over
> this physical route: everything.
> 
> I would like to setup a high availability nfs server with drbd:
> * drbd to replicate storage
> * nfsd running
> * floating ip
> 
> If the physical link between the two data centers is lost, I would
> like the primary data center to win.

This is a real problem, as others in this thread have already described. It is
not easy to solve reliably with the current architecture.

In my opinion, a third independent location (i.e. a third datacenter) with a
third node/quorum server would be a solution. The problem of fencing the failed
node still persists if one datacenter goes down. However, since the two
remaining datacenters would still form a majority, fencing could be scripted
to be less strict, as sketched below. Of course, many scenarios are conceivable.
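
As a rough illustration of such a less strict policy, a helper could refuse to
relax fencing unless the surviving members still hold a majority. This is only
a sketch and assumes that clustat prints "Member Status: Quorate" in its
header; verify that against your cluster version:

    #!/bin/sh
    # Sketch: permit a relaxed fencing action only while the cluster
    # is still quorate, i.e. the two surviving datacenters outvote
    # the failed one.
    if clustat | grep -q "Member Status: Quorate"; then
        exit 0    # majority present - relaxed handling may proceed
    else
        echo "inquorate - keeping strict fencing" >&2
        exit 1
    fi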

> I've setup a qdisk, and this works well: the node which can access the
> qdisk wins. i.e. the primary datacenter, which is the data center
> where the san holding the qdisk also lives, wins.

fenced creates a FIFO if it is not able to fence the failed node. In RHCS it
is created at /var/run/cluster/fenced_override. You can override fencing
through this FIFO.
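
For illustration: the override mechanism itself is simple in that fenced waits
for the name of the node it could not fence to be written into that FIFO. A
minimal manual override might then look like the following (the node name is a
placeholder, and the exact protocol can differ between cluster versions, so
prefer the fence_ack_manual tool described below):

    # Write the victim's node name into the override FIFO; fenced
    # then treats that node as fenced. The node name is a placeholder.
    echo "node2.example.com" > /var/run/cluster/fenced_override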

You can use fence_ack_manual to "ack" fencing through this FIFO in case fenced
is not able to fence successfully, e.g.

    fence_ack_manual -e -n <name of failed node>

After this, the remaining node will continue its work. You might be able to
wrap this in some scripting logic, as sketched below.
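
For instance, a small watchdog on the surviving (primary) node could wait for
the override FIFO and acknowledge the fence automatically. This is only a
sketch under the assumptions above (FIFO path, fence_ack_manual options);
verify both against your cluster version:

    #!/bin/sh
    # Sketch: wait until fenced signals a failed fence attempt by
    # creating the override FIFO, then acknowledge it manually.
    FAILED_NODE="$1"
    FIFO=/var/run/cluster/fenced_override

    while [ ! -p "$FIFO" ]; do
        sleep 5    # the FIFO appears only after fencing has failed
    done

    fence_ack_manual -e -n "$FAILED_NODE"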

Of course, this will not solve the entire problem, and you still run the risk
of ending up with a split-brain.


Regards,
Volker



