[Linux-cluster] GFS+DRBD+Quorum: Help wrap my brain around this

Thu Nov 25 16:39:19 UTC 2010

On Tue, 23 Nov 2010 12:28:41 +0000, Colin Simpson wrote:

> Since the third node is ignorant about the status of DRBD, I don't
> really see what help it gives for it to decide on quorum.

I've just read through the "Best Practice with DRBD RHCS and GFS2" thread 
on the drbd-users list.  And I'm still missing what seems to me to be a 
fundamental issue.

First: It seems you no longer (since 8.3.8) need to have GFS startup wait 
for the DRBD sync to finish.  That's good, but is this because DRBD 
handles I/O requests properly during a sync?  That's what I think is 
so, but then I don't understand why you'd have had an issue with 8.2.  Or 
am I missing something?

But the real issue for me is quorum/consensus.  I noted:

  startup {
    wfc-timeout 0;          # Wait forever for the initial connection
    degr-wfc-timeout 60;    # Wait only 60 seconds if this node
                            # was part of a degraded cluster
  }

and

  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }

but when I break the DRBD connection between two primary nodes, 
"disconnected" apparently means that both nodes carry on as if they 
have UpToDate disks.  But this lets the data go out of sync.  Isn't 
this a Bad Thing?

Clearly, if there were some third party (e.g. a quorum disk or a third 
node), this could be resolved.  But neither seems to be required in 
the DRBD world, so how is this situation resolved?
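To make the third-party point concrete, here is the strict-majority 
arithmetic that vote-based quorum relies on, as a small Python sketch.  
It assumes one vote per voter and is a simplification for illustration, 
not how cman or DRBD actually count votes.

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """True if this partition holds a strict majority of all configured votes."""
    return 2 * votes_seen > total_votes


# Two DRBD peers, no arbiter: after the link breaks, each side sees only itself.
print(has_quorum(1, 2))  # False on both sides, so neither may safely continue
# Add a third vote (quorum disk or third node) reachable from node A only.
print(has_quorum(2, 3))  # True: node A plus the arbiter
print(has_quorum(1, 3))  # False: the isolated node B
```

With two voters a clean split leaves each side exactly half, so nobody 
wins; a third voter is what turns the split into a 2-vs-1 decision.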

DRBD supports fencing, so perhaps that is the answer?  I'm reluctant to 
use the cluster's fencing because, as described in the thread you 
referenced, the cluster suite starts after DRBD.

I'm thinking of trying a fencing policy of resource-and-stonith where the 
handler tries to take a shared semaphore (e.g. connect to a port on a 
third server that accepts only one connection at a time, or perhaps 
even just take a lock on a file mounted via NFS from a third server).  If 
it gets the semaphore/lock, it fences the DRBD peer.  If it doesn't, it 
either waits forever or marks itself as outdated.
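The lock-based tie-breaker described above might be sketched in Python 
roughly as follows, assuming the third server exports a directory over 
NFS with working POSIX locking (lockd or NFSv4).  The lock path 
/mnt/arbiter/drbd-r0.lock and the decision strings "fence-peer" and 
"outdate-self" are hypothetical; the sketch does not map them to DRBD's 
fence-peer handler exit codes, and it does not actually fence anything.

```python
import errno
import fcntl
import os

# Hypothetical lock file on an NFS export served by the third machine.
LOCK_PATH = "/mnt/arbiter/drbd-r0.lock"


def try_take_arbiter_lock(path=LOCK_PATH):
    """Make one non-blocking attempt at an exclusive POSIX lock on `path`.

    Returns the open file descriptor on success; the lock lasts only while
    this process keeps that descriptor open. Returns None if another
    process (e.g. the DRBD peer's handler) already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # lost the race: the peer got there first
        raise  # some other error, not lock contention
    return fd


def decide(lock_fd):
    """Hypothetical policy from the proposal: the lock winner fences its
    DRBD peer; the loser stands down and marks its own data as outdated."""
    return "fence-peer" if lock_fd is not None else "outdate-self"
```

One caveat with this design: POSIX locks belong to the process, so a 
lock taken by a short-lived handler script is released as soon as the 
script exits.  For the winner to keep the lock, it would have to be held 
by a long-lived process, for example a small helper started alongside 
DRBD.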

This may also solve the startup "wait forever" problem: of the nodes 
starting up in WaitForConnect, the one that gets the shared lock first 
comes up while the other stays blocked.  I'm not yet sure how to 
implement this from DRBD's perspective, though.  I'm not sure there's a 
handler that gets called if DRBD starts and cannot establish an initial 
connection.
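For the startup case, the same lock could be waited on with a timeout, 
loosely mirroring wfc-timeout 0 (wait forever) and degr-wfc-timeout.  A 
minimal polling sketch in Python, again assuming a hypothetical lock file 
on an NFS export from the third server; how it would be hooked into DRBD 
startup is left open, as in the question above.

```python
import errno
import fcntl
import os
import time


def wait_for_arbiter_lock(path, timeout_s=None, poll_s=1.0):
    """Poll for an exclusive POSIX lock on `path`.

    timeout_s=None waits forever (like wfc-timeout 0); otherwise give up
    after roughly timeout_s seconds. Returns the open file descriptor that
    holds the lock (keep it open to keep the lock), or None on timeout.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd  # won the lock: this node may come up
        except OSError as e:
            if e.errno not in (errno.EACCES, errno.EAGAIN):
                os.close(fd)
                raise  # a real error, not contention with the peer
        if deadline is not None and time.monotonic() >= deadline:
            os.close(fd)  # the other node still holds the lock; give up
            return None
        time.sleep(poll_s)
```

Polling with LOCK_NB is used instead of a blocking lockf(fd, LOCK_EX) 
because a blocking call cannot be given a timeout.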

That I've found no mention of this idea leaves me suspicious that it 
won't work or that it's overkill.  Yet I cannot see why.  It follows the 
same model of quorum as the cluster software.

Thanks...

	Andrew