[Linux-cluster] GFS+DRBD+Quorum: Help wrap my brain around this

Tue Nov 23 11:09:33 UTC 2010

Hi,
 just my 0.02 below

On Mon, 22 Nov 2010 21:21:50 +0000 (UTC), "A. Gideon"
<ag8817282 at gideon.org> wrote:
> On Sun, 21 Nov 2010 21:46:03 +0000, Colin Simpson wrote:
> 
> 
>> I suppose what I'm saying is that there is no real way to get a quorum
>> disk with DRBD. And basically it doesn't really gain you anything
>> without actual shared storage.
> 
> I understand that.  That's why I'm looking for that "external" solution 
> (ie. a separate iSCSI volume from a third machine) to act as a quorum 
> disk (effectively making that third machine a quorum server).
> 

Why introduce iSCSI on a third machine at all? Just by having a third node
(even one not running any cluster services, just cman) you get the same
tiebreaker function as a quorum disk. The drawback is that the third
machine needs fencing too, but I consider that a bonus :) since even the
less important services running on that machine get some protection, more
so if you run them in a separate failover domain for that host only.
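
To make the tiebreaker arithmetic concrete, here is a rough sketch. It
assumes every node carries one vote and that quorum is a simple majority
of the expected votes; the function names are made up for illustration
and are not part of cman.

```shell
#!/bin/sh
# Sketch only: majority quorum, assuming one vote per node.

# Votes needed for quorum given the expected total (integer division).
quorum_votes() {
    expected="$1"
    echo $(( expected / 2 + 1 ))
}

# Succeeds if the votes currently present reach quorum.
# $1 = votes present, $2 = expected votes
is_quorate() {
    [ "$1" -ge "$(quorum_votes "$2")" ]
}
```

With three nodes, `quorum_votes 3` gives 2, so `is_quorate 2 3` succeeds
(one node lost, the cluster carries on) while `is_quorate 1 3` fails (a
lone survivor may not fence or start services).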

> But I'm not clear how important this is.  I think the problem is that, 
> while I've some familiarity with clustering, I've less with DRBD.  I 
> don't understand how DRBD handles the matter of quorum given only two 
> potential voters.
> 
> [...]
>> The scenario is well mitigated by DRBD on two nodes already without
>> this. The system will not, if you config properly,  start DRBD (and all
>> the cluster storage stuff after, presuming your start up files are in
>> the right order) until it sees the second node. 
> 
> So if one node fails, the mirror is broken but storage is still 
> available?  But if both nodes go down, storage only becomes available 
> again once both nodes are up?  I've missed this in the documentation, I'm
> afraid.
> 
> [...]
>> The situation of two nodes coming up when the out of date one comes up
>> first should never arise if you give it sufficient time to see the other
>> node (it will always pick the new good one's data), you can make it wait
>> forever and then require manual intervention if you prefer (should a
>> node be down for an extended period). 
> 
> Waiting forever for the second node seems a little strict to me, though I
> suppose if the second node is the node with the most up-to-date data then
> this is the proper thing to do.  But waiting forever for the node that
> has outdated information seems inefficient, though I see it is caused by
> the fact that DRBD has no way to know which node is more up-to-date.
> 
> Am I understanding that correctly?

It is preferable to do it this way to guarantee that the data won't be
corrupted. Let's assume you have three nodes (Node1, Node2 and Node3) and
only Node1 and Node2 are running DRBD. If Node1 fails you still have quorum
and Node2 or Node3 will fence it. But what if, some time after the fencing
(before Node1 is back, but after new data was written to DRBD), Node2
freezes? Node3 can't fence it, because it has lost quorum. When Node1
(re)joins the cluster and quorum is restored, Node3 will fence Node2, and
if you don't wait long enough for Node2 to boot (because it is checking
its HDDs or hits some other extended delay), Node1 is started with the old
data.

If just one node failed - you won't have to wait too long, but in case
both have failed you need them both up before touching any data to avoid
corruption.

As an additional step you may set fencing (in drbd.conf) to
resource-and-stonith and edit your outdate-peer DRBD script to call
fence_node and return exit status 7 as a last-resort action (if the other
node can't be reached). This will also protect you in the case where only
the communication between the DRBD machines is lost.
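
A minimal sketch of that last-resort step, not a drop-in handler. It
assumes the peer's cluster node name is passed in as an argument (a real
handler would have to work it out itself), and FENCE_CMD is a made-up
override so the logic can be dry-run without fencing anything. Only
fence_node and exit status 7 come from the description above.

```shell
#!/bin/sh
# Sketch of the last-resort part of an outdate-peer handler.
# Assumed wiring: drbd.conf has "fencing resource-and-stonith" and points
# its outdate-peer handler at a script built around this function.

# Hypothetical override for dry runs; defaults to cman's fence_node.
FENCE_CMD="${FENCE_CMD:-fence_node}"

# $1 = cluster node name of the DRBD peer (assumed to be supplied by the caller)
fence_peer_handler() {
    peer="$1"
    if [ -z "$peer" ]; then
        echo "usage: fence_peer_handler <peer-node-name>" >&2
        return 1
    fi
    if $FENCE_CMD "$peer"; then
        # Status 7 reports to DRBD that the peer has been fenced (stonithed).
        return 7
    fi
    # Fencing failed: report a generic failure rather than claiming success.
    return 1
}
```

A real script would finish by calling the function with the peer's name
and exiting with its return status. Any attempt to reach the peer and
outdate it normally would go before the fence_node call.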

> 
>> For me a couple of minutes waiting
>> for the other node is sufficient if it was degraded already, maybe a bit
>> longer if the DRBD was sync'd before they went down.
> 
> I'm afraid I'm not clear what you mean by this.  Isn't the fact that each
> node cannot know the state of the other the problem?  So how can wait
> times be varied as you describe?
> 
> 
>> I can send you config's I believe are correct from the Linbit docs of
>> using DRBD Primary/Primary with GFS, if you like.
> 
> Something more than
> http://www.drbd.org/users-guide/s-gfs-create-resource.html ?
> That would be welcome.
> 
>> 
>> But I'm told (from a thread I posted at DRBD) that this should always
>> work. 
> 
> This is something I'm realizing: that I need to ask some of my questions
> on that list rather than here, since my questions right now are more down
> at that layer.
> 
> Thanks...
> 	- Andrew