[Linux-cluster] GFS+DRBD+Quorum: Help wrap my brain around this
ag8817282 at gideon.org
Mon Nov 29 21:40:42 UTC 2010
On Fri, 26 Nov 2010 15:04:40 +0000, Colin Simpson wrote:
>> but when I break the DRBD connection between two primary nodes,
>> "disconnected" apparently means that the nodes both continue as if
>> they've UpToDate disks. But this lets the data go out of sync. Isn't
>> this a Bad Thing?
> Yup that could be an issue, however you should never be in a situation
> where you break the connection between the two nodes. This needs to be
> heavily mitigated, I'm planning to bond two interfaces on two different
> cards so this doesn't happen (or I should say is highly unlikely).
Since I'll be the person tasked with cleaning up after this situation, and
given that I have no idea how to do that cleanup once writes have occurred
on both sides independently, I think I'll want something more than "highly
unlikely". That's rather the point of these tools, isn't it?
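To make the worry concrete, here is a toy Python sketch of two primaries
that keep accepting writes while the replication link is down. This is not
DRBD code: the Replica class, link() and conflicting_blocks() are names I
made up for illustration, and real DRBD tracks changes very differently.

```python
# Toy model of two primaries writing while the replication link is down.
# Not DRBD code: Replica, link() and conflicting_blocks() are invented here.

class Replica:
    def __init__(self, name):
        self.name = name
        self.blocks = {}       # block number -> data
        self.peer = None
        self.connected = False

    def write(self, block, data):
        self.blocks[block] = data
        # Replicate to the peer only while the link is up.
        if self.connected and self.peer is not None:
            self.peer.blocks[block] = data


def link(a, b, up):
    """Pair two replicas and mark the link between them up or down."""
    a.peer, b.peer = b, a
    a.connected = b.connected = up


def conflicting_blocks(a, b):
    """Return the block numbers where the two sides hold different data."""
    keys = set(a.blocks) | set(b.blocks)
    return sorted(k for k in keys if a.blocks.get(k) != b.blocks.get(k))


node1, node2 = Replica("node1"), Replica("node2")
link(node1, node2, up=True)
node1.write(0, "journal-v1")       # replicated to node2

link(node1, node2, up=False)       # connection broken, both stay primary
node1.write(0, "journal-v2a")
node2.write(0, "journal-v2b")      # same block, different data
node2.write(7, "new-file")         # block only node2 knows about

print(conflicting_blocks(node1, node2))   # [0, 7]
```

Block 7 could in principle be copied across, but block 0 now has two
different "latest" versions, and nothing at the block layer can say which
one the filesystem on top should keep.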
> 2/ The node goes down totally so DRBD loses comms. But as all the comms
> are down the other node will notice and Cluster Suite will fence the bad
> node. Remember that GFS will suspend all operations (on all nodes) until
> the bad node is fenced.
Does it make sense to have Cluster Suite do this fencing, or should DRBD
do it? I'm thinking that DRBD's resource-and-stonith gets me pretty
> I plan to further help the first situation by having my cluster comms
> share the same bond with the DRBD. So if the comms fail, cluster suite
> should notice, both the DRBD's on each node shouldn't change as GFS will
> have suspended operations. Assuming the fence devices are reachable then
> one of the nodes should fence the other (it might be a bit of a shoot
> out situation) and then GFS should resume on the remaining node.
This "shoot out situation" (race condition) is part of my worry. A third
voter of any form eliminates this, in that it can arbitrate the matter of
which of the two nodes in a lost-comm situation should be "outdated" and
And if the third voter can solve the "wait forever on startup", so much
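The arbitration argument can be sketched with a plain majority rule. This
is only an illustration in Python, not how Cluster Suite or DRBD implement
it; has_quorum(), survivors() and the "arbiter" voter are hypothetical
names, and I'm assuming one vote per member.

```python
# Sketch of majority-based arbitration after a lost-comm partition.
# Assumes one vote per member; all names here are made up for illustration.

def has_quorum(votes_seen, total_votes):
    """Strict majority: more than half of all configured votes."""
    return 2 * votes_seen > total_votes


def survivors(partitions, total_votes):
    """Return the partitions (as sorted lists) that are allowed to continue."""
    return [sorted(p) for p in partitions if has_quorum(len(p), total_votes)]


# Two nodes, link lost: each side sees 1 of 2 votes, so neither has a majority.
two_node = [{"node1"}, {"node2"}]
print(survivors(two_node, 2))       # []

# Same failure with a third voter that can still reach node1.
three_voter = [{"node1", "arbiter"}, {"node2"}]
print(survivors(three_voter, 3))    # [['arbiter', 'node1']]
```

With two voters a strict majority leaves nobody running, which as far as I
understand is why two-node setups relax the rule and end up in the fencing
race. With three voters at most one side can hold a majority, so the loser
is decided before anyone shoots.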
I'm looking at how to solve this all at the DRBD layer. But I'm also
interested in a more Cluster-Suite-centric solution. I could use a
quorum disk, but a third node would also be useful. I haven't figured
out, though, how to run clvmd with the shared storage available on only
two of three cluster nodes. Is there a way to do this?