[Linux-cluster] DRBD with GFS applicable for this scenario?

Gordan Bobic gordan at bobich.net
Thu Jan 28 10:44:08 UTC 2010


Zaeem Arshad wrote:

>>> We have 2 geographically distant sites located approximately 35km
>>> apart with dark fiber connectivity available between them. Mail01 and
>>> SAN1 is placed at site A while Mail02 and SAN2 is at site B. Our
>>> requirement is to have the mail servers in a cluster configuration in
>>> an active/active mode. To cater for the loss of connectivity or losing
>>> a SAN itself, I have come up with the following design.
>> What's the point of having a SAN if you're using DRBD? You might as well
>> have DAS in each of the two mail servers. Unless you need so much storage
>> space that you can't put enough disks directly into the server...
> 
> We have already bought the SAN, that's why. We do expect our storage
> needs to be on the higher side. Also, I am sharing storage on both
> SANs as theoretically DRBD will use the local resource first for
> read/write and failover to the IP reachable disk volume.

Hmm... What sort of ping time do you get? I presume you have established 
that it is on the sensible side.

In terms of performance you will need to make sure that machines tend to 
access only their own sub-paths on the file system (e.g. spool/1 and 
spool/2, and server 1 doesn't touch spool/2 until server 2 goes down). 
Otherwise the performance is going to be atrocious, since file locks 
will end up bouncing between the machines. These normally live in cache 
on a conventional file system, so if they have to be exchanged over the 
link on most accesses you are looking at a latency degradation from 
~50ns down to some milliseconds. Even if your connectivity is VERY good, 
at 35km I would be surprised if your latencies are better than 10ms, 
which you'll feel even against disk latency, let alone memory latency - 
we are talking 200,000x slower in the best case scenario.
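
As a rough back-of-the-envelope check (the ~50ns cached-lock cost and 
~10ms link round trip are assumptions, not measurements), a minimal 
sketch in Python:

    # Rough estimate of the slowdown when a lock that normally sits in
    # local cache has to be bounced over the inter-site link instead.
    cached_lock_ns = 50                    # assumed ~50 ns for a cached lock
    link_rtt_ns = 10 * 1000 * 1000         # assumed ~10 ms round trip, in ns
    print(link_rtt_ns / cached_lock_ns)    # 200000.0 -> ~200,000x slower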

>>> 1) Export 1 block device from each SAN to its mail server i.e. SAN1
>>> exports to Mail01
>>> 2) Use DRBD to configure a block device comprising of the 2 SAN
>>> volumes and use it as a physical volume in clvm.
>> The CLVM bit isn't relevant per se - you don't strictly need it, but it
>> won't hurt.
>>
>>> 3) Create a GFS logical volume from this PV that can be used by both
>>> servers.
>> That's fine.
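
For illustration, the layering in steps 2 and 3 would look roughly like 
the sketch below. This is only a sketch: /dev/drbd0, the volume names 
and the cluster name "mailcluster" are placeholders, and GFS2 tooling is 
shown here rather than whatever GFS version you end up on.

    #!/usr/bin/env python
    # Sketch of steps 2 and 3: DRBD device -> clustered PV/VG/LV -> GFS2.
    # /dev/drbd0, "mailvg", "maillv" and "mailcluster:mailfs" are
    # placeholder names, not taken from this thread.
    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    run(["pvcreate", "/dev/drbd0"])                  # DRBD device as the PV
    run(["vgcreate", "--clustered", "y", "mailvg", "/dev/drbd0"])
    run(["lvcreate", "-n", "maillv", "-l", "100%FREE", "mailvg"])
    # Two journals (one per mail server), DLM locking for cluster-wide access.
    run(["mkfs.gfs2", "-p", "lock_dlm", "-t", "mailcluster:mailfs",
         "-j", "2", "/dev/mailvg/maillv"])

Both nodes then mount /dev/mailvg/maillv as their shared file system and 
the cluster's DLM arbitrates locking between them.
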
>>
>>> I am wondering if this is a correct design as theoretically it looks
>>> to address both node and SAN failure or connectivity loss.
>> The problem you have is that you have no way of enacting fencing if the
>> connectivity between the sites fails. If a node fails, any cluster file
>> system (GFS included) will mandate a fencing action to ensure that one of
>> the nodes gets taken down and stays down. If you have lost cross-site
>> connectivity, the nodes won't be able to fence each other, and GFS will
>> simply block until connectivity is restored and fencing succeeds. The
>> chances are that when this happens, it'll also cause a fencing shoot-out and
>> both nodes may well end up getting fenced.
>>
>> You could use some kind of cheat-fencing, say, by setting a firewall rule
>> that will prevent the nodes from re-connecting (you'd need to write your own
>> fencing agent, but that's not particularly difficult), but then you would be
>> pretty much guaranteeing a split-brain situation, where the nodes would end
>> up operating independently without any hope of ever re-synchronising.
>>
> 
> In such a case where we lose site connectivity altogether, I'd like
> the Site 2 servers to shut themselves down to avoid a split-brain
> condition. Since I am implementing clustering, won't the quorum
> server take care of this issue?

So you propose to have a quorum disk on site 2? OK, that works. The 
problem is that fencing works by one server fencing another, not itself. 
So you'll still need a reliable OOB fencing mechanism such as the one I 
described.
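
For reference, the kind of home-grown "cheat fencing" agent I mentioned 
is only a few lines. The sketch below assumes the usual fence-agent 
convention of key=value options on stdin and an "ipaddr" key for the 
peer - adapt the option names and the iptables rule to your own setup.

    #!/usr/bin/env python
    # Sketch of a firewall-based fence agent: instead of power-cycling
    # the peer, block its traffic so it cannot re-join the cluster.
    # This does NOT guarantee the peer is actually down, hence the
    # split-brain caveat above.
    import subprocess
    import sys

    def main():
        opts = {}
        for line in sys.stdin:               # fenced passes key=value lines
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                opts[key] = value

        peer_ip = opts.get("ipaddr")         # IP of the node being fenced
        if not peer_ip:
            sys.exit(1)

        rc = subprocess.call(["iptables", "-I", "INPUT",
                              "-s", peer_ip, "-j", "DROP"])
        sys.exit(0 if rc == 0 else 1)

    if __name__ == "__main__":
        main()

Exit status 0 tells fenced that the fencing succeeded; anything else and 
GFS will stay blocked until a fencing attempt does succeed.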

Gordan



