[Linux-cluster] DRBD with GFS applicable for this scenario?

Thu Jan 28 06:11:50 UTC 2010

On Thu, Jan 28, 2010 at 10:25 AM, Gordan Bobic <gordan at bobich.net> wrote:
> Zaeem Arshad wrote:
>>
>> Hi List,
>>
>> We have 2 geographically distant sites located approximately 35km
>> apart with dark fiber connectivity available between them. Mail01 and
>> SAN1 are placed at site A while Mail02 and SAN2 are at site B. Our
>> requirement is to have the mail servers in a cluster configuration in
>> an active/active mode. To cater for the loss of connectivity or losing
>> a SAN itself, I have come up with the following design.
>
> What's the point of having a SAN if you're using DRBD? You might as well
> have DAS in each of the two mail servers. Unless you need so much storage
> space that you can't put enough disks directly into the server...

We have already bought the SAN, that's why. We do expect our storage
needs to be on the higher side. Also, I am sharing storage on both
SANs because, in theory, DRBD will use the local resource first for
reads and writes and fail over to the IP-reachable disk volume.

>
>> 1) Export 1 block device from each SAN to its mail server i.e. SAN1
>> exports to Mail01
>> 2) Use DRBD to configure a block device comprising the 2 SAN
>> volumes and use it as a physical volume in clvm.
>
> The CLVM bit isn't relevant per se; you don't strictly need it, but it
> won't hurt.
>
>> 3) Create a GFS logical volume from this PV that can be used by both
>> servers.
>
> That's fine.
>
>> I am wondering if this is a correct design as theoretically it looks
>> to address both node and SAN failure or connectivity loss.
>
> The problem you have is that you have no way of enacting fencing if the
> connectivity between the sites fails. If a node fails, any cluster file
> system (GFS included) will mandate a fencing action to ensure that one of
> the nodes gets taken down and stays down. If you have lost cross-site
> connectivity, the nodes won't be able to fence each other, and GFS will
> simply block until connectivity is restored and fencing succeeds. The
> chances are that when this happens, it'll also cause a fencing shoot-out and
> both nodes may well end up getting fenced.
>
> You could use some kind of cheat-fencing, say, by setting a firewall rule
> that will prevent the nodes from re-connecting (you'd need to write your own
> fencing agent, but that's not particularly difficult), but then you would be
> pretty much guaranteeing a split-brain situation, where the nodes would end
> up operating independently without any hope of ever re-synchronising.
>
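
To make the firewall-based "cheat-fencing" idea above concrete, here is a
minimal dry-run sketch in Python. It assumes the agent receives its options
as name=value lines on standard input; the option name `ipaddr` and both
function names are hypothetical, chosen only for illustration. It only
builds the iptables command and does not run it, and it does nothing about
the split-brain risk described above.

```python
def parse_options(lines):
    """Parse name=value lines, skipping blank lines and # comments."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name.strip()] = value.strip()
    return opts


def build_block_command(peer_ip):
    """Return an iptables command that drops all traffic from peer_ip.

    The command is returned as an argument list, not executed, so this
    sketch is safe to run anywhere.
    """
    return ["iptables", "-I", "INPUT", "-s", peer_ip, "-j", "DROP"]


def fence(lines):
    """Return (exit_code, command) for the given option lines.

    Exit code 1 means the (assumed) ipaddr option was missing.
    """
    opts = parse_options(lines)
    peer = opts.get("ipaddr")  # hypothetical option name
    if not peer:
        return 1, None
    return 0, build_block_command(peer)
```

A real agent would read `sys.stdin`, execute the command (for example with
`subprocess.run`), and report success or failure through its exit code.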

In a case where we lose site connectivity altogether, I'd like
the Site 2 servers to shut themselves down to avoid a split-brain
condition. Since I am implementing clustering, won't the quorum
server take care of this issue?
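
To illustrate what I am hoping for, here is a rough sketch of the vote
arithmetic, assuming plain strict-majority voting with one vote per node
and one extra tiebreaker vote that only Site 1 can still reach after the
link fails. The function name and vote counts are my own assumptions; real
cluster software may apply different rules (for example a special two-node
mode).

```python
def has_quorum(visible_votes, expected_votes):
    """True if the visible votes are a strict majority of expected votes."""
    return 2 * visible_votes > expected_votes


# Two nodes, one vote each, no tiebreaker: after a link failure each
# node sees only its own vote, so neither side has a majority.
two_node_isolated = has_quorum(visible_votes=1, expected_votes=2)

# Two nodes plus one tiebreaker vote (3 expected). Site 1 still sees the
# tiebreaker, so it holds 2 of 3 votes; Site 2 sees only its own vote.
site1_keeps_quorum = has_quorum(visible_votes=2, expected_votes=3)
site2_keeps_quorum = has_quorum(visible_votes=1, expected_votes=3)

print(two_node_isolated, site1_keeps_quorum, site2_keeps_quorum)
```

Under these assumptions only Site 1 would stay quorate, which is the
behaviour I am after.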

Regards

--
Zaeem