[Linux-cluster] RAIDing a CLVM?

Denis Medvedev mdl at veles.ru
Wed Mar 22 11:10:30 UTC 2006


A better approach is to export the DRBD device as an iSCSI target rather than via GNBD.
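
For example, with iSCSI Enterprise Target the DRBD device can be
published with a couple of lines in /etc/ietd.conf (the target name and
device path below are placeholders only, not a tested config):

    Target iqn.2006-03.com.example:storage.drbd0
        # export the whole DRBD block device as LUN 0
        Lun 0 Path=/dev/drbd0,Type=fileio

The initiators then see an ordinary SCSI disk, and CLVM/GFS can sit on
top of it in the usual way.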


James Firth wrote:


> Patton, Matthew F, CTR, OSD-PA&E wrote:
>
>> I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software 
>> RAID) and make it work unless just one of the nodes becomes the MD 
>> master and then just exports it via NFS. Can it be done? Do 
>> commercial options exist to pull off this trick?
>
>
> Hi,
>
> We're working on the same problem. We have tried two approaches, both 
> with their own fairly serious drawbacks.
>
> Our goal was a 2-node all-in-one HA mega server, providing all office 
> services from one cluster, and with no single point of failure.
>
> The first uses a RAID master for each pair.  Each member of the pair 
> exports a disk using GNBD.  The pair negotiate a master using CMAN, 
> and that master assembles a RAID device from one GNBD import plus 
> one local disk, and then exports it using NFS or, in the case of GFS 
> being used, exports the assembled RAID device via a third GNBD export.
>
> Our trick here was that each node exported its contributory disk via 
> GNBD by default, so long as at least one other node was active 
> (quorum > 1), knowing only one master would ever be active.  This 
> significantly reduced complexity.
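>
> In rough terms the plumbing on the elected master looks something like
> this (device and export names below are made up for illustration):
>
>   # on each node: export its contributory disk over GNBD
>   gnbd_export -d /dev/sdb1 -e disk_nodeA
>
>   # on the elected master only: import the peer's disk and assemble
>   # the RAID-1 mirror from one local and one imported disk
>   gnbd_import -i nodeB
>   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
>         /dev/sdb1 /dev/gnbd/disk_nodeB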
>
> Problems are:
>  - GNBD instabilities cause frequent locks and crashes, especially 
> busying the DLM (suspected).
>  - The NFS export scheme also causes locks and hangs for NFS clients on 
> failover *IF* a member of the pair then imports it as an NFS client 
> itself, as needed in some of our mega-server ideas.
>  - NFS export is not too useful when file locking is important, e.g. 
> subversion, procmail etc. (yes, procmail, if your mail server is also 
> your Samba homes server).  You have to tell procmail to use 
> alternative mailbox locking or else mailboxes get corrupted.
>  - GFS on the assembled device with the GNBD export scheme works best 
> (see the sketch after this list), but still causes locks and hangs.  
> Note also that an exporting node must NOT import its own exported 
> GNBD volume, so there is no symmetry between the pair, and it's quite 
> difficult to manage.
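>
> For the GFS-on-the-assembled-device variant, the master's side is
> roughly the following (cluster, filesystem and export names are just
> placeholders):
>
>   # make a GFS filesystem on the assembled mirror and re-export it
>   gfs_mkfs -p lock_dlm -t mycluster:shared -j 2 /dev/md0
>   gnbd_export -d /dev/md0 -e shared_md0
>
>   # the other node imports and mounts it; the exporter must NOT
>   # import its own export
>   gnbd_import -i master_node
>   mount -t gfs /dev/gnbd/shared_md0 /mnt/shared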
>
>
>
> Our second approach, which we've only just embarked on but which is 
> so far proving more successful, uses DRBD.  DRBD is used to create a 
> mirrored pair of volumes, a bit like GNBD+MD as above.
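>
> A minimal resource definition in /etc/drbd.conf is along these lines
> (hostnames, disks and addresses below are examples only):
>
>   resource r0 {
>     protocol C;                    # synchronous replication
>     on nodeA {
>       device    /dev/drbd0;
>       disk      /dev/sdb1;         # local backing disk
>       address   192.168.1.1:7788;
>       meta-disk internal;
>     }
>     on nodeB {
>       device    /dev/drbd0;
>       disk      /dev/sdb1;
>       address   192.168.1.2:7788;
>       meta-disk internal;
>     }
>   }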
>
> The result is a block device accessible from both machines, but the 
> problem is that only one member of the pair is writable (master), and 
> the other is a read-only mount.
>
> If the master server dies, the remaining DRBD node becomes the master 
> and its volume becomes writable.  When the dead node recovers, it 
> rejoins as a slave, read-only.
>
> The problem is with the read-only aspect, so you still need an 
> exporting mechanism for the assembled DRBD volume running on the DRBD 
> master.  We plan to do this via a GNBD export (with GFS installed on 
> the volume).
>
> That's where the complexity comes in: the DRBD negotiation appears to 
> be totally independent of the cluster control suite, so we're having 
> to use customizations to start the exporting daemon on the DRBD 
> master.
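>
> In outline the customization is nothing more exotic than a script of
> roughly this shape, run at failover time (resource, mount point and
> export name are made up):
>
>   #!/bin/sh
>   # try to take over the DRBD resource, then export it if we are Primary
>   drbdadm primary r0 || exit 1
>   # crude check via /proc/drbd; assumes a single resource
>   if grep -q 'st:Primary' /proc/drbd; then
>       mount -t gfs /dev/drbd0 /mnt/shared
>       gnbd_export -d /dev/drbd0 -e drbd_shared
>   fi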
>
>
> Conclusions
> ---
>
> From all we've learned to date, it still seems a dedicated file server 
> or SAN approach is necessary to maintain availability.
>
> Either of the above schemes would work fairly well if we were just 
> building a HA storage component, because most of the complexities 
> we've encountered come about when the shared storage device is used by 
> services on the same cluster nodes.
>
> Most, if not all, of what we've done so far is not suitable for a 
> production environment, as it just increases the coupling between 
> nodes, and therefore the chance of a cascade failure of the cluster.  
> In all seriousness I believe a single machine with a RAID-1 pair has 
> a higher MTBF than any of our experiments.
>
> Many parts of the CCS/GFS suite released so far have serious issues 
> when used in non-standard configurations.  For example, the exception 
> handling we've encountered usually defaults to "while (1) { retry(); 
> sleep(1); }".
>
> I read last year about plans for GFS mirroring from Red Hat, but 
> haven't found much else since.  If anyone knows more I'd love to hear.
>
> It also appears that the guys behind DRBD want to further develop 
> their mirroring so that both volumes can be writable, in which case 
> you can just stick GFS on the assembled device, and run whichever 
> exporting method you like as a normal cluster service.
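>
> If that lands, presumably it would surface as little more than a
> net-level switch in the resource definition; as a guess (the option
> name and everything else here is an assumption, not a released
> feature):
>
>   resource r0 {
>     net { allow-two-primaries; }   # permit Primary on both nodes
>     # ... same on/device/disk sections as before ...
>   }
>
>   # then both nodes could simply run
>   drbdadm primary r0
>   mount -t gfs /dev/drbd0 /mnt/shared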
>
>
>
> James
>
> www.daltonfirth.co.uk
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



