[Linux-cluster] RAIDing a CLVM?

Benjamin Marzinski bmarzins at redhat.com
Wed Mar 22 19:18:56 UTC 2006


On Wed, Mar 22, 2006 at 02:10:30PM +0300, Denis Medvedev wrote:
> 
> A better approach is to export not a GNBD device but an iSCSI device backed by DRBD.
> 

I would definitely go with DRBD for this. If I understand the GNBD + md setup
correctly, it leaves a window for data corruption.

If you have two machines doing raid1 over a local device and a gnbd device,
you have the problem where machine A can die after it has written to its local
disk but before the write reaches the disk on machine B. The mirror is then out
of sync. GNBD doesn't do anything to help with that, and md on machine B doesn't
know anything about the state of machine A, so it can't correct the problem. So
you are left with an out-of-sync mirror, which is BAD. DRBD was made for exactly
this setup, and will (I believe) automagically handle this correctly.
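
For reference, the DRBD piece that matters here is simply a two-node resource
running protocol C (fully synchronous replication), which is what closes the
window described above.  A minimal resource definition would look something
like the following sketch; the hostnames, backing devices and addresses are
just placeholders:

    resource r0 {
        protocol C;                     # synchronous: ack only after both nodes have the write
        on node-a {
            device    /dev/drbd0;       # replicated device seen by the upper layers
            disk      /dev/sda7;        # local backing disk
            address   192.168.0.1:7788;
            meta-disk internal;
        }
        on node-b {
            device    /dev/drbd0;
            disk      /dev/sda7;
            address   192.168.0.2:7788;
            meta-disk internal;
        }
    }

With protocol C a write is not acknowledged until it has reached both nodes,
so a crash of either machine can't leave the two copies silently diverged.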

-Ben
 
> James Firth wrote:
> 
> 
> >Patton, Matthew F, CTR, OSD-PA&E wrote:
> >
> >>I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software 
> >>RAID) and make it work unless just one of the nodes becomes the MD 
> >>master and then just exports it via NFS. Can it be done? Do 
> >>commercial options exist to pull off this trick?
> >
> >
> >Hi,
> >
> >We're working on the same problem. We have tried two approaches, both 
> >with their own fairly serious drawbacks.
> >
> >Our goal was a 2-node all-in-one HA mega server, providing all office 
> >services from one cluster, and with no single point of failure.
> >
> >The first uses a raid master for each pair.  Each member of the pair 
> >exports a disk using GNBD.  The pair negotiate a master using CMAN, 
> >and that master assembles a RAID device using one GNBD import, plus 
> >one local disk, and then exports it using NFS, or in the case of GFS 
> >being used, exports the assembled raid device via a third GNBD export.
> >
> >Our trick here was that each node exported its contributory disk via 
> >GNBD by default, so long as at least one other node was active (quorum 
> >> 1), knowing only one master would ever be active. This significantly 
> >reduced complexity.
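> >
> >For illustration, the export/assembly step boiled down to something like 
> >the following (device and export names are placeholders, not our exact 
> >commands):
> >
> >    # on the non-master node: export its contributory disk over GNBD
> >    gnbd_export -d /dev/sdb1 -e nodeb_disk
> >
> >    # on the elected raid master: import the remote export and mirror it
> >    # with the local disk using md
> >    gnbd_import -i nodeb
> >    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
> >        /dev/sdb1 /dev/gnbd/nodeb_disk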
> >
> >Problems are:
> > - GNBD instabilities cause frequent lockups and crashes, especially 
> >by busying the DLM (suspected).
> > - The NFS export scheme also causes locks and hangs for NFS clients on 
> >failover *IF* a member of the pair then also imports the export as an 
> >NFS client itself, as needed in some of our mega-server ideas.
> > - NFS export is not too useful when file locking is important, e.g. 
> >subversion, procmail etc (yes, procmail, if your mail server is also 
> >your Samba homes server).  You have to tell procmail to use 
> >alternative mailbox locking or else mailboxes get corrupted.
> > - GFS on the assembled device with the GNBD export scheme works best, 
> >but still causes locks and hangs.  Note also that an exporting node must 
> >NOT import its own exported GNBD volume, so there is no symmetry between 
> >the pair, and it's quite difficult to manage.
> >
> >
> >
> >Our second approach is something we've just embarked on, and so far is 
> >proving more successful, using DRBD.  DRBD is used to create a 
> >mirrored pair of volumes, a bit like GNBD+MD as above.
> >
> >The result is a block device accessible from both machines, but the 
> >problem is that only one member of the pair is writable (master), and 
> >the other is a read-only mount.
> >
> >If the master server dies, the remaining DRBD becomes the master, and 
> >becomes writable.  When the dead node recovers, the recovered node 
> >becomes a slave, read-only.
> >
> >The problem is with the read-only aspect, so you still need to have an 
> >exporting mechanism for the assembled DRBD volume running on the DRBD 
> >master.  We plan to do this via GNBD export (GFS FS installed).
> >
> >That's where the complexity comes in: the DRBD master negotiation 
> >appears to be totally independent of the cluster control suite, so 
> >we're having to use customizations to start the exporting daemon on the 
> >DRBD master.
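> >
> >Roughly, that customization amounts to running something like this on 
> >whichever node should currently be the DRBD master (the resource and 
> >export names are placeholders):
> >
> >    # promote the local copy to primary (writable)
> >    drbdadm primary r0
> >    # then export the assembled volume to the rest of the cluster
> >    gnbd_export -d /dev/drbd0 -e shared_gfs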
> >
> >
> >Conclusions
> >---
> >
> >From all we've learned to date, it still seems a dedicated file server 
> >or SAN approach is necessary to maintain availability.
> >
> >Either of the above schemes would work fairly well if we were just 
> >building a HA storage component, because most of the complexities 
> >we've encountered come about when the shared storage device is used by 
> >services on the same cluster nodes.
> >
> >Most, if not all, of what we've done so far is not suitable for a 
> >production environment, as it just increases the coupling between 
> >nodes, and therefore increases the chance of a cascade failure of the 
> >cluster.  In all seriousness I believe a single machine with a RAID-1 
> >pair has a higher MTBF than any of our experiments.
> >
> >Many parts of the CCS/GFS suite so far released have serious issues 
> >when used in non-standard configurations.  For example, the exception 
> >handling we've encountered usually defaults to "while (1) { retry(); 
> >sleep(1); }".
> >
> >I read last year about plans for GFS mirroring from RedHat, and 
> >haven't found much else since.  If anyone knows more I'd love to hear.
> >
> >It also appears that the guys behind DRBD want to further develop 
> >their mirroring so that both volumes can be writable, in which case 
> >you can just stick GFS on the assembled device, and run whichever 
> >exporting method you like as a normal cluster service.
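> >
> >If and when that happens, putting GFS on the shared device should just 
> >be the usual procedure, something like the following (cluster name, 
> >filesystem name and journal count are placeholders):
> >
> >    gfs_mkfs -p lock_dlm -t mycluster:gfs0 -j 2 /dev/drbd0
> >    mount -t gfs /dev/drbd0 /mnt/shared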
> >
> >
> >
> >James
> >
> >www.daltonfirth.co.uk
> >



