[Linux-cluster] RAIDing a CLVM?
Benjamin Marzinski
bmarzins at redhat.com
Thu Mar 23 03:34:27 UTC 2006
On Wed, Mar 22, 2006 at 01:18:56PM -0600, Benjamin Marzinski wrote:
> On Wed, Mar 22, 2006 at 02:10:30PM +0300, Denis Medvedev wrote:
> >
> > A better approach is to export not a GNBD but an iSCSI device from DRBD.
> >
>
> I would definitely go with DRBD for this setup. If I understand this setup
> correctly, there is a data corruption possibility.
>
> If you have two machines doing raid1 over a local device and a gnbd device,
> you have the problem where machine A can die after it has written to its local
> disk but not to the disk on machine B. The mirror is then out of sync. GNBD doesn't
> do anything to help with that, and md on machine B doesn't know anything about
> the state of machine A, so it can't correct the problem. So you are left with
> an out of sync mirror, which is BAD. DRBD was made for exactly this setup,
> and will (I believe) automagically handle this correctly.
This is ignoring the obvious issue that after machine A is dead, B will
presumably keep writing to its device, so it will obviously be out of sync.
And you probably knew that. It's been a long week. But still, this sounds
exactly like what DRBD was designed for.
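For anyone wanting to try it, a minimal two-node DRBD resource definition
looks roughly like the following. The hostnames, devices, and addresses are
placeholders, and the exact syntax varies between DRBD versions, so check it
against the drbd.conf man page for your release:

```
resource r0 {
    protocol C;             # synchronous replication: a write completes
                            # only once both nodes have it on disk
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

Protocol C is the one you want here, since it is exactly the "don't
acknowledge the write until both mirrors have it" behavior that md-over-GNBD
lacks.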
-Ben
> -Ben
>
> > James Firth wrote:
> >
> >
> > >Patton, Matthew F, CTR, OSD-PA&E wrote:
> > >
> > >>I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software
> > >>RAID) and make it work unless just one of the nodes becomes the MD
> > >>master and then just exports it via NFS. Can it be done? Do
> > >>commercial options exist to pull off this trick?
> > >
> > >
> > >Hi,
> > >
> > >We're working on the same problem. We have tried two approaches, both
> > >with their own fairly serious drawbacks.
> > >
> > >Our goal was a 2-node all-in-one HA mega server, providing all office
> > >services from one cluster, and with no single point of failure.
> > >
> > >The first uses a raid master for each pair. Each member of the pair
> > >exports a disk using GNBD. The pair negotiate a master using CMAN,
> > >and that master assembles a RAID device using one GNBD import, plus
> > >one local disk, and then exports it using NFS, or in the case of GFS
> > >being used, exports the assembled raid device via a third GNBD export.
> > >
> > >Our trick here was that each node exported its contributory disk, using
> > >GNBD, by default, so long as at least one other node was active (quorum
> > >> 1), knowing only one master would ever be active. This significantly
> > >reduced complexity.
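For the archives, that per-pair assembly can be sketched with something like
the commands below. The node names, device paths, and export names are made
up, and the gnbd/mdadm flags are from memory, so verify them against the man
pages before relying on them:

```shell
# On node B: export its contributory disk over GNBD
gnbd_export -d /dev/sdb1 -e pair_disk_b

# On node A (the elected RAID master): import B's exported disk
gnbd_import -i nodeB

# Still on node A: assemble the RAID-1 from the local disk
# plus the imported GNBD device
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/sdb1 /dev/gnbd/pair_disk_b
```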
> > >
> > >Problems are:
> > > - GNBD instabilities cause frequent locks and crashes, especially
> > >busying the DLM (suspected).
> > > - The NFS export scheme also causes locks and hangs for NFS clients on
> > >failover *IF* a member of the pair is then itself also an
> > >NFS client, as needed in some of our mega-server ideas.
> > > - NFS export is not too useful when file locking is important, e.g.
> > >subversion, procmail etc. (yes, procmail, if your mail server is also
> > >your Samba homes server). You have to tell procmail to use
> > >alternative mailbox locking or mailboxes get corrupted.
> > > - GFS on the assembled device with the GNBD export scheme works best, but
> > >still causes locks and hangs. Note also that an exporting node must NOT
> > >import its own exported GNBD volume, so there is no symmetry between
> > >the pair, and it's quite difficult to manage.
> > >
> > >
> > >
> > >Our second approach is something we've just embarked on, and so far is
> > >proving more successful, using DRBD. DRBD is used to create a
> > >mirrored pair of volumes, a bit like GNBD+MD as above.
> > >
> > >The result is a block device accessible from both machines, but the
> > >problem is that only one member of the pair is writable (master), and
> > >the other is a read-only mount.
> > >
> > >If the master server dies, the remaining DRBD becomes the master, and
> > >becomes writable. When the dead node recovers, the recovered node
> > >becomes a slave, read-only.
> > >
> > >The problem is with the read-only aspect, so you still need to have an
> > >exporting mechanism for the assembled DRBD volume running on the DRBD
> > >master. We plan to do this via GNBD export (GFS FS installed).
> > >
> > >That's where the complexity comes in - the DRBD master/slave negotiation
> > >appears to be totally independent of the cluster control suite, so we're
> > >having to use custom scripting to start the exporting daemon on the DRBD
> > >master.
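One way to glue that together is a small wrapper that only starts the export
on whichever node DRBD currently reports as Primary. A rough sketch, with a
hypothetical resource name and export name (the state-reporting commands and
gnbd_export flags differ between versions, so check yours):

```shell
#!/bin/sh
# Start the GNBD export only on the node where DRBD resource
# "r0" is currently Primary; tear it down on the Secondary.
state=$(drbdadm state r0 2>/dev/null)    # e.g. "Primary/Secondary"
case "$state" in
    Primary/*)
        gnbd_export -d /dev/drbd0 -e shared_vol
        ;;
    *)
        # Secondary (read-only) side: make sure no stale export remains
        gnbd_export -r shared_vol 2>/dev/null
        ;;
esac
```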
> > >
> > >
> > >Conclusions
> > >---
> > >
> > >From all we've learned to date, it still seems a dedicated file server
> > >or SAN approach is necessary to maintain availability.
> > >
> > >Either of the above schemes would work fairly well if we were just
> > >building a HA storage component, because most of the complexities
> > >we've encountered come about when the shared storage device is used by
> > >services on the same cluster nodes.
> > >
> > >Most, if not all, of what we've done so far is not suitable for a
> > >production environment, as it just increases the coupling between
> > >nodes, and therefore increases the chance of a cascade failure of the
> > >cluster. In all seriousness I believe a single machine with RAID-1
> > >pair has a higher MTBF than any of our experiments.
> > >
> > >Many parts of the CCS/GFS suite so far released have serious issues
> > >when used in non-standard configurations. For example, exception
> > >handling we've encountered usually defaults to "while (1) { retry();
> > >sleep(1); }"
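A bounded retry with backoff, instead of the infinite loop quoted above, would
at least let failures surface to the cluster manager. A rough sketch (the
retry_bounded name and its callers are hypothetical, not anything shipped in
the suite):

```shell
#!/bin/sh
# Run a command up to $1 times with linear backoff, then give up
# and report the failure upward instead of retrying forever.
retry_bounded() {
    max=$1; shift
    i=0
    while [ "$i" -lt "$max" ]; do
        if "$@"; then
            return 0            # the operation succeeded
        fi
        i=$((i + 1))
        sleep "$i"              # back off: 1s, 2s, 3s, ...
    done
    return 1                    # give up; let the caller or fencing decide
}
```

Used as, say, `retry_bounded 3 gnbd_import -i nodeB || start_failover`, the
error at least becomes visible instead of hanging the whole stack.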
> > >
> > >I read last year about plans for GFS mirroring from Red Hat, and
> > >haven't found much else since. If anyone knows more I'd love to hear it.
> > >
> > >It also appears that the guys behind DRBD want to further develop
> > >their mirroring so that both volumes can be writable, in which case
> > >you can just stick GFS on the assembled device, and run whichever
> > >exporting method you like as a normal cluster service.
> > >
> > >
> > >
> > >James
> > >
> > >www.daltonfirth.co.uk
> > >
> > >--
> > >Linux-cluster mailing list
> > >Linux-cluster at redhat.com
> > >https://www.redhat.com/mailman/listinfo/linux-cluster
> > >