[Linux-cluster] RAIDing a CLVM?
Denis Medvedev
mdl at veles.ru
Wed Mar 22 11:10:30 UTC 2006
A better approach is to export not a GNBD device but an iSCSI device backed by DRBD.
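
For example, with the iSCSI Enterprise Target, something like this in
ietd.conf would do (target name and backing device are illustrative):

    Target iqn.2006-03.ru.veles:drbd.store
        Lun 0 Path=/dev/drbd0,Type=fileio
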
James Firth wrote:
> Patton, Matthew F, CTR, OSD-PA&E wrote:
>
>> I can't think of a way to combine (C)LVM, GFS, GNBD, and MD (software
>> RAID) and make it work unless just one of the nodes becomes the MD
>> master and then just exports it via NFS. Can it be done? Do
>> commercial options exist to pull off this trick?
>
>
> Hi,
>
> We're working on the same problem. We have tried two approaches, both
> with their own fairly serious drawbacks.
>
> Our goal was a 2-node all-in-one HA mega server, providing all office
> services from one cluster, and with no single point of failure.
>
> The first uses a raid master for each pair. Each member of the pair
> exports a disk using GNBD. The pair negotiate a master using CMAN,
> and that master assembles a RAID device from one GNBD import plus
> one local disk, then exports it via NFS or, where GFS is used,
> exports the assembled RAID device via a third GNBD export.
>
> Our trick here was that each node exported its contributory disk via
> GNBD by default, so long as at least one other node was active
> (quorum > 1), knowing only one master would ever be active. This
> significantly reduced complexity.
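>
> Roughly, that looks like this (device and export names are
> illustrative, reconstructed from memory):
>
>     # on each contributing node
>     gnbd_export -d /dev/sdb1 -e disk_a
>
>     # on whichever node CMAN elects raid master
>     gnbd_import -i nodea
>     mdadm --create /dev/md0 --level=1 --raid-devices=2 \
>           /dev/gnbd/disk_a /dev/sdb1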
>
> Problems are:
> - GNBD instabilities cause frequent locks and crashes, apparently by
> keeping the DLM busy (suspected).
> - The NFS export scheme also causes locks and hangs for NFS clients
> on failover *IF* a member of the pair is itself an NFS client of the
> export, as needed in some of our mega-server ideas.
> - NFS export is not too useful when file locking is important, e.g.
> subversion, procmail etc. (yes, procmail, if your mail server is also
> your Samba homes server). You have to tell procmail to use
> alternative mailbox locking or mailboxes get corrupted (see the
> procmail sketch after this list).
> - GFS on the assembled device with the GNBD export scheme works best,
> but still causes locks and hangs. Note also that an exporting node
> must NOT import its own exported GNBD volume, so there is no symmetry
> between the pair, and it's quite difficult to manage.
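>
> The procmail workaround is roughly this in /etc/procmailrc (a
> sketch; the LOCKEXT value is just our choice):
>
>     # force dot-lock files rather than kernel locks over NFS
>     LOCKEXT=.lock
>     :0:
>     $DEFAULT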
>
>
>
> Our second approach is something we've just embarked on, and so far is
> proving more successful, using DRBD. DRBD is used to create a
> mirrored pair of volumes, a bit like GNBD+MD as above.
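>
> A minimal resource stanza (drbd 0.7-era syntax; node names, disks
> and addresses here are made up) looks roughly like:
>
>     resource r0 {
>       protocol C;
>       on nodea {
>         device    /dev/drbd0;
>         disk      /dev/sdb1;
>         address   192.168.0.1:7788;
>         meta-disk internal;
>       }
>       on nodeb {
>         device    /dev/drbd0;
>         disk      /dev/sdb1;
>         address   192.168.0.2:7788;
>         meta-disk internal;
>       }
>     }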
>
> The result is a block device accessible from both machines, but the
> problem is that only one member of the pair is writable (master), and
> the other is a read-only mount.
>
> If the master server dies, the remaining DRBD node becomes the master
> and becomes writable. When the dead node recovers, it rejoins as a
> read-only slave.
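>
> Done by hand, with the resource name r0 from the sketch above, that
> transition is just:
>
>     drbdadm primary r0      # on the survivor
>     drbdadm secondary r0    # on the recovered node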
>
> The problem is with the read-only aspect: you still need an exporting
> mechanism for the assembled DRBD volume running on the DRBD master.
> We plan to do this via GNBD export (with GFS as the filesystem).
>
> That's where the complexity comes in: the DRBD negotiation appears to
> be totally independent of the cluster control suite, so we're having
> to use customizations to start the exporting daemon on the DRBD
> master (sketch below).
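>
> The customization is basically a wrapper along these lines (a
> simplified, untested sketch; the export name is made up):
>
>     #!/bin/sh
>     # run the GNBD export only where DRBD is currently Primary
>     if drbdadm state r0 | grep -q '^Primary'; then
>         gnbd_export -d /dev/drbd0 -e shared_fs
>     fi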
>
>
> Conclusions
> ---
>
> From all we've learned to date, it still seems a dedicated file server
> or SAN approach is necessary to maintain availability.
>
> Either of the above schemes would work fairly well if we were just
> building a HA storage component, because most of the complexities
> we've encountered come about when the shared storage device is used by
> services on the same cluster nodes.
>
> Most, if not all, of what we've done so far is not suitable for a
> production environment, as it just increases the coupling between
> nodes, and therefore increases the chance of a cascade failure of the
> cluster. In all seriousness, I believe a single machine with a RAID-1
> pair has a higher MTBF than any of our experiments.
>
> Many parts of the CCS/GFS suite released so far have serious issues
> when used in non-standard configurations. For example, the exception
> handling we've encountered usually amounts to "while (1) { retry();
> sleep(1); }".
>
> Last year I read about plans for GFS mirroring from Red Hat, but I
> haven't found much else since. If anyone knows more I'd love to hear.
>
> It also appears that the guys behind DRBD want to further develop
> their mirroring so that both volumes can be writable, in which case
> you can just stick GFS on the assembled device, and run whichever
> exporting method you like as a normal cluster service.
>
>
>
> James
>
> www.daltonfirth.co.uk
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>