[Linux-cluster] GNBD+RAID5

Wed Dec 1 15:43:42 UTC 2004

On Tue, Nov 30, 2004 at 11:48:52PM -0500, Shih-Che Huang wrote:
> Hi Shih-Che,
> 
> >#gfs_mkfs -p lock_gulm -t alpha:gfstest -j 3 /dev/pool/storage
> >#mount -t gfs /dev/pool/storage /gfs1
> Instead of making one big pool of 70GB make three pools
> /dev/pool/storage1,2, 3. and then you can mount them onto /gfs1,2,3.

> You can then use Linux MD (Meta Devices) driver to create raid-5 array
> (software raid).
I'm not sure what you're suggesting here, but I don't think it will work - at all.

Linux MD runs on top of block devices, not filesystems.  Therefore, to build an MD array on top of GFS, you'd be looking at using the loopback device to turn files stored in those filesystems into block devices.
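
To make concrete what that would involve, here is a rough sketch, for illustration only and not a recommendation. The mount points /gfs1 to /gfs3 follow the suggestion quoted above; the image file name, its size, and the loop and md device names are assumptions. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Illustration only, not a recommendation. Prints commands unless DRY_RUN=0.
# Image names, sizes and the loop/md device names are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

loop_backed_md() {
    i=0
    loops=""
    for fs in /gfs1 /gfs2 /gfs3; do
        # A plain file on each GFS mount stands in for a disk.
        run dd if=/dev/zero of="$fs/member.img" bs=1M count=1024
        # The loop driver turns that file into a block device MD can use.
        run losetup "/dev/loop$i" "$fs/member.img"
        loops="$loops /dev/loop$i"
        i=$((i + 1))
    done
    # $loops is unquoted on purpose so it splits into three device arguments.
    run mdadm --create /dev/md0 --level=5 --raid-devices=3 $loops
}

loop_backed_md
```

Each node would have to repeat the losetup step and then assemble the same array, which is exactly where the trouble starts.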

Whilst this will give you block devices on shared storage, it won't help at all with respect to data integrity & management, since you'll still have the underlying problem that the MD driver has no support for clustering.

To give you an example - as soon as you attempt to start the MD on the second node, it's going to notice (from the metainfo stored within it) that the device is still open.  This could easily start a RAID rebuild - which is going to hammer your IO.

The 1st node is likely to notice that the rebuild has started and, since it doesn't see any reason for it, is likely to panic or at least complain that the metainfo has been written to by someone else.  I don't know enough about the MD driver to know if it would cope with this.  And that's just starting up the 2nd node - if you add a 3rd (assuming that the rebuild completed successfully and nodes 1 & 2 both accepted that all was well), then you get even more problems.

Also you have to allow for the underlying GFS partition to fail, and be made available again.  As soon as it comes back, you need to tell all the nodes that it is available - and they are then all going to start resyncing - so your IO is going to be massacred.

And, of course, you'd have to run GFS over the top as well, since otherwise you get filesystem caching problems.  GFS running on MD running on GFS running on GNBD - that's a recipe for disaster.

> >Could you give me some suggestion?
The latest CVS code apparently has some mirroring support via the device mapper, which is probably the best place for it *grin*, but it's early days at the moment, so it is likely that the system will change.  At the moment I don't think there's an easy way to do it.

There may be a way to do it by using GNBD directly on top of MD, and only sharing it out from one server at a time (and using something like heartbeat to shift the IP to another machine should that one fail).  That way you've only got one machine accessing the block devices at a time.

GNBD can export md devices, so this /should/ work.  However, without actually testing it, I wouldn't want to rely on it in a live system.
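
As a rough, untested sketch of that single-server layout: build the array with mdadm on the one machine that owns the disks, then export the resulting md device with gnbd_export. The member disks, the md device name and the export name "storage" are assumptions, and the exact gnbd_export options should be checked against the man page for your release. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only, untested on real hardware. Prints commands unless DRY_RUN=0.
# Member disks, md device and export name are hypothetical.
DRY_RUN="${DRY_RUN:-1}"
MD_DEV="${MD_DEV:-/dev/md0}"
MEMBERS="${MEMBERS:-/dev/sdb1 /dev/sdc1 /dev/sdd1}"
EXPORT_NAME="${EXPORT_NAME:-storage}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

export_raid5() {
    # Build the RAID-5 array on the single server that owns the disks.
    # $MEMBERS is unquoted on purpose so it splits into separate device arguments.
    run mdadm --create "$MD_DEV" --level=5 --raid-devices=3 $MEMBERS
    # Export the md device over GNBD from this server only.
    # The GNBD server daemon is assumed to be running already.
    run gnbd_export -d "$MD_DEV" -e "$EXPORT_NAME"
}

export_raid5
```

The other nodes would then import the device from that one server and put GFS on it as usual; moving the export to a standby machine on failure (the heartbeat part) is not covered here.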

Graham