[Linux-cluster] DDRAID vs. GNBD+MD

Lon Hohberger lhh at redhat.com
Mon Jan 28 17:03:06 UTC 2008


On Sun, 2008-01-27 at 19:56 +0000, Gordan Bobic wrote:
> Hi,
> 
> I mentioned DDRAID in a different thread recently, but it got me 
> thinking. Is there any reason why a similar solution based on GNBD plus 
> standard software RAID on top wouldn't work?

> Say we have 7 nodes, and we want 5+2 (RAID6 redundancy). Have each node 
> export a GNBD, and then all nodes connect the said GNBDs together using 
> software RAID into a /dev/md? device, and then have GFS on top of that. 
> We could then lose any 2 out of the 7 nodes and still maintain 
> operational status.
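
The 5+2 arithmetic itself is sound: "lose any 2 of 7" is just RAID6's
dual-parity math, which is easy to check in isolation. Below is a minimal
sketch of that math in Python, using the usual GF(2^8) construction
(polynomial 0x11d); the device count, block sizes, and contents are made
up for illustration, and none of this touches the clustering problem the
reply turns on.

  from functools import reduce

  # GF(2^8) log/antilog tables over the primitive polynomial 0x11d
  EXP, LOG = [0] * 512, [0] * 256
  v = 1
  for i in range(255):
      EXP[i] = v
      LOG[v] = i
      v <<= 1
      if v & 0x100:
          v ^= 0x11D
  for i in range(255, 512):      # double-width so gmul needn't reduce mod 255
      EXP[i] = EXP[i - 255]

  def gmul(a, b):
      return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

  def gdiv(a, b):
      return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

  def xor(a, b):
      return bytes(i ^ j for i, j in zip(a, b))

  def pq(blocks):
      # P is plain parity; Q weights device i's block by g^i
      p = reduce(xor, blocks)
      q = bytes(len(blocks[0]))
      for i, blk in enumerate(blocks):
          q = xor(q, bytes(gmul(EXP[i], byte) for byte in blk))
      return p, q

  # Five data devices with 4-byte blocks (toy values); P and Q would live
  # on the other two of the seven devices.
  data = [bytes([d] * 4) for d in (0x11, 0x22, 0x33, 0x44, 0x55)]
  p, q = pq(data)

  # Lose any two *data* devices (lost parity is even easier) and rebuild,
  # treating the missing blocks as zeros when re-summing the survivors.
  x, y = 1, 3
  pxy, qxy = pq([blk if i not in (x, y) else bytes(4)
                 for i, blk in enumerate(data)])

  gxy = EXP[(y - x) % 255]                   # g^(y-x)
  A = gdiv(gxy, gxy ^ 1)                     # g^(y-x) / (g^(y-x) + 1)
  B = gdiv(EXP[(255 - x) % 255], gxy ^ 1)    # g^(-x)   / (g^(y-x) + 1)

  dp, dq = xor(p, pxy), xor(q, qxy)          # = Dx+Dy and g^x*Dx + g^y*Dy
  dx = bytes(gmul(A, s) ^ gmul(B, t) for s, t in zip(dp, dq))
  dy = xor(dp, dx)
  assert (dx, dy) == (data[x], data[y])      # both lost blocks rebuilt

The hard part of the proposal is not this math, though; it's keeping
seven writers coordinated, which is what the reply below is about.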

There's no way to ensure things like write ordering and cache coherency
with MD assembled independently on multiple computers.

You could assemble an MD set and export it via NFS to other nodes in the
cluster, but it seems pretty complicated to save a few dollars.

> Would that work, or would the underlying GNBDs end up being temporarily 
> out of sync sufficiently for the RAID to not be assemblable on the other 
> nodes?

It's not GNBD that you have to worry about - it's just a generic,
shared-block-device setup similar to iSCSI, but less complex.

For example, you can run GFS on top of GNBD, because GFS has some
awareness of its own I/Os, and the GNBD server handles synchronizing
access to the actual block device.
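
To make "synchronizing access" concrete: the property a cluster
filesystem gets from its lock manager is that a logical update is
serialized cluster-wide before any of its per-device writes land. Here
is a toy model of that, with a threading.Lock standing in for a
cluster-wide exclusive lock; this is an analogy only, not how GFS is
actually implemented:

  import threading

  stripe = ["?"] * 6          # block A as stored on six backing devices
  dlm = threading.Lock()      # toy stand-in for a cluster-wide lock

  def write_block_a(node):
      # The whole logical write happens under one exclusive lock, so
      # another writer can never slip its device writes in between ours.
      with dlm:
          for dev in range(6):
              stripe[dev] = node

  writers = [threading.Thread(target=write_block_a, args=(n,))
             for n in ("node1", "node2")]
  for t in writers:
      t.start()
  for t in writers:
      t.join()

  # Last writer wins, but the stripe is always internally consistent:
  assert len(set(stripe)) == 1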

However, write ordering between individual block devices isn't
guaranteed without something to help out.  MD doesn't do this in any
sane way for cluster use.

Suppose block A is striped across six devices.  Without coordination,
two nodes can interleave their writes like this:

  node 1: write block A to device 1
  node 1: write block A to device 2
  node 1: write block A to device 3

  node 2: write block A to device 1
  node 2: write block A to device 2
  node 2: write block A to device 3
  node 2: write block A to device 4
  node 2: write block A to device 5
  node 2: write block A to device 6

  node 1: write block A to device 4
  node 1: write block A to device 5
  node 1: write block A to device 6

At this point, block A on devices {1,2,3} holds node 2's data, and
block A on devices {4,5,6} holds node 1's data.

That block is now inconsistent (and unrecoverable): neither node's
write exists anywhere in full.
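
Replaying that interleaving in the same toy model makes the torn stripe
visible; hypothetical Python, with devices numbered 1-6 to match the
sequence above:

  stripe = {dev: None for dev in range(1, 7)}    # block A on devices 1..6

  def write(node, devs):
      for dev in devs:
          stripe[dev] = node       # each device write lands independently

  write("node1", (1, 2, 3))            # node 1 gets halfway through
  write("node2", (1, 2, 3, 4, 5, 6))   # node 2 writes its whole stripe
  write("node1", (4, 5, 6))            # node 1 finishes its stripe

  print(stripe)
  # {1: 'node2', 2: 'node2', 3: 'node2',
  #  4: 'node1', 5: 'node1', 6: 'node1'}
  # Neither node's version of block A exists in full: a torn stripe.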


> On a related note, was DDRAID ever stabilised?

In fact, it's been removed from CVS:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/ddraid/Attic/?cvsroot=cluster

-- Lon



