[dm-devel] [PATCH v2] DM RAID: Add support for MD RAID10

NeilBrown neilb at suse.de
Tue Jul 17 02:29:14 UTC 2012


On Mon, 16 Jul 2012 17:53:53 -0500 Brassow Jonathan <jbrassow at redhat.com>
wrote:

> 
> On Jul 16, 2012, at 3:28 AM, keld at keldix.com wrote:
> 
> >> 
> >> Maybe you are suggesting that dmraid should not support raid10-far or
> >> raid10-offset until the "new" approach is implemented.
> > 
> > I don't know. It may take a while to get it implemented as long as no seasoned
> > kernel hackers are working on it. As it is implemented now by Barrow, why not
> > go forward as planned?
> > 
> > For the offset layout I don't have a good idea of how to improve the redundancy.
> > Maybe you or others have good ideas. Or is the offset layout an implementation
> > of a standard layout? If so, there is not much to do - unless we can find a
> > layout that has the same advantages but better redundancy.
> 
> Excuse me, s/Barrow/Brassow/ - my parents insist.
> 
> I've got a "simple" idea for improving the redundancy of the "far" algorithms.  Right now, when calculating the device on which the far copy will go, we perform:
> 	d += geo->near_copies;
> 	d %= geo->raid_disks;
> This effectively "shifts" the copy rows over by 'near_copies' (1 in the simple case), as follows:
> 	disk1	disk2	or	disk1	disk2	disk3
> 	=====	=====		=====	=====	=====
> 	 A1	 A2		 A1	 A2	 A3
> 	 ..	 ..		 ..	 ..	 ..
> 	 A2	 A1		 A3	 A1	 A2
> For all odd numbers of 'far' copies, this is what we should do.  However, for an even number of far copies, we should shift by "near_copies + 1" - unless (far_copies == (raid_disks / near_copies)), in which case it should be simply "near_copies".  This should provide maximum redundancy for all cases, I think.  I will call the number of devices the copy is shifted by the "device stride", or dev_stride.  Here are a couple of examples:
> 	2-devices, near=1, far=2, offset=0/1: dev_stride = nc (SAME AS CURRENT ALGORITHM)
> 
> 	3-devices, near=1, far=2, offset=0/1: dev_stride = nc + 1.  Layout changes as follows:
> 	disk1	disk2	disk3
> 	=====	=====	=====
> 	 A1	 A2	 A3
> 	 ..	 ..	 ..
> 	 A2	 A3	 A1
> 
> 	4-devices, near=1, far=2, offset=0/1: dev_stride = nc + 1.  Layout changes as follows:
> 	disk1	disk2	disk3	disk4
> 	=====	=====	=====	=====
> 	 A1	 A2	 A3	 A4
> 	 ..	 ..	 ..	 ..
> 	 A3	 A4	 A1	 A2

Hi Jon,
 This looks good for 4 devices, but I think it breaks down for e.g. 6 devices.
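To make sure we are describing the same step, here it is as a toy sketch
(not the raid10.c source; the function and its parameters are illustrative,
with dev_stride standing in for the field you propose):

	/* Toy sketch: where the far copy of the data on device 'd' lands.
	 * Currently the shift is hard-wired to near_copies; the proposal
	 * turns it into a separate dev_stride field. */
	static int far_copy_device(int d, int dev_stride, int raid_disks)
	{
		d += dev_stride;	/* currently: d += geo->near_copies */
		d %= raid_disks;	/* wrap around the device array     */
		return d;
	}

With raid_disks==4 and dev_stride==2 this maps 0->2, 1->3, 2->0, 3->1,
i.e. your third table; with raid_disks==6 and dev_stride==2 it gives the
x,x+2 pairs I count below.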

I think a useful measure is how many different pairs of devices exist such
that when both fail we lose data (thinking of far=2 layouts only).  We want to
keep this number low.  Call it the number of vulnerable pairs.

With the current layout on N devices, there are N vulnerable pairs
(x and x+1, for each x).  If N==2, the two pairs are 0,1 and 1,0; these
are identical, so there is only one vulnerable pair.

With your layout there are still N pairs (x and x+2), except when there are 4
devices (N==4): there we get 0,2 1,3 2,0 3,1, in which case 2 sets of pairs are
identical (0,2 == 2,0 and 1,3 == 3,1).
With N=6 the 6 pairs are 
0,2 1,3 2,4 3,5 4,0 5,1
and no two pairs are identical.  So there is no gain.
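A throwaway sketch to check those counts (it assumes far=2, near=1, and that
the far copy of the data on device x lives on (x + stride) % n - the names
are mine, not raid10.c):

	#include <stdio.h>

	/* Count distinct unordered vulnerable pairs {x, (x+stride)%n}. */
	static int vulnerable_pairs(int n, int stride)
	{
		char seen[16][16] = { {0} };	/* n is small here */
		int count = 0;

		for (int x = 0; x < n; x++) {
			int a = x, b = (x + stride) % n;
			if (a > b) { int t = a; a = b; b = t; }	/* normalise */
			if (!seen[a][b]) {
				seen[a][b] = 1;
				count++;
			}
		}
		return count;
	}

	int main(void)
	{
		printf("n=6 stride=1: %d\n", vulnerable_pairs(6, 1));	/* 6 */
		printf("n=6 stride=2: %d\n", vulnerable_pairs(6, 2));	/* 6 */
		printf("n=6 stride=3: %d\n", vulnerable_pairs(6, 3));	/* 3 */
		printf("n=4 stride=2: %d\n", vulnerable_pairs(4, 2));	/* 2 */
		return 0;
	}

confirming that with 6 devices a stride of 2 is no better than the current
layout, while a stride of 3 does better (more on that below).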

The layout where data stored on device 'x' is mirrored on device 'x^1' has
only N/2 vulnerable pairs.
An alternative way to get this low level of vulnerability would be to mirror
data on 'x' onto 'x+N/2'.  This is the same as your arrangement for N==4.
For N==6 it would be:

A  B  C  D  E  F
G  H  I  J  K  L
....
D  E  F  A  B  C
J  K  L  G  H  I
...

so the vulnerable pairs are 0,3 1,4 2,5
This might be slightly easier to implement (as you suggest: have a
dev_stride, but set it to raid_disks/fc*nc).
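(Worked through: N==6, fc==2, nc==1 gives dev_stride = 6/2*1 = 3, so data on
device x is mirrored on (x+3)%6 - exactly the table above; N==4 gives
dev_stride = 2, matching your four-device example.)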

> 
> This should require a new bit in 'layout' (bit 17) to signal a different calculation for selecting the copy device.  We then need to replace 'd += geo->near_copies' with 'd += geo->dev_stride' and set dev_stride in 'setup_geo'.  I'm not certain how much work it is beyond that, but I don't *think* it looks that bad and I'd be happy to do it.

I'm tempted to set bit 31 to mean "bits 0xFF are the number of copies and bits
0xFF00 define the layout of those copies"... but just adding bit 17 probably
makes more sense.
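For concreteness, a rough sketch of the decode (the near/far/offset fields
follow the existing raid10 layout encoding as I remember it; the struct, the
function name, and the dev_stride rule are just this proposal, not existing
code):

	struct geom {
		int raid_disks;
		int near_copies;
		int far_copies;
		int far_offset;
		int dev_stride;	/* devices to advance for each far copy */
	};

	/* Assumes geo->raid_disks has already been filled in. */
	static void setup_geo_sketch(struct geom *geo, int layout)
	{
		geo->near_copies = layout & 0xff;		/* bits 0-7  */
		geo->far_copies  = (layout >> 8) & 0xff;	/* bits 8-15 */
		geo->far_offset  = !!(layout & (1 << 16));	/* bit 16    */

		if (layout & (1 << 17))	/* proposed new bit */
			geo->dev_stride = geo->raid_disks
					/ geo->far_copies * geo->near_copies;
		else
			geo->dev_stride = geo->near_copies;	/* current rule */
	}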

If you create and test a patch using the calculation I suggested, I'll be
happy to review it.

> 
> So, should I allow the current "far" and "offset" in dm-raid, or should I simply allow "near" for now?

That's up to you.  However, it might be sensible not to rush into supporting
the current far and offset layouts until this conversation has run its course.

Thanks,
NeilBrown
