[Linux-cluster] Re: [ddraid] extensibility

Thu May 12 19:34:57 UTC 2005

Thank you for your reply.

Daniel Phillips wrote:
> Hi Jakob,
> 
> On Wednesday 11 May 2005 10:02, Jakob Praher wrote:
> 
>>I am very interested in ddraid for having sios systems over the
>>network. A few questions though:
>>
>>I've looked briefly at your implementation (ddraid-0.5.0), but have a
>>few questions, which I thought to ask you:

> You want to increase the "order" of the ddraid array?  This is not 
> supported yet.  The way to do it would be to create a new, higher order 
> ddraid array with an initial size of zero, then pvmove from the old 
> array to the new one, expanding the new array and recovering space from 
> the beginning of the old array as the move proceeds.  This will be 
> tricky to implement!  But it is possible, and in time we can expect to 
> see more sophisticated functionality like that arrive.

That sounds promising.

> 
> Adding an extra spindle to each ddraid member will be much easier.  In 
> that case, you would change each ddraid member to a linear combination 
> of two devices, by changing your dmsetup commands.
>
Okay, so you would use a linear /dev/mapper/.. device instead of a
physical device.
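
To check my understanding, here is a minimal sketch of how such a
table could be generated. It relies on the device-mapper "linear"
target, whose table lines have the form
"<start> <length> linear <device> <offset>". The helper name, device
paths and sizes are made up for illustration; they are not taken from
ddraid.

```python
def linear_table(devices):
    """Build a device-mapper table that concatenates devices end to end.

    devices: list of (path, size_in_sectors) pairs, in mapping order.
    Each emitted line is: <start> <length> linear <device> <offset>
    """
    lines = []
    start = 0
    for path, sectors in devices:
        lines.append(f"{start} {sectors} linear {path} 0")
        start += sectors  # the next device begins where this one ends
    return "\n".join(lines)


# Hypothetical devices and sizes (in 512-byte sectors), illustration only.
table = linear_table([("/dev/sda1", 2097152), ("/dev/sdb1", 4194304)])
print(table)
```

If I read the dmsetup man page correctly, a table like this can be
fed to "dmsetup create <name>" on standard input, and the resulting
/dev/mapper device would then take the place of the physical device
in the ddraid member list.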

> 
>>what problems could arise given the striping information on the
>>blocks. do you simply extend a stripe from 2 to 4 data blocks?
> 
> 
> This is quite a tricky problem because you want to start using the new 
> geometry while some of the old geometry is still in use.  There is no 
> obstacle to implementing this, except a lot of work.

Yes, that should be tricky. So you would need two tables, one for the
new geometry and one for the old. (If there is no mapping table but a
function, then you need to track at least which logical blocks have
been migrated.) This reminds me somewhat of the newspace and oldspace
in copying garbage collection, which can also be done incrementally.
It would be interesting to discuss the technical implications of such
an approach in more detail.
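
A minimal sketch of what I mean, assuming the mapping is a plain
striping function and that blocks are migrated in order from the
start of the device, so a single watermark is enough to record what
has moved. The class and method names are hypothetical; this is not
how ddraid works today.

```python
class MigratingMap:
    """Route logical blocks to the old or new geometry during a move."""

    def __init__(self, old_width, new_width):
        self.old_width = old_width  # data members per stripe, old array
        self.new_width = new_width  # data members per stripe, new array
        self.migrated = 0           # blocks [0, migrated) use the new layout

    @staticmethod
    def _locate(block, width):
        # Plain striping: (member index, block offset on that member).
        return block % width, block // width

    def lookup(self, block):
        # Blocks below the watermark have been copied to the new geometry.
        if block < self.migrated:
            return ("new",) + self._locate(block, self.new_width)
        return ("old",) + self._locate(block, self.old_width)

    def migrate_next(self, count=1):
        # A real driver would copy the data before moving the watermark.
        self.migrated += count


m = MigratingMap(old_width=2, new_width=4)
before = m.lookup(5)   # still served from the old geometry
m.migrate_next(6)      # pretend blocks 0..5 have been copied
after = m.lookup(5)    # now served from the new geometry
print(before, after)
```

As in a copying collector, the old space below the watermark can be
reclaimed once nothing refers to it any more.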

> 
> 
>>is it 
>>possible to raise the number of logical sectors with device mapper
>>(sorry for my ignorance - I haven't looked the details of the dmsetup
>>man page). do you have to rearrange block sizes or stripe
>>information?
> 
> 
> If you raise the number of logical sectors in the device mapper command, 
> then you must actually have the new capacity available.  In other 
> words, you would need to increase the partition size of each member of 
> the ddraid array as well.  CLVM is supposed to be able to do such 
> things automatically, but ddraid has not been integrated with CLVM yet.
> 
> 
> I do not have that paper.  Is it freely available online?  I see from 
> the abstract that "the RAID-x architecture is based on an orthogonal 
> striping and mirroring (OSM) scheme".  The answer is: you probably can 
> implement this in device mapper.  But why?

Sure. Sorry for my ignorance:
- Presentation -
http://csce.uark.edu/~aapon/courses/ioparallel/presentations/Raid-x.ppt
http://www.cs.plu.edu/courses/csce480/arts/distributedraid.pdf

I want to get a better understanding of the system, hence the question.
One of my biggest interests is dynamic extensibility: having the
ability to add more storage when it's needed. Here RAID-x is
interesting, since it scales linearly. You can simply add an additional
node, without having to follow a special formula (like the 2^k+1 rule).

And since the backup information isn't stored in parity form, but in
plain form, it can also be used for load balancing and fast error
recovery (as in chained declustering). The orthogonal storage algorithm
stores the mirror blocks of one stripe all on one drive.

Sure, it eats more physical space. But on the other hand, the cost per
gigabyte of SATA disks is small compared to the huge amount of money
you have to pay for a sophisticated storage system.

> 
>>The design is basically a mixture of mirroring and striping, making it
>>possible for one node to fail completely.
> 
> 
> DDRaid does this without using either mirroring or striping.  It sounds 
> like Raid-x is fairly wasteful of disk space and IO bandwidth, compared 
> to ddraid.

The following is my attempt to understand the differences.

Aha. So I've read over the raid3.5 paper, but haven't analyzed what the
ddraid driver implements in this regard.

You have an explicit drive full of parity, and don't distribute the
parity information across the other drives.

Raid 3.5 splits a logical unit (or block) across 2^k data drives, plus
one drive that is full of parity information.

If one drive fails, it is either:
a) the parity one, in which case the parity can be rebuilt from the data
b) a data one, in which case the data can be reconstructed from the
parity and the other data.
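
To convince myself, here is a small sketch of both cases using plain
XOR parity. I am assuming XOR here; I have not checked that this is
what the ddraid driver does. The function names are made up.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(block, k):
    """Split a block into 2**k data fragments plus one parity fragment."""
    n = 2 ** k
    if len(block) % n:
        raise ValueError("block length must be a multiple of 2**k")
    size = len(block) // n
    frags = [block[i * size:(i + 1) * size] for i in range(n)]
    parity = reduce(xor_bytes, frags)
    return frags, parity


def rebuild_data(frags, parity, lost):
    """Case b): recover a lost data fragment from parity and the rest."""
    survivors = [f for i, f in enumerate(frags) if i != lost]
    return reduce(xor_bytes, survivors, parity)


block = bytes(range(16))
frags, parity = split_with_parity(block, k=2)   # 4 data + 1 parity
recovered = rebuild_data(frags, parity, lost=2)
reparity = reduce(xor_bytes, frags)             # case a): rebuild parity
print(recovered == frags[2], reparity == parity)
```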

These are all nice properties.

One problem is that the function N(k) = 2^k+1 grows pretty soon to
quite large numbers:

3,5,9,17,33,65,129,...

So it would be problematic to plug in additional nodes. I mean you can't
grow the array linearly. To reach the next array size you always have to
nearly double the last size: N_1 = N_0 + N_0 - 1 = 2(N_0) - 1
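
A quick check of the numbers, just the arithmetic above written out
(the function name is mine):

```python
def members(k):
    """Array size for order k: 2**k data drives plus one parity drive."""
    return 2 ** k + 1


sizes = [members(k) for k in range(1, 8)]
print(sizes)

# Each step to the next order needs N_0 - 1 additional drives.
steps = [b - a for a, b in zip(sizes, sizes[1:])]
print(steps)
```

So going from 9 to 17 members means adding 8 drives at once, and the
step doubles every time.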

> 
> 
>>One stripe is made of (n-1) blocks and you have (n-1) mirror blocks.
>>They have implemented it as part of their trojan cluster project, which
>>I think isn't alive any more.
> 
> What is "n" in this description?
> 
n is the number of disks (so N in your paper).
So you have the same number of mirror blocks as you have stripe blocks,
which is somewhat orthogonal to itself. The advantage is that all of the
mirror blocks of one stripe set are stored on one device, which makes
mirroring easy to implement.
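
A rough sketch of the placement as I described it, not the exact
layout from the paper: each stripe has n-1 data blocks on n-1 disks,
and the remaining disk holds all n-1 mirror copies. The rotation of
the mirror disk from stripe to stripe is my own assumption.

```python
def osm_stripe(stripe, n):
    """Place one stripe on n disks: n-1 data blocks, mirrors on one disk.

    Returns {logical_block: (data_disk, mirror_disk)}.
    """
    mirror_disk = stripe % n  # assumed rotation, not from the paper
    data_disks = [d for d in range(n) if d != mirror_disk]
    first_block = stripe * (n - 1)
    return {first_block + i: (disk, mirror_disk)
            for i, disk in enumerate(data_disks)}


layout = osm_stripe(stripe=1, n=4)
print(layout)
```

With this placement no block shares a disk with its own mirror, so
any single disk (or node) can fail without losing data.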

--Jakob