[linux-lvm] Weird lvm2 performance problems

Luca Berra bluca at comedia.it
Mon Apr 20 14:39:47 UTC 2009


On Mon, Apr 20, 2009 at 04:14:22PM +0200, Sven Eschenberg wrote:
>Hi Luca,
>
>On Mon, April 20, 2009 15:46, Luca Berra wrote:
>> On Mon, Apr 20, 2009 at 03:15:12PM +0200, Sven Eschenberg wrote:
>>>Hi Luca,
>>>
>>>Okay, let's assume a chunk size of C. No matter what your md looks like,
>>>the logical md volume consists of a series of size/C chunks. The very
>>>first chunk C0 will hold the LVM header.
>>>If I align the extents with the chunk size and the extents are exactly
>>>the chunk size, then every extent PEx of my PV maps exactly to a chunk
>>>on one of the disks.
>>>Which in turn means that if I want to read PEx I have to read some chunk
>>>Cy on one disk, and PEx+1 would most certainly be a chunk Cy+1 which
>>>would reside on a different physical disk.
>>
>> correct
>>
>>>So the question is: why would you want to align the first PE to the
>>>stripe size rather than the chunk size?
>>
>> Because when you _write_ incomplete stripes, the raid code
>> would need to do a read-modify-write of the parity block.
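
To put a number on that: with a 64k chunk and 3 data disks the full
stripe is 192k, so an aligned 192k write can just compute the parity
and write everything out, while a lone 64k write forces the raid code
to read the old data and old parity first, recompute, and write both
back.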
>
>I didn't think of this yet; then again, all the preliminary tests I did
>so far were on a 4D raid10 - I didn't have the time to set up the raid5
>volume yet, because the performance issues on the raid10 were so amazing
>:-D.
>
>>
>> Filesystems like ext3/4 and xfs have the ability to account for stripe
>> size in the block allocator to prevent unnecessary read-modify-writes,
>> but if you do not stripe-align the start of the filesystem you cannot
>> take advantage of this.
>>
>
>Since you mentioned it: What is the specific option (for xfs mainly) to
>modify this behavior?
-d sunit=n (stripe unit, i.e. chunk size, in 512-byte sectors)
-d swidth=n (stripe width in 512-byte sectors)
or, more conveniently
-d su=n (chunk size in bytes)
-d sw=n (stripe width as a multiple of the stripe unit, i.e. the number
of data disks)

e.g. mkfs.xfs -d su=64k,sw=3 ....
for a 3+1 raid5 with the default chunk size
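
In 512-byte sectors the same thing would be -d sunit=128,swidth=384
(64k and 3x64k). You can check what the filesystem ended up with via
xfs_info, which reports sunit/swidth in filesystem blocks (the mount
point below is just an example):

xfs_info /mnt/data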

>> The annoying issue is that you rarely have a (2^n)+P array, and pe_size
>> must be a power of 2.
>> So for example, given my 3D+1P raid5 the only solution I devised was
>> having a chunk size which is a power of 2, pe_start aligned to the
>> stripe, and pe_size = chunk size, and I have to remember that every
>> time I extend an LV it has to be extended to the nearest multiple of
>> 3 LEs.
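
To put numbers on it, the whole scheme would be something like the
following, with a 64k chunk on /dev/md0 (device and volume names are
just examples, and --dataalignment needs a recent lvm2):

pvcreate --dataalignment 192k /dev/md0  # pe_start on a 3x64k stripe boundary
vgcreate -s 64k vg0 /dev/md0            # pe_size = chunk size
lvcreate -l 300 -n lv0 vg0              # 300 LEs, a multiple of 3
lvextend -l +3 vg0/lv0                  # always grow in whole stripes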
>
>Ouch, I see, I'm gonna be as lucky as you :-).
>
>Another question arose when I thought about something: I actually wanted
>to place the OS on a stripe of mirrors, since this gives me the
>statistically best robustness against two failing disks. From what I
>could read in the md man page, none of the offered raid10 modes provides
>such a layout. Would I have to first mirror two drives with md and then
>stripe them together with md on top of md?

I believe raid10 is smart enough, but I am not 100% confident;
you could ask on the linux-raid ML.
Stacking raid devices would be an alternative.
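
If you do stack them, it would look roughly like this (untested, device
names and chunk size are just examples):

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
# then stripe the two mirrors together
mdadm --create /dev/md3 --level=0 --chunk=64 --raid-devices=2 /dev/md1 /dev/md2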

L.

-- 
Luca Berra -- bluca at comedia.it
         Communication Media & Services S.r.l.
  /"\
  \ /     ASCII RIBBON CAMPAIGN
   X        AGAINST HTML MAIL
  / \
