
Re: Proper alignment between disk HW blocks, mdadm strides, and ext[23] blocks

On Fri, 9 Nov 2007, Andreas Dilger wrote:

On Nov 09, 2007  19:11 -0700, Chris Worley wrote:
How do you measure/gauge/assure proper alignment?

The physical disk has a block structure.  What is it or how do you
find it?  I'm guessing it's best to not partition disks in order to
assure that whatever its block read/write is isn't bisected by the

For Lustre we never partition the disks for exactly this reason, and if
you are using LVM/md on the whole device it doesn't make sense either.
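To make the partitioning concern concrete, here is a minimal sketch that checks whether a partition's starting offset falls on a RAID chunk boundary. The function name, the 512-byte sector size, and the sample start sectors are assumptions for illustration (sector 63 is the traditional DOS-style start of a first partition), and the check ignores any offset added by md or LVM metadata.

```python
SECTOR_BYTES = 512  # assumed logical sector size


def is_chunk_aligned(start_sector, chunk_kib, sector_bytes=SECTOR_BYTES):
    """Return True if a partition starting at start_sector begins
    exactly on a chunk boundary (chunk size given in KiB)."""
    start_bytes = start_sector * sector_bytes
    chunk_bytes = chunk_kib * 1024
    return start_bytes % chunk_bytes == 0


if __name__ == "__main__":
    # Hypothetical start sectors: whole device, legacy DOS start, 1 MiB start.
    for start in (0, 63, 2048):
        print(start, is_chunk_aligned(start, chunk_kib=64))
```

Using the whole device corresponds to a start sector of 0, which is trivially aligned; that is the case recommended above.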

Then, mdadm has some block structure.  The "-c" ("chunk") is in
"kibibytes" (feed the dog kibbles?), with a default of 64.  Not a clue
what they're trying to do.

That just means for RAID 0/5/6 that the amount of data or parity in a
stripe is a multiple of the chunk size, i.e. for a 4+1 RAID5 you get:

	disk0 disk1 disk2 disk3 disk4
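As a rough sketch of the arithmetic for the 4+1 case, the snippet below computes how much data and parity one full stripe holds for a given chunk size. The function name and the returned field names are hypothetical; the 64 KiB chunk is the mdadm default mentioned above.

```python
def stripe_geometry(total_disks, parity_disks, chunk_kib):
    """Return the data and parity held by one full stripe, in KiB.

    Each disk contributes exactly one chunk per stripe, so the totals
    are simply the chunk size times the number of disks of each kind.
    """
    if not 0 <= parity_disks < total_disks:
        raise ValueError("parity_disks must be in [0, total_disks)")
    data_disks = total_disks - parity_disks
    return {
        "data_disks": data_disks,
        "data_kib": data_disks * chunk_kib,
        "parity_kib": parity_disks * chunk_kib,
    }


if __name__ == "__main__":
    # 4+1 RAID5 with 64 KiB chunks
    print(stripe_geometry(total_disks=5, parity_disks=1, chunk_kib=64))
```

For the 4+1 RAID5 example this gives 256 KiB of data plus 64 KiB of parity per stripe.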

Finally, mkfs.ext[23] has a "stride", which is defined as a "stripe
size" in the man page (and I thought all your stripes added together
are a "stride"), as well as a block size.

For ext2/3/4 the stride size (in kB) == the mdadm chunk size.  Note that
the ext2/3/4 stride size is in units of filesystem blocks, so if you have
4kB filesystem blocks (default for filesystems > 500MB) and a 64kB RAID5
chunk size, this is 16:

	mke2fs -E stride=16 /dev/md0
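The stride calculation above can be sketched as a small helper. The function names are hypothetical, and the full-stripe figure (stride times the number of data disks) is included only as an illustration of how the numbers relate; newer e2fsprogs releases accept a stripe width option alongside stride, but check your man page for the exact spelling.

```python
def ext_stride(chunk_kib, block_kib=4):
    """Stride in filesystem blocks: RAID chunk size / filesystem block size."""
    if chunk_kib % block_kib != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_kib // block_kib


def full_stripe_blocks(chunk_kib, data_disks, block_kib=4):
    """Filesystem blocks in one full stripe of data (parity excluded)."""
    return ext_stride(chunk_kib, block_kib) * data_disks


if __name__ == "__main__":
    # 64 KiB chunks, 4 KiB blocks, 4+1 RAID5 (4 data disks)
    print("stride =", ext_stride(64))
    print("full stripe =", full_stripe_blocks(64, data_disks=4), "blocks")
```

With 64 KiB chunks and 4 KiB blocks the stride is 16, matching the command above, and a full stripe on a 4+1 array spans 64 filesystem blocks.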

It's important to make sure these all align properly, but their definitions

... do not?

Could somebody please clarify... with an example?

Yes, I constantly wish the terminology were constant between different tools,
but sadly there isn't any "proper" terminology out there as far as I've been
able to see.

Cheers, Andreas
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


Quick question, Andreas: if you do not provide -E stride=16 on a RAID5 filesystem, how much worse does performance become on, say, a 2.0 or 5.0 TB ext3 filesystem?

