stride

Thu Jun 19 11:42:44 UTC 2008

On Thu, Jun 19, 2008 at 06:21:24AM -0400, Mag Gam wrote:
> ok, in a way it's like a stripe? I thought when you do a stripe you put the
> metadata on a number of disks too. How is that different? Is there a diagram I
> can refer to?

Yes, which is why the mke2fs man page states:

	stride=<stripe-size>
		Configure  the	filesystem  for	 a  RAID  array with
		<stripe-size> filesystem blocks per stripe.

So if the size of a stripe on each disk is 64k, and you are using a
4k filesystem blocksize, then 64k/4k == 16, and that would be an
"ideal" stride size, in that for each successive block group, the
inode and block bitmaps would be placed at an offset 16 blocks further
from the beginning of the block group.
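The arithmetic above can be sketched as follows. This is only an
illustration, assuming the per-disk chunk size and the block size are
given in KiB; the helper names are hypothetical, and the sketch ignores
any wrap-around needed to keep the bitmaps inside the block group.

```python
# Hypothetical helpers illustrating the stride arithmetic; not mke2fs code.

def stride_blocks(chunk_kib: int, block_kib: int) -> int:
    """Filesystem blocks per on-disk chunk, i.e. the 'ideal' stride."""
    if chunk_kib % block_kib:
        raise ValueError("chunk size should be a multiple of the block size")
    return chunk_kib // block_kib


def bitmap_offsets(stride: int, groups: int) -> list:
    """Offset (in blocks) of the bitmaps from the start of each group:
    group g is shifted by g * stride blocks (no wrap-around modelled)."""
    return [g * stride for g in range(groups)]


stride = stride_blocks(64, 4)
print(stride)                     # 16
print(bitmap_offsets(stride, 4))  # [0, 16, 32, 48]
```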

The reason for doing this is to avoid problems where the block bitmap
ends up on the same disk for every single block group.  The classic
case where this would happen is if you have 5 disks in a RAID 5
configuration, which means 4 data disks per stripe, and 8192 blocks in
a block group; then if the block bitmap is always at the same offset
from the beginning of the block group, one disk will get all of the
block bitmaps, and that disk ends up being a major hot spot.
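A rough way to see the hot spot is a simplified model, assuming a 64k
chunk (16 blocks of 4k), 8192 blocks per group, and n - 1 data chunks
per stripe for an n-disk RAID 5. Parity rotation is ignored, so the
model only tracks which data-chunk position within a stripe a block
lands in. The function names are hypothetical.

```python
# Simplified model (assumption): parity rotation is ignored; we only track
# which of the (disks - 1) data-chunk positions holds a given block.

BLOCKS_PER_GROUP = 8192  # blocks per block group, from the example
CHUNK_BLOCKS = 16        # 64k chunk / 4k blocks


def data_disk(block: int, disks: int) -> int:
    """Data-chunk position (0 .. disks-2) holding `block`."""
    return (block // CHUNK_BLOCKS) % (disks - 1)


def bitmap_disks(disks: int, stride: int = 0, groups: int = 8) -> list:
    """Data-chunk position of each group's block bitmap.

    stride=0: the bitmap sits at the start of every group.
    Otherwise group g's bitmap is shifted by g * stride blocks.
    """
    return [data_disk(g * BLOCKS_PER_GROUP + g * stride, disks)
            for g in range(groups)]


print(bitmap_disks(5))             # [0, 0, 0, 0, 0, 0, 0, 0]
print(bitmap_disks(5, stride=16))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Without a stride every bitmap in this model lands on the same data
position, because each group spans 512 chunks, an exact multiple of 4.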

As it turns out, if you use 4 disks in a RAID 5 configuration, or 6
disks in a RAID 5 configuration, this problem doesn't arise at all,
and you don't need to use the stride option.  And in most cases,
simply using stride=1 is actually enough to make sure that the block
and inode bitmaps get forced onto successively different disks.
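The 4- and 6-disk cases can be checked with the same kind of simplified
model, again assuming a 64k chunk, 4k blocks, 8192 blocks per group, no
stride, and no parity rotation (the function name is hypothetical). Each
group spans 512 chunks; 512 is a multiple of 4 data disks but not of 3
or 5, so the bitmaps drift across disks on their own.

```python
# Simplified model (assumption): n-disk RAID 5 has n - 1 data chunks per
# stripe, parity rotation is ignored, and no stride is used, so every
# block bitmap sits at the start of its block group.

BLOCKS_PER_GROUP = 8192
CHUNK_BLOCKS = 16  # 64k chunk / 4k blocks


def bitmap_disks_no_stride(disks: int, groups: int = 8) -> list:
    """Data-chunk position of each group's block bitmap, without stride."""
    data_disks = disks - 1
    return [(g * BLOCKS_PER_GROUP // CHUNK_BLOCKS) % data_disks
            for g in range(groups)]


for n in (4, 5, 6):
    print(n, bitmap_disks_no_stride(n))
# 4 [0, 2, 1, 0, 2, 1, 0, 2]
# 5 [0, 0, 0, 0, 0, 0, 0, 0]
# 6 [0, 2, 4, 1, 3, 0, 2, 4]
```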

With ext4's flex_bg enhancement, the need to specify the stride option
for RAID arrays will also go away.

							- Ted