From dshaw at jabberwocky.com Tue Jun 5 16:12:37 2012
From: dshaw at jabberwocky.com (David Shaw)
Date: Tue, 5 Jun 2012 12:12:37 -0400
Subject: Proper stride and stripe-width for RAID 50
Message-ID: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>

Hello,

I've been looking around, but can't seem to find an authoritative statement
on setting stride and stripe-width for RAID 50 (i.e. a RAID 0 over multiple
RAID 5s).

Based on my understanding of what stride and stripe-width set, it seems to
me that they should be calculated the same way as they would be if there
were no multiple-level RAIDing involved.  For example, given a RAID 50 made
up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk
size and 4k block size, the stride should be 128 (512 / 4) and the
stripe-width should be 768 (stride * 6 data disks).

These settings should work equally well whether the two RAID 5s are striped
together or just appended one after the other via LVM.

Is my reasoning correct?

David


From adilger at dilger.ca Tue Jun 5 16:18:35 2012
From: adilger at dilger.ca (Andreas Dilger)
Date: Tue, 5 Jun 2012 10:18:35 -0600
Subject: Proper stride and stripe-width for RAID 50
In-Reply-To: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>
References: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>
Message-ID:

On 2012-06-05, at 10:12 AM, David Shaw wrote:
> I've been looking around, but can't seem to find an authoritative statement on setting stride and stripe-width for RAID 50 (i.e. a RAID 0 over multiple RAID 5s).
>
> Based on my understanding of what stride and stripe-width set, it seems to me that they should be calculated the same way as they would be if there were no multiple-level RAIDing involved. For example, given a RAID 50 made up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk size and 4k block size, the stride should be 128 (512 / 4) and the stripe-width should be 768 (stride * 6 data disks).

Strictly speaking, you only need a stripe-width of 384 (stride * 3 data
disks) since this is the minimum read-modify-write boundary.

That said, why not configure your system with RAID-6 6+2?  This gives
better fault tolerance than RAID-5.

> These settings should work equally well whether the two RAID 5s are striped together or just appended one after the other via LVM.
>
> Is my reasoning correct?
>
> David
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

Cheers, Andreas


From dshaw at jabberwocky.com Tue Jun 5 18:33:34 2012
From: dshaw at jabberwocky.com (David Shaw)
Date: Tue, 5 Jun 2012 14:33:34 -0400
Subject: Proper stride and stripe-width for RAID 50
In-Reply-To:
References: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>
Message-ID: <8D4E570B-F12F-4F81-AF05-AEDA6FD6D820@jabberwocky.com>

On Jun 5, 2012, at 12:18 PM, Andreas Dilger wrote:

> On 2012-06-05, at 10:12 AM, David Shaw wrote:
>> I've been looking around, but can't seem to find an authoritative statement on setting stride and stripe-width for RAID 50 (i.e. a RAID 0 over multiple RAID 5s).
>>
>> Based on my understanding of what stride and stripe-width set, it seems to me that they should be calculated the same way as they would be if there were no multiple-level RAIDing involved. For example, given a RAID 50 made up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk size and 4k block size, the stride should be 128 (512 / 4) and the stripe-width should be 768 (stride * 6 data disks).
>
> Strictly speaking, you only need a stripe-width of 384 (stride * 3 data
> disks) since this is the minimum read-modify-write boundary.

Ah, right.  Thanks, I was over-thinking it.

> That said, why not configure your system with RAID-6 6+2?  This gives
> better fault tolerance than RAID-5.

Yes.  I was using two RAID5 3+1s as a simple example.  One of the systems
I'm working on actually has 24 drives, and I thought four RAID5 5+1s might
perform better than two RAID6 10+2s (I'm going to actually test that, of
course).

David
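For reference, the layout discussed in the thread above could be assembled
and formatted roughly as follows under Linux md software RAID.  This is a
minimal sketch only: the device names, the mdadm invocations and the choice
of ext3 are illustrative assumptions, not details taken from the thread.

#!/bin/sh
# Sketch of the 2 x (3+1) RAID-50 layout discussed above (hypothetical devices).

# Two 3+1 RAID-5 legs, each with a 512k chunk size.
mdadm --create /dev/md1 --level=5 --raid-devices=4 --chunk=512 /dev/sd[abcd]1
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=512 /dev/sd[efgh]1

# RAID-0 across the two legs.
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/md1 /dev/md2

# stride = chunk / block = 512k / 4k = 128
# stripe-width = stride * 3 data disks per RAID-5 leg = 384 (the minimum
# read-modify-write boundary Andreas describes); 768 (stride * 6) also works.
mke2fs -t ext3 -b 4096 -E stride=128,stripe-width=384 /dev/md0

# The same hints can be changed later on an existing filesystem:
tune2fs -E stride=128,stripe_width=384 /dev/md0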
From jeremy at jeremysanders.net Thu Jun 14 09:11:16 2012
From: jeremy at jeremysanders.net (Jeremy Sanders)
Date: Thu, 14 Jun 2012 10:11:16 +0100
Subject: Filesystem is busy after umount and won't fsck
Message-ID: <4FD9AAB4.1020203@jeremysanders.net>

This is on Fedora 16, kernel 3.3.7-1.fc16.x86_64.  sdb1/3 are ext3 devices
on a Hitachi HDS721010KLA330.

# fsck /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb1
Filesystem mounted or opened exclusively by another program?

# fsck /dev/sdb3
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb3
Filesystem mounted or opened exclusively by another program?

# umount /dev/sdb1
umount: /dev/sdb1: not mounted
# umount /dev/sdb3
umount: /dev/sdb3: not mounted

# grep sdb /proc/mounts

# mount /dev/sdb1 /mnt/tmp
# umount /mnt/tmp
# fsck /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb1
Filesystem mounted or opened exclusively by another program?

# lsof|grep sdb
jbd2/sdb1  3835  root  cwd  DIR      8,1  4096  2  /
jbd2/sdb1  3835  root  rtd  DIR      8,1  4096  2  /
jbd2/sdb1  3835  root  txt  unknown              /proc/3835/exe
jbd2/sdb3  3858  root  cwd  DIR      8,1  4096  2  /
jbd2/sdb3  3858  root  rtd  DIR      8,1  4096  2  /
jbd2/sdb3  3858  root  txt  unknown              /proc/3858/exe

Any idea what is going on?  The device is usually fscked, then mounted
nightly, data is rsynced onto it, then unmounted.  We'll probably upgrade
it to ext4 soon to see whether the problem goes away.

This has happened twice on this kernel (rebooted after it happened the
first time) and never before.
This is dumpe2fs on /dev/sdb1:

Filesystem volume name:
Last mounted on:          /mnt/root_backup
Filesystem UUID:          2aeb7822-f81a-4b84-a9fd-76c4c5c27bba
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1835008
Block count:              3664820
Reserved block count:     183241
Free blocks:              1237712
Free inodes:              1545330
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      894
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Thu May 22 15:09:53 2008
Last mount time:          Wed Jun 13 03:47:45 2012
Last write time:          Wed Jun 13 03:47:45 2012
Mount count:              5
Maximum mount count:      28
Last checked:             Sat Jun  9 09:37:27 2012
Check interval:           15552000 (6 months)
Next check after:         Thu Dec  6 08:37:27 2012
Lifetime writes:          87 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      706df78a-3ca8-46e8-a831-5f8ba43d2609
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0011fbd9
Journal start:            1

Thanks

Jeremy
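For context, the nightly cycle Jeremy describes (fsck, then mount, rsync the
data over, then umount) might look roughly like the sketch below, with an
extra check that gives up early if the kernel still shows a journal attached
to the device, as the lsof output above suggests.  The mount point, the
rsync source and options, and the use of /proc/fs/jbd2 are assumptions, not
details from the report.

#!/bin/sh
# Sketch of the nightly backup cycle described above; paths are hypothetical.
DEV=/dev/sdb1
MNT=/mnt/root_backup

# If a jbd2 journal thread is still attached to the device, e2fsck will
# refuse to open it ("Device or resource busy"), so bail out rather than
# retrying blindly.  Assumes this kernel exposes /proc/fs/jbd2.
if ls /proc/fs/jbd2 2>/dev/null | grep -q "^$(basename "$DEV")-"; then
    echo "$DEV still has an active journal thread, skipping backup" >&2
    exit 1
fi

fsck -p "$DEV" || exit 1
mount "$DEV" "$MNT"
rsync -ax --delete / "$MNT"/
umount "$MNT"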