From dshaw at jabberwocky.com Tue Jun 5 16:12:37 2012
From: dshaw at jabberwocky.com (David Shaw)
Date: Tue, 5 Jun 2012 12:12:37 -0400
Subject: Proper stride and stripe-width for RAID 50
Message-ID: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>

Hello,

I've been looking around, but can't seem to find an authoritative statement
on setting stride and stripe-width for RAID 50 (i.e. a RAID 0 over multiple
RAID 5s).

Based on my understanding of what stride and stripe-width set, it seems to
me that they should be calculated the same way as they would be if there
were no multiple-level RAIDing involved.  For example, given a RAID 50 made
up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk
size and 4k block size, the stride should be 128 (512 / 4) and the
stripe-width should be 768 (stride * 6 data disks).

These settings should work equally well whether the two RAID 5s are striped
together or just appended one after the other via LVM.

Is my reasoning correct?

David


From adilger at dilger.ca Tue Jun 5 16:18:35 2012
From: adilger at dilger.ca (Andreas Dilger)
Date: Tue, 5 Jun 2012 10:18:35 -0600
Subject: Proper stride and stripe-width for RAID 50
In-Reply-To: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>
References: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>
Message-ID:

On 2012-06-05, at 10:12 AM, David Shaw wrote:
> I've been looking around, but can't seem to find an authoritative statement on setting stride and stripe-width for RAID 50 (i.e. a RAID 0 over multiple RAID 5s).
>
> Based on my understanding of what stride and stripe-width set, it seems to me that they should be calculated the same way as they would be if there were no multiple-level RAIDing involved. For example, given a RAID 50 made up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk size and 4k block size, the stride should be 128 (512 / 4) and the stripe-width should be 768 (stride * 6 data disks).

Strictly speaking, you only need a stripe-width of 384 (stride * 3 data
disks) since this is the minimum read-modify-write boundary.

That said, why not configure your system with RAID-6 6+2?  This gives
better fault tolerance than RAID-5.

> These settings should work equally well whether the two RAID 5s are striped together or just appended one after the other via LVM.
>
> Is my reasoning correct?
>
> David
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users

Cheers, Andreas


From dshaw at jabberwocky.com Tue Jun 5 18:33:34 2012
From: dshaw at jabberwocky.com (David Shaw)
Date: Tue, 5 Jun 2012 14:33:34 -0400
Subject: Proper stride and stripe-width for RAID 50
In-Reply-To:
References: <1A9700ED-4028-4D54-9A76-10CB70B5ABBE@jabberwocky.com>
Message-ID: <8D4E570B-F12F-4F81-AF05-AEDA6FD6D820@jabberwocky.com>

On Jun 5, 2012, at 12:18 PM, Andreas Dilger wrote:

> On 2012-06-05, at 10:12 AM, David Shaw wrote:
>> I've been looking around, but can't seem to find an authoritative statement on setting stride and stripe-width for RAID 50 (i.e. a RAID 0 over multiple RAID 5s).
>>
>> Based on my understanding of what stride and stripe-width set, it seems to me that they should be calculated the same way as they would be if there were no multiple-level RAIDing involved. For example, given a RAID 50 made up of two 3+1 RAID 5s striped together (so 8 disks total) with a 512k chunk size and 4k block size, the stride should be 128 (512 / 4) and the stripe-width should be 768 (stride * 6 data disks).
>
> Strictly speaking, you only need a stripe-width of 384 (stride * 3 data
> disks) since this is the minimum read-modify-write boundary.

Ah, right.  Thanks, I was over-thinking it.

> That said, why not configure your system with RAID-6 6+2?  This gives
> better fault tolerance than RAID-5.

Yes.  I was using two RAID5 3+1s as a simple example.  One of the systems
I'm working on actually has 24 drives, and I thought four RAID5 5+1s might
perform better than two RAID6 10+2s (I'm going to actually test that, of
course).

David
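For reference, the layout discussed in the thread above could be assembled
and formatted roughly as follows under Linux md software RAID.  This is a
minimal sketch only: the device names, the mdadm invocations and the choice
of ext3 are illustrative assumptions, not details taken from the thread.

#!/bin/sh
# Sketch of the 2 x (3+1) RAID-50 layout discussed above (hypothetical devices).

# Two 3+1 RAID-5 legs, each with a 512k chunk size.
mdadm --create /dev/md1 --level=5 --raid-devices=4 --chunk=512 /dev/sd[abcd]1
mdadm --create /dev/md2 --level=5 --raid-devices=4 --chunk=512 /dev/sd[efgh]1

# RAID-0 across the two legs.
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/md1 /dev/md2

# stride = chunk / block = 512k / 4k = 128
# stripe-width = stride * 3 data disks per RAID-5 leg = 384 (the minimum
# read-modify-write boundary Andreas describes); 768 (stride * 6) also works.
mke2fs -t ext3 -b 4096 -E stride=128,stripe-width=384 /dev/md0

# The same hints can be changed later on an existing filesystem:
tune2fs -E stride=128,stripe_width=384 /dev/md0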
From jeremy at jeremysanders.net Thu Jun 14 09:11:16 2012
From: jeremy at jeremysanders.net (Jeremy Sanders)
Date: Thu, 14 Jun 2012 10:11:16 +0100
Subject: Filesystem is busy after umount and won't fsck
Message-ID: <4FD9AAB4.1020203@jeremysanders.net>

This is on Fedora 16, kernel 3.3.7-1.fc16.x86_64.  sdb1/3 are ext3 devices
on a Hitachi HDS721010KLA330.

# fsck /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb1
Filesystem mounted or opened exclusively by another program?

# fsck /dev/sdb3
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb3
Filesystem mounted or opened exclusively by another program?

# umount /dev/sdb1
umount: /dev/sdb1: not mounted
# umount /dev/sdb3
umount: /dev/sdb3: not mounted

# grep sdb /proc/mounts

# mount /dev/sdb1 /mnt/tmp
# umount /mnt/tmp
# fsck /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext3: Device or resource busy while trying to open /dev/sdb1
Filesystem mounted or opened exclusively by another program?

# lsof|grep sdb
jbd2/sdb1  3835  root  cwd  DIR      8,1  4096  2  /
jbd2/sdb1  3835  root  rtd  DIR      8,1  4096  2  /
jbd2/sdb1  3835  root  txt  unknown              /proc/3835/exe
jbd2/sdb3  3858  root  cwd  DIR      8,1  4096  2  /
jbd2/sdb3  3858  root  rtd  DIR      8,1  4096  2  /
jbd2/sdb3  3858  root  txt  unknown              /proc/3858/exe

Any idea what is going on?  The device is usually fscked, then mounted
nightly, data is rsynced onto it, then unmounted.  We'll probably upgrade
it to ext4 soon to see whether the problem goes away.

This has happened twice on this kernel (rebooted after it happened the
first time) and never before.
This is dumpe2fs on /dev/sdb1:

Filesystem volume name:
Last mounted on:          /mnt/root_backup
Filesystem UUID:          2aeb7822-f81a-4b84-a9fd-76c4c5c27bba
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1835008
Block count:              3664820
Reserved block count:     183241
Free blocks:              1237712
Free inodes:              1545330
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      894
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Thu May 22 15:09:53 2008
Last mount time:          Wed Jun 13 03:47:45 2012
Last write time:          Wed Jun 13 03:47:45 2012
Mount count:              5
Maximum mount count:      28
Last checked:             Sat Jun  9 09:37:27 2012
Check interval:           15552000 (6 months)
Next check after:         Thu Dec  6 08:37:27 2012
Lifetime writes:          87 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      706df78a-3ca8-46e8-a831-5f8ba43d2609
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0011fbd9
Journal start:            1

Thanks

Jeremy
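For context, the nightly cycle Jeremy describes (fsck, then mount, rsync the
data over, then umount) might look roughly like the sketch below, with an
extra check that gives up early if the kernel still shows a journal attached
to the device, as the lsof output above suggests.  The mount point, the
rsync source and options, and the use of /proc/fs/jbd2 are assumptions, not
details from the report.

#!/bin/sh
# Sketch of the nightly backup cycle described above; paths are hypothetical.
DEV=/dev/sdb1
MNT=/mnt/root_backup

# If a jbd2 journal thread is still attached to the device, e2fsck will
# refuse to open it ("Device or resource busy"), so bail out rather than
# retrying blindly.  Assumes this kernel exposes /proc/fs/jbd2.
if ls /proc/fs/jbd2 2>/dev/null | grep -q "^$(basename "$DEV")-"; then
    echo "$DEV still has an active journal thread, skipping backup" >&2
    exit 1
fi

fsck -p "$DEV" || exit 1
mount "$DEV" "$MNT"
rsync -ax --delete / "$MNT"/
umount "$MNT"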