From feldmann_markus at gmx.de Fri Sep 16 13:24:15 2011
From: feldmann_markus at gmx.de (Markus Feldmann)
Date: Fri, 16 Sep 2011 15:24:15 +0200
Subject: damaged partition
Message-ID: 

Hi All,

for weeks I have not been able to access my girlfriend's laptop. The laptop
is a dual-boot system (Windows and Debian Lenny). When I try to start the
Linux system it stops at a specific point where it cannot find some devices
under /dev/. I also tried to boot a live CD and mount the Linux system
manually, without success.

My partition table is:

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xb1c0b1c0

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        8032    64517008+   7  HPFS/NTFS
/dev/sda2            8415        9728    10554705    f  W95 Ext'd (LBA)
/dev/sda3            8033        8414     3068415    c  W95 FAT32 (LBA)
/dev/sda5   *        8415        8420       48163+  83  Linux
/dev/sda6            8421        8547     1020096   82  Linux swap / Solaris
/dev/sda7            8548        9728     9486351   83  Linux

Partition table entries are not in disk order

Here are some further tests:

fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
/dev/sda7: clean, 232529/1186688 files, 2143683/2371587 blocks

mount: Stale NFS file handle

fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
fsck.ext2: Bad magic number in super-block while trying to open /dev/sda7

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193

I tried all backup superblocks without success.

/dev/sda7: Linux rev 1.0 ext2 filesystem data, UUID=96380d35-74fd-4c37-abc9-9dfc3d1bd43e
/dev/sda7: UUID="96380d35-74fd-4c37-abc9-9dfc3d1bd43e" TYPE="ext2"

e2fsck 1.41.12 (17-May-2010)
/dev/sda7: clean, 232529/1186688 files, 2143683/2371587 blocks

mount: Stale NFS file handle

I only need the data under .thunderbird/, but do I have to mount the ext
partition first, or not? Is there another way to get at my data?

Regards,
Markus

From bothie at gmx.de Fri Sep 16 15:41:36 2011
From: bothie at gmx.de (Bodo Thiesen)
Date: Fri, 16 Sep 2011 17:41:36 +0200
Subject: damaged partition
In-Reply-To: 
References: 
Message-ID: <20110916174136.6df111d2@gmx.de>

* Markus Feldmann wrote:
>
> fsck from util-linux-ng 2.17.2
> e2fsck 1.41.12 (17-May-2010)
> /dev/sda7: clean, 232529/1186688 files, 2143683/2371587 blocks

Hello Markus,

# e2fsck $dev

first checks whether the file system state is "clean" and, if it is, whether
the maximal mount count or the maximal mount time has been reached. It only
tests the file system if its state is not "clean" (on an ext2 with a journal
this is usually never the case, because after the journal replay the file
system will be clean) or if either of those two maximums has been reached.

So, to test a file system which is marked clean, you have to force it:

# e2fsck -f $dev

BTW: and if you want a progress bar, add a -C 0:

# e2fsck -f -C 0 $dev

That's the command I usually run.

>
> mount: Stale NFS file handle

What does "dmesg | tail -n 20" say immediately after running that command?
What is the result of "grep sda7 /proc/mounts"?
What is the result of "grep sda7 /etc/mtab"?
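
For reference, the checks suggested above collected into one sequence. This
is only a sketch: /dev/sda7 is substituted for $dev from the original report,
the /mnt mount point is an assumption, and it presumes you are root on a
rescue/live system with the partition not yet mounted.

# e2fsck -f -C 0 /dev/sda7            # force a full check, with a progress bar
# mount -o ro /dev/sda7 /mnt          # then retry the mount, read-only
# dmesg | tail -n 20                  # kernel messages right after the failed mount
# grep sda7 /proc/mounts /etc/mtab    # is the partition already (partly) mounted?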
Kind regards,
Bodo

From feldmann_markus at gmx.de Fri Sep 16 20:33:29 2011
From: feldmann_markus at gmx.de (Markus Feldmann)
Date: Fri, 16 Sep 2011 22:33:29 +0200
Subject: damaged partition
In-Reply-To: <20110916174136.6df111d2@gmx.de>
References: <20110916174136.6df111d2@gmx.de>
Message-ID: 

On 16.09.2011 17:41, Bodo Thiesen wrote:
> So, to test a file system which is marked clean, you have to force it:
>
> # e2fsck -f $dev
>

Hi Bodo,

here is the result of e2fsck:

e2fsck 1.41.12 (17-May-2010)
Resize_inode not enabled, but the resize inode is non-zero. Clear?
cancelled!

Should I go further? This will override some bits, and I am frightened. Does
this mean somebody tried to resize this partition without success? Maybe me?

Regards,
Markus

From samuel at bcgreen.com Fri Sep 16 21:22:07 2011
From: samuel at bcgreen.com (Stephen Samuel)
Date: Fri, 16 Sep 2011 14:22:07 -0700
Subject: damaged partition
In-Reply-To: 
References: <20110916174136.6df111d2@gmx.de>
Message-ID: 

You're working with a damaged partition, so probably the first thing to do
would be to make a copy of the partition. Get either a 16GB thumb drive or an
external drive that you can partition appropriately, then make a copy of the
damaged partition -- this may be a trial-and-error situation.

Once you have a good copy, then you can work on the copy.

If you get a laptop drive, then you can make multiple copies of the bad
partition(s) overnight and then try different recovery paths until you get
what you need.

If you only care about parts of the data on the drive, you can also try
mounting read-only -- and see if the data you want is available for copying
out without having to repair the entire partition:

mount -o ro /dev/sda7 /mnt/sda7

On Fri, Sep 16, 2011 at 1:33 PM, Markus Feldmann wrote:

> On 16.09.2011 17:41, Bodo Thiesen wrote:
>> So, to test a file system which is marked clean, you have to force it:
>>
>> # e2fsck -f $dev
>
> Hi Bodo,
>
> here is the result of e2fsck:
>
> e2fsck 1.41.12 (17-May-2010)
> Resize_inode not enabled, but the resize inode is non-zero. Clear?
> cancelled!
>
> Should I go further? This will override some bits, and I am frightened.
> Does this mean somebody tried to resize this partition without success?
> Maybe me?
>
> Regards, Markus
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>

-- 
Stephen Samuel  http://www.bcgreen.com   Software, like love,
778-861-7641                             grows when you give it away

From feldmann_markus at gmx.de Sat Sep 17 22:21:55 2011
From: feldmann_markus at gmx.de (Markus Feldmann)
Date: Sun, 18 Sep 2011 00:21:55 +0200
Subject: damaged partition
In-Reply-To: 
References: <20110916174136.6df111d2@gmx.de>
Message-ID: 

On 16.09.2011 23:22, Stephen Samuel wrote:
> You're working with a damaged partition, so probably the first thing to do
> would be to make a copy of the partition. Get either a 16GB thumb drive or
> an external drive that you can partition appropriately, then make a copy
> of the damaged partition -- this may be a trial-and-error situation.
>
> Once you have a good copy, then you can work on the copy.
>
> If you get a laptop drive, then you can make multiple copies of the bad
> partition(s) overnight and then try different recovery paths until you
> get what you need.
>
> If you only care about parts of the data on the drive, you can also try
> mounting read-only -- and see if the data you want is available for
> copying out without having to repair the entire partition:
>
> mount -o ro /dev/sda7 /mnt/sda7

Hi,

I bought an external 1GB hard disk and saved this damaged partition. After
that I tried the steps shown here: http://pastebin.com/f1TRMNS2

The output is in German, sorry. However, I did not know when I should press
"yes", so I only pressed "y". Then I started it and it found "." and "..",
and I copied "." to some place on my external disk. I could recover my
.thunderbird :-)

Regards,
Markus

From basketboy at bk.ru Fri Sep 23 05:51:19 2011
From: basketboy at bk.ru (Andrey)
Date: Fri, 23 Sep 2011 09:51:19 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
Message-ID: <4E7C1E57.9040904@bk.ru>

Hello,

I have a production mail server with a maildir++ structure and about 250GB
(~10 million files) on an ext3 partition on RAID5. It's mounted with the
noatime option. This mail server is responsible for local delivery and for
storing mail messages.

The system has Debian Squeeze installed, with Exim as MDA and Dovecot as
IMAP+POP3 server.

The bonnie results are terrible: the sequential-output latencies for Block
and Rewrite are 10722ms and 9232ms. So if there are 1000 messages in the mail
queue, the load is extremely high, delivery time is very long and the server
can hang. I did not see such problems with UFS on a FreeBSD server.

As I understand it, the ext3 file system is really bad for such
configurations with Maildir++ (many small files)? Is there a way to decrease
disk latency on ext3 or speed it up?

With regards, Andrey

From basketboy at bk.ru Fri Sep 23 08:14:57 2011
From: basketboy at bk.ru (Andrey)
Date: Fri, 23 Sep 2011 12:14:57 +0400
Subject: Fwd: Re: ext3 with maildir++ = huge disk latency and high load
Message-ID: <4E7C4001.2070003@bk.ru>

On 23.09.2011 11:31, Janne Pikkarainen wrote:

Thank you for the reply.

BTW, another web server has almost the same bonnie results (10283ms and
5884ms) on an ext3 partition with 45GB of data (1.5 million files)?!

The hardware and the RAID5 (also hardware) are the same: HP Proliant DL380 G4
with a SmartArray 6i controller (as I see it, it comes with the 128MB BBWC
enabler but not the kit).

I have not tried to mount the fs with barriers disabled. Does it carry any
critical risks?

The bonnie tests were performed in the morning, when we have a minimal user
load.

With regards, Andrey.

> Hello,
>
> On 09/23/2011 08:51 AM, Andrey wrote:
>> Hello,
>>
>> I have a production mail server with maildir++ structure and about
>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>> mounted with noatime option. These mail server is responsible to local
>> delivery and storing mail messages.
>>
>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>> IMAP+POP3 server.
>>
>> Bonnie results are terrible. Sequential output for Block and Rewrite
>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>> queue load is extremely high, delivery time is very big and server can
>> hang. I did not see such problems with UFS on FreeBSD server.
>>
>> As I understand ext3 file system is really bad for such configurations
>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>> latency on ext3 or speed up it?
>>
>> With regards, Andrey
>>
>> ___
>
> (replying off-list, so the ext3 developers will not start a flamewar)
>
> In my opinion ext3 is a terrible file system for your kind of workload,
> especially if you have lots of concurrent clients accessing their
> mailboxes.
Even though ext3 has evolved over the years and has gained > features such as directory indexes, it still is not good for tens of > million of frequently changing small files with lots of concurrency. > Been there, done that, not gonna do it again. I administer servers with > 50 000 - 100 000 user accounts, with couple of thousands active IMAP > connections. > > Personally I switched from ext3 to ReiserFS many years ago and happily > used it between 2004-2008, then after things went downhill from ReiserFS > development point of view, I switched to XFS during a server hardware > refresh. ReiserFS was excellent, but it really started to slow down if > file system was more than 85% full and it also got fragmented over time. > > XFS has been rock-solid and fast since 2008 for me, but it has an > achilles heel of its own: if I need to remove lots of files from a huge > directory tree, the delete performance is quite sucky compared to other > file systems. This has been improved in the later kernel versions with > the new delaylog parameter, but how much, I've not yet tested. > > All this said, the performance of ext3 should not be THAT bad you are > describing. Is the bonnie result done while the server is idle or while > it has mail clients accessing it all the time? If you have hardware > RAID, is there a battery-backed up write cache and are you sure it's > enabled? Also, have you tried to mount your file system with barriers > disabled? What kind of server setup you have? > > Something is very wrong. > > Best regards, > > Janne Pikkarainen > > From basketboy at bk.ru Fri Sep 23 09:52:52 2011 From: basketboy at bk.ru (Andrey) Date: Fri, 23 Sep 2011 13:52:52 +0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7C35B8.70100@mikrobitti.fi> References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi> Message-ID: <4E7C56F4.6080900@bk.ru> Thank you for reply, BTW, other webserver has almost the same bonnie results (10283ms and 5884ms) on ext3 partition with 45GB of data (1.5 millions of files)?! Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4 with SmartArray 6i controller (as I see it comes with 128MB BBWC enabler but not kit). I did not tried to mount fs with barriers disabled. Does it have any crititcal risks? Bonnie tests was performed in the morning when we have a mininmal user load. But why the same server with the same RAID(4 disks) but with FreeBSD+UFS was much better? I guess problem is in ext3 then? With regards, Andrey. 23.09.2011 11:31, Janne Pikkarainen ?????: > Hello, > > On 09/23/2011 08:51 AM, Andrey wrote: >> Hello, >> >> I have a production mail server with maildir++ structure and about >> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's >> mounted with noatime option. These mail server is responsible to local >> delivery and storing mail messages. >> >> System has Debian Squeeze installed and Exim as MDA + Dovecot as >> IMAP+POP3 server. >> >> Bonnie results are terrible. Sequential output for Block and Rewrite >> are 10722ms and 9232ms. So if there is a 1000 messages in the mail >> queue load is extremely high, delivery time is very big and server can >> hang. I did not see such problems with UFS on FreeBSD server. >> >> As I understand ext3 file system is really bad for such configurations >> with Maildir++ (many smaill files)? Is there a way to decrease disk >> latency on ext3 or speed up it? 
>> >> With regards, Andrey >> >> ___ > > (replying off-list, so the ext3 developers will not start a flamewar) > > In my opinion ext3 is a terrible file system for your kind of workload, > especially if you have lots of concurrent clients accessing their > mailboxes. Even though ext3 has evolved over the years and has gained > features such as directory indexes, it still is not good for tens of > million of frequently changing small files with lots of concurrency. > Been there, done that, not gonna do it again. I administer servers with > 50 000 - 100 000 user accounts, with couple of thousands active IMAP > connections. > > Personally I switched from ext3 to ReiserFS many years ago and happily > used it between 2004-2008, then after things went downhill from ReiserFS > development point of view, I switched to XFS during a server hardware > refresh. ReiserFS was excellent, but it really started to slow down if > file system was more than 85% full and it also got fragmented over time. > > XFS has been rock-solid and fast since 2008 for me, but it has an > achilles heel of its own: if I need to remove lots of files from a huge > directory tree, the delete performance is quite sucky compared to other > file systems. This has been improved in the later kernel versions with > the new delaylog parameter, but how much, I've not yet tested. > > All this said, the performance of ext3 should not be THAT bad you are > describing. Is the bonnie result done while the server is idle or while > it has mail clients accessing it all the time? If you have hardware > RAID, is there a battery-backed up write cache and are you sure it's > enabled? Also, have you tried to mount your file system with barriers > disabled? What kind of server setup you have? > > Something is very wrong. > > Best regards, > > Janne Pikkarainen > > From sandeen at redhat.com Fri Sep 23 14:43:53 2011 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 23 Sep 2011 09:43:53 -0500 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7C56F4.6080900@bk.ru> References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi> <4E7C56F4.6080900@bk.ru> Message-ID: <4E7C9B29.3080508@redhat.com> On 9/23/11 4:52 AM, Andrey wrote: > Thank you for reply, > > BTW, other webserver has almost the same bonnie results (10283ms and > 5884ms) on ext3 partition with 45GB of data (1.5 millions of > files)?! > > Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4 > with SmartArray 6i controller (as I see it comes with 128MB BBWC > enabler but not kit). > > I did not tried to mount fs with barriers disabled. Does it have any > crititcal risks? Yes. If you have write caches on either the raid controller or on the disks behind it which can be lost on a power outage, running without barriers will potentially corrupt your filesystem if you lose power, even though you have ext3's journaling. Journaling depends on write guarantees which are lost if drive write caches evaporate. -Eric > Bonnie tests was performed in the morning when we have a mininmal user load. > > But why the same server with the same RAID(4 disks) but with FreeBSD+UFS was much better? I guess problem is in ext3 then? > > With regards, Andrey. > > 23.09.2011 11:31, Janne Pikkarainen ?????: >> Hello, >> >> On 09/23/2011 08:51 AM, Andrey wrote: >>> Hello, >>> >>> I have a production mail server with maildir++ structure and about >>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's >>> mounted with noatime option. 
These mail server is responsible to local >>> delivery and storing mail messages. >>> >>> System has Debian Squeeze installed and Exim as MDA + Dovecot as >>> IMAP+POP3 server. >>> >>> Bonnie results are terrible. Sequential output for Block and Rewrite >>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail >>> queue load is extremely high, delivery time is very big and server can >>> hang. I did not see such problems with UFS on FreeBSD server. >>> >>> As I understand ext3 file system is really bad for such configurations >>> with Maildir++ (many smaill files)? Is there a way to decrease disk >>> latency on ext3 or speed up it? >>> >>> With regards, Andrey >>> >>> ___ >> >> (replying off-list, so the ext3 developers will not start a flamewar) >> >> In my opinion ext3 is a terrible file system for your kind of workload, >> especially if you have lots of concurrent clients accessing their >> mailboxes. Even though ext3 has evolved over the years and has gained >> features such as directory indexes, it still is not good for tens of >> million of frequently changing small files with lots of concurrency. >> Been there, done that, not gonna do it again. I administer servers with >> 50 000 - 100 000 user accounts, with couple of thousands active IMAP >> connections. >> >> Personally I switched from ext3 to ReiserFS many years ago and happily >> used it between 2004-2008, then after things went downhill from ReiserFS >> development point of view, I switched to XFS during a server hardware >> refresh. ReiserFS was excellent, but it really started to slow down if >> file system was more than 85% full and it also got fragmented over time. >> >> XFS has been rock-solid and fast since 2008 for me, but it has an >> achilles heel of its own: if I need to remove lots of files from a huge >> directory tree, the delete performance is quite sucky compared to other >> file systems. This has been improved in the later kernel versions with >> the new delaylog parameter, but how much, I've not yet tested. >> >> All this said, the performance of ext3 should not be THAT bad you are >> describing. Is the bonnie result done while the server is idle or while >> it has mail clients accessing it all the time? If you have hardware >> RAID, is there a battery-backed up write cache and are you sure it's >> enabled? Also, have you tried to mount your file system with barriers >> disabled? What kind of server setup you have? >> >> Something is very wrong. >> >> Best regards, >> >> Janne Pikkarainen >> >> > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From sandeen at redhat.com Fri Sep 23 14:48:39 2011 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 23 Sep 2011 09:48:39 -0500 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7C9B29.3080508@redhat.com> References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi> <4E7C56F4.6080900@bk.ru> <4E7C9B29.3080508@redhat.com> Message-ID: <4E7C9C47.7070907@redhat.com> On 9/23/11 9:43 AM, Eric Sandeen wrote: > On 9/23/11 4:52 AM, Andrey wrote: >> Thank you for reply, >> >> BTW, other webserver has almost the same bonnie results (10283ms and >> 5884ms) on ext3 partition with 45GB of data (1.5 millions of >> files)?! >> >> Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4 >> with SmartArray 6i controller (as I see it comes with 128MB BBWC >> enabler but not kit). 
>> >> I did not tried to mount fs with barriers disabled. Does it have any >> crititcal risks? > > Yes. If you have write caches on either the raid controller or on > the disks behind it which can be lost on a power outage, running > without barriers will potentially corrupt your filesystem if you lose > power, even though you have ext3's journaling. > > Journaling depends on write guarantees which are lost if drive > write caches evaporate. ... evaporate unexpectedly that is. barriers manage that cache. If write caches are battery-backed (or off), then nobarrier is safe. -Eric > -Eric From kwijibo at zianet.com Fri Sep 23 17:19:30 2011 From: kwijibo at zianet.com (Bob) Date: Fri, 23 Sep 2011 11:19:30 -0600 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7C1E57.9040904@bk.ru> References: <4E7C1E57.9040904@bk.ru> Message-ID: <4E7CBFA2.8000300@zianet.com> On 09/22/2011 11:51 PM, Andrey wrote: > Hello, > > I have a production mail server with maildir++ structure and about > 250GB (~10 millions) of files on the ext3 partition on RAID5. It's > mounted with noatime option. These mail server is responsible to local > delivery and storing mail messages. > > System has Debian Squeeze installed and Exim as MDA + Dovecot as > IMAP+POP3 server. > > Bonnie results are terrible. Sequential output for Block and Rewrite > are 10722ms and 9232ms. So if there is a 1000 messages in the mail > queue load is extremely high, delivery time is very big and server can > hang. I did not see such problems with UFS on FreeBSD server. > > As I understand ext3 file system is really bad for such configurations > with Maildir++ (many smaill files)? Is there a way to decrease disk > latency on ext3 or speed up it? > My guess is that your problem is many files in one directory not necessarily having many files on the whole file system. In my experience large directories eat ext3's lunch. The introduction of indexing did help but it still fell behind on performance when compared to some other file systems. You may want to make sure your file system has indexing turned on but with the vintage of your Debian I would assume it is on by default. I ran into this problem many years ago (before indexing was an ext3 option). It was even worse as the Maildir storage was being accessed over NFS. Ended up eventually biting the bullet and moving to WAFL (NetApp). My guess is that users trying to access these large directories via IMAP and POP are also facing large delays and possibly even time outs. Steven From basketboy at bk.ru Sat Sep 24 17:46:49 2011 From: basketboy at bk.ru (Andrey) Date: Sat, 24 Sep 2011 21:46:49 +0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7CBFA2.8000300@zianet.com> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> Message-ID: <4E7E1789.8090801@bk.ru> Sure, indexing is on by default on Debian ext3. I think I'll try to test some cases an run bonnie++ on freesh HP server with the same configuration. Also I have maildir with more than 10000 messages an don't have timesouts and access problesm via IMAP to it, that's strange. Sometimes I notice that copying message to Sent folder can wait a little but it's a seldom issue but can corellate with it, I agree. Also I see in Exim logs that DT (delivery time) is equal to more than 2 seconds although user's maildir is almost empty, so I intend to that it is a primary problem of whole ext3 system or RAID5 hardware. 
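
For anyone following along, one way to double-check the dir_index point from
Bob's mail below -- a sketch only; the device name is a placeholder, and the
file system must be unmounted for the last step:

# tune2fs -l /dev/sdXN | grep -i 'features'   # dir_index should be listed
# tune2fs -O dir_index /dev/sdXN              # enable it if it is missing
# e2fsck -fD /dev/sdXN                        # -D rebuilds/optimizes directories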
23.09.2011 21:19, Bob ?????: > On 09/22/2011 11:51 PM, Andrey wrote: >> Hello, >> >> I have a production mail server with maildir++ structure and about >> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's >> mounted with noatime option. These mail server is responsible to local >> delivery and storing mail messages. >> >> System has Debian Squeeze installed and Exim as MDA + Dovecot as >> IMAP+POP3 server. >> >> Bonnie results are terrible. Sequential output for Block and Rewrite >> are 10722ms and 9232ms. So if there is a 1000 messages in the mail >> queue load is extremely high, delivery time is very big and server can >> hang. I did not see such problems with UFS on FreeBSD server. >> >> As I understand ext3 file system is really bad for such configurations >> with Maildir++ (many smaill files)? Is there a way to decrease disk >> latency on ext3 or speed up it? >> > > My guess is that your problem is many files in one directory not > necessarily > having many files on the whole file system. In my experience large > directories > eat ext3's lunch. The introduction of indexing did help but it still > fell behind > on performance when compared to some other file systems. You may want > to make sure your file system has indexing turned on but with the > vintage of > your Debian I would assume it is on by default. > > I ran into this problem many years ago (before indexing was an ext3 > option). It > was even worse as the Maildir storage was being accessed over NFS. Ended > up eventually biting the bullet and moving to WAFL (NetApp). > > My guess is that users trying to access these large directories via IMAP > and POP > are also facing large delays and possibly even time outs. > > Steven > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > > From tytso at mit.edu Sat Sep 24 19:04:47 2011 From: tytso at mit.edu (Ted Ts'o) Date: Sat, 24 Sep 2011 15:04:47 -0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7E1789.8090801@bk.ru> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> <4E7E1789.8090801@bk.ru> Message-ID: <20110924190447.GH2779@thunk.org> On Sat, Sep 24, 2011 at 09:46:49PM +0400, Andrey wrote: > Sure, indexing is on by default on Debian ext3. I think I'll try to > test some cases an run bonnie++ on freesh HP server with the same > configuration. For really gargantuan directories, indexing definitely hurts when you do a readdir+stat (i.e. /bin/ls -sF) or readdir+unlink (i.e., rm -rf)/ > Also I have maildir with more than 10000 messages an don't have > timesouts and access problesm via IMAP to it, that's strange. That's probably because this problem can be worked around by doing a readdir, then sorting by the inode number (d_ino), and the doing the stat or unlink. Some programs, especially those that expressly deal with Maildir directories, have this optimization already there. I also have a LD_PRELOAD hack that can be used to demonstrate why putting this is a good idea. You can google for spd_readdir and find it. I'll also put the latest version of it in the contrib directory in e2fsprogs for the next release. 
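
As a rough shell illustration of the readdir-then-sort-by-inode idea: list
entries in raw readdir order together with their inode numbers, sort
numerically, then stat them in that order. The maildir path is made up, it
assumes file names without whitespace, and real programs (like the
spd_readdir preload below) do this at the readdir level instead:

# cd /var/spool/mail/example/Maildir/cur      # hypothetical large maildir
# ls -Ufi | sort -n | awk '{print $2}' | xargs -r stat > /dev/null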
- Ted From tytso at mit.edu Sun Sep 25 01:41:34 2011 From: tytso at mit.edu (Ted Ts'o) Date: Sat, 24 Sep 2011 21:41:34 -0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <20110924190447.GH2779@thunk.org> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> <4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org> Message-ID: <20110925014134.GC2606@thunk.org> On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote: > I also have a LD_PRELOAD hack that can be used to demonstrate why > putting this is a good idea. You can google for spd_readdir and find > it. I'll also put the latest version of it in the contrib directory > in e2fsprogs for the next release. While I was looking at spd_readdir.c before including it in e2fsprogs's contrib directory, I realized the last version I released was pretty incomplete, and didn't work with modern-day coreutils. So I'll be including this version into the e2fsprogs git tree, but since in the past I've distributing by sending it to folks via e-mail, here's an updated version of spd_readdir.c. Please try this to any older versions that you might find in mailing list archives. Note that this preload is not going to work for all programs. In particular, although it does supply readdir_r(), it is *not* thread safe. So I can't recommend this as something to be dropped in /etc/ld.so.preload. - Ted /* * readdir accelerator * * (C) Copyright 2003, 2004 by Theodore Ts'o. * * Compile using the command: * * gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl * * Use it by setting the LD_PRELOAD environment variable: * * export LD_PRELOAD=/usr/local/sbin/spd_readdir.so * * %Begin-Header% * This file may be redistributed under the terms of the GNU Public * License, version 2. * %End-Header% * */ #define ALLOC_STEPSIZE 100 #define MAX_DIRSIZE 0 #define DEBUG #ifdef DEBUG #define DEBUG_DIR(x) {if (do_debug) { x; }} #else #define DEBUG_DIR(x) #endif #define _GNU_SOURCE #define __USE_LARGEFILE64 #include #include #include #include #include #include #include #include #include struct dirent_s { unsigned long long d_ino; long long d_off; unsigned short int d_reclen; unsigned char d_type; char *d_name; }; struct dir_s { DIR *dir; int num; int max; struct dirent_s *dp; int pos; int direct; struct dirent ret_dir; struct dirent64 ret_dir64; }; static int (*real_closedir)(DIR *dir) = 0; static DIR *(*real_opendir)(const char *name) = 0; static DIR *(*real_fdopendir)(int fd) = 0; static void *(*real_rewinddir)(DIR *dirp) = 0; static struct dirent *(*real_readdir)(DIR *dir) = 0; static int (*real_readdir_r)(DIR *dir, struct dirent *entry, struct dirent **result) = 0; static struct dirent64 *(*real_readdir64)(DIR *dir) = 0; static off_t (*real_telldir)(DIR *dir) = 0; static void (*real_seekdir)(DIR *dir, off_t offset) = 0; static int (*real_dirfd)(DIR *dir) = 0; static unsigned long max_dirsize = MAX_DIRSIZE; static int num_open = 0; #ifdef DEBUG static int do_debug = 0; #endif static void setup_ptr() { char *cp; real_opendir = dlsym(RTLD_NEXT, "opendir"); real_fdopendir = dlsym(RTLD_NEXT, "fdopendir"); real_closedir = dlsym(RTLD_NEXT, "closedir"); real_rewinddir = dlsym(RTLD_NEXT, "rewinddir"); real_readdir = dlsym(RTLD_NEXT, "readdir"); real_readdir_r = dlsym(RTLD_NEXT, "readdir_r"); real_readdir64 = dlsym(RTLD_NEXT, "readdir64"); real_telldir = dlsym(RTLD_NEXT, "telldir"); real_seekdir = dlsym(RTLD_NEXT, "seekdir"); real_dirfd = dlsym(RTLD_NEXT, "dirfd"); if ((cp = getenv("SPD_READDIR_MAX_SIZE")) != NULL) { max_dirsize = atol(cp); 
} #ifdef DEBUG if (getenv("SPD_READDIR_DEBUG")) { printf("initialized!\n"); do_debug++; } #endif } static void free_cached_dir(struct dir_s *dirstruct) { int i; if (!dirstruct->dp) return; for (i=0; i < dirstruct->num; i++) { free(dirstruct->dp[i].d_name); } free(dirstruct->dp); dirstruct->dp = 0; dirstruct->max = dirstruct->num = 0; } static int ino_cmp(const void *a, const void *b) { const struct dirent_s *ds_a = (const struct dirent_s *) a; const struct dirent_s *ds_b = (const struct dirent_s *) b; ino_t i_a, i_b; i_a = ds_a->d_ino; i_b = ds_b->d_ino; if (ds_a->d_name[0] == '.') { if (ds_a->d_name[1] == 0) i_a = 0; else if ((ds_a->d_name[1] == '.') && (ds_a->d_name[2] == 0)) i_a = 1; } if (ds_b->d_name[0] == '.') { if (ds_b->d_name[1] == 0) i_b = 0; else if ((ds_b->d_name[1] == '.') && (ds_b->d_name[2] == 0)) i_b = 1; } return (i_a - i_b); } struct dir_s *alloc_dirstruct(DIR *dir) { struct dir_s *dirstruct; dirstruct = malloc(sizeof(struct dir_s)); if (dirstruct) memset(dirstruct, 0, sizeof(struct dir_s)); dirstruct->dir = dir; return dirstruct; } void cache_dirstruct(struct dir_s *dirstruct) { struct dirent_s *ds, *dnew; struct dirent64 *d; while ((d = (*real_readdir64)(dirstruct->dir)) != NULL) { if (dirstruct->num >= dirstruct->max) { dirstruct->max += ALLOC_STEPSIZE; DEBUG_DIR(printf("Reallocating to size %d\n", dirstruct->max)); dnew = realloc(dirstruct->dp, dirstruct->max * sizeof(struct dir_s)); if (!dnew) goto nomem; dirstruct->dp = dnew; } ds = &dirstruct->dp[dirstruct->num++]; ds->d_ino = d->d_ino; ds->d_off = d->d_off; ds->d_reclen = d->d_reclen; ds->d_type = d->d_type; if ((ds->d_name = malloc(strlen(d->d_name)+1)) == NULL) { dirstruct->num--; goto nomem; } strcpy(ds->d_name, d->d_name); DEBUG_DIR(printf("readdir: %lu %s\n", (unsigned long) d->d_ino, d->d_name)); } qsort(dirstruct->dp, dirstruct->num, sizeof(struct dirent_s), ino_cmp); return; nomem: DEBUG_DIR(printf("No memory, backing off to direct readdir\n")); free_cached_dir(dirstruct); dirstruct->direct = 1; } DIR *opendir(const char *name) { DIR *dir; struct dir_s *dirstruct; struct stat st; if (!real_opendir) setup_ptr(); DEBUG_DIR(printf("Opendir(%s) (%d open)\n", name, num_open++)); dir = (*real_opendir)(name); if (!dir) return NULL; dirstruct = alloc_dirstruct(dir); if (!dirstruct) { (*real_closedir)(dir); errno = -ENOMEM; return NULL; } if (max_dirsize && (stat(name, &st) == 0) && (st.st_size > max_dirsize)) { DEBUG_DIR(printf("Directory size %ld, using direct readdir\n", st.st_size)); dirstruct->direct = 1; return (DIR *) dirstruct; } cache_dirstruct(dirstruct); return ((DIR *) dirstruct); } DIR *fdopendir(int fd) { DIR *dir; struct dir_s *dirstruct; struct stat st; if (!real_fdopendir) setup_ptr(); DEBUG_DIR(printf("fdpendir(%d) (%d open)\n", fd, num_open++)); dir = (*real_fdopendir)(fd); if (!dir) return NULL; dirstruct = alloc_dirstruct(dir); if (!dirstruct) { (*real_closedir)(dir); errno = -ENOMEM; return NULL; } if (max_dirsize && (fstat(fd, &st) == 0) && (st.st_size > max_dirsize)) { DEBUG_DIR(printf("Directory size %ld, using direct readdir\n", st.st_size)); dirstruct->dir = dir; dirstruct->direct = 1; return (DIR *) dirstruct; } cache_dirstruct(dirstruct); return ((DIR *) dirstruct); } int closedir(DIR *dir) { struct dir_s *dirstruct = (struct dir_s *) dir; DEBUG_DIR(printf("Closedir (%d open)\n", --num_open)); if (dirstruct->dir) (*real_closedir)(dirstruct->dir); free_cached_dir(dirstruct); free(dirstruct); return 0; } struct dirent *readdir(DIR *dir) { struct dir_s *dirstruct = (struct dir_s *) dir; 
struct dirent_s *ds; if (dirstruct->direct) return (*real_readdir)(dirstruct->dir); if (dirstruct->pos >= dirstruct->num) return NULL; ds = &dirstruct->dp[dirstruct->pos++]; dirstruct->ret_dir.d_ino = ds->d_ino; dirstruct->ret_dir.d_off = ds->d_off; dirstruct->ret_dir.d_reclen = ds->d_reclen; dirstruct->ret_dir.d_type = ds->d_type; strncpy(dirstruct->ret_dir.d_name, ds->d_name, sizeof(dirstruct->ret_dir.d_name)); return (&dirstruct->ret_dir); } int readdir_r(DIR *dir, struct dirent *entry, struct dirent **result) { struct dir_s *dirstruct = (struct dir_s *) dir; struct dirent_s *ds; if (dirstruct->direct) return (*real_readdir_r)(dirstruct->dir, entry, result); if (dirstruct->pos >= dirstruct->num) { *result = NULL; return 0; } ds = &dirstruct->dp[dirstruct->pos++]; entry->d_ino = ds->d_ino; entry->d_off = ds->d_off; entry->d_reclen = ds->d_reclen; entry->d_type = ds->d_type; strncpy(entry->d_name, ds->d_name, sizeof(entry->d_name)); *result = entry; return 0; } struct dirent64 *readdir64(DIR *dir) { struct dir_s *dirstruct = (struct dir_s *) dir; struct dirent_s *ds; if (dirstruct->direct) return (*real_readdir64)(dirstruct->dir); if (dirstruct->pos >= dirstruct->num) return NULL; ds = &dirstruct->dp[dirstruct->pos++]; dirstruct->ret_dir64.d_ino = ds->d_ino; dirstruct->ret_dir64.d_off = ds->d_off; dirstruct->ret_dir64.d_reclen = ds->d_reclen; dirstruct->ret_dir64.d_type = ds->d_type; strncpy(dirstruct->ret_dir64.d_name, ds->d_name, sizeof(dirstruct->ret_dir64.d_name)); return (&dirstruct->ret_dir64); } off_t telldir(DIR *dir) { struct dir_s *dirstruct = (struct dir_s *) dir; if (dirstruct->direct) return (*real_telldir)(dirstruct->dir); return ((off_t) dirstruct->pos); } void seekdir(DIR *dir, off_t offset) { struct dir_s *dirstruct = (struct dir_s *) dir; if (dirstruct->direct) { (*real_seekdir)(dirstruct->dir, offset); return; } dirstruct->pos = offset; } void rewinddir(DIR *dir) { struct dir_s *dirstruct = (struct dir_s *) dir; (*real_rewinddir)(dirstruct->dir); if (dirstruct->direct) return; dirstruct->pos = 0; free_cached_dir(dirstruct); cache_dirstruct(dirstruct); } int dirfd(DIR *dir) { struct dir_s *dirstruct = (struct dir_s *) dir; int fd = (*real_dirfd)(dirstruct->dir); DEBUG_DIR(printf("dirfd %d, %p\n", fd, real_dirfd)); return fd; } From adilger at dilger.ca Sun Sep 25 06:16:12 2011 From: adilger at dilger.ca (Andreas Dilger) Date: Sun, 25 Sep 2011 00:16:12 -0600 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <20110925014134.GC2606@thunk.org> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> <4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org> <20110925014134.GC2606@thunk.org> Message-ID: <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca> On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote: > I also have a LD_PRELOAD hack that can be used to demonstrate why > putting this is a good idea. You can google for spd_readdir and find > it. I'll also put the latest version of it in the contrib directory > in e2fsprogs for the next release. What we've started doing in Lustre (which has to deal with network latency, but the same problem in terms of htree vs. inode ordering) is to detect if the application is doing readdir+stat on the dirents in readdir order, and then fork a thread to statahead the entries in the kernel. 
It would be possible to do something like this in the ext4 readdir code to do dirent readahead, sort, and then prefetch the inodes in order (partially or completely, depending on the directory size), but as yet we aren't working on anything at the ext4 level. There was a patch to do something similar to this for btrfs as well, with the DCACHE_NEED_LOOKUP flag. That avoids a lot of the complexity between instantiating dcache entries from readdir without yet having read the inode from disk. The other proposal I've made in the past is to try and allocate inodes from the inode table in roughly hash order, so that when it comes time to do readdir+stat that the dirents and inodes are already partially in the same order. That breaks down in case of renames, but works well for normal usage. Cheers, Andreas From tytso at mit.edu Sun Sep 25 21:21:27 2011 From: tytso at mit.edu (Ted Ts'o) Date: Sun, 25 Sep 2011 17:21:27 -0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> <4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org> <20110925014134.GC2606@thunk.org> <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca> Message-ID: <20110925212127.GD27089@thunk.org> On Sun, Sep 25, 2011 at 12:16:12AM -0600, Andreas Dilger wrote: > > It would be possible to do something like this in the ext4 readdir > code to do dirent readahead, sort, and then prefetch the inodes > in order (partially or completely, depending on the directory size), > but as yet we aren't working on anything at the ext4 level. What we have in ext4 right now is if we need to do disk i/o to read from the inode table, we will read in adjacent blocks from the inode table, on the theory that the effort needed to read in 32k versus 4k is pretty much the same. So if the inodes were allocated all at the same time, they will be sequentially ordered, and so the inode table readahead should help quite a lot. I'll note that with really large maildirs, especially on a mail server with many other maildirs, over time the inodes for each individual file will get scattered all over the place, and so pretty much any scheme that uses a inode table separate from the blocks where the directory entries are stored is going to get hammered by this use case. Ultimately, the best way to solve this problem is a more intelligent application that caches the contents of the key headers in a database, so you don't need to scan the contents of the entire Maildir when doing common IMAP operations. - Ted From basketboy at bk.ru Thu Sep 29 07:29:42 2011 From: basketboy at bk.ru (Andrey) Date: Thu, 29 Sep 2011 11:29:42 +0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <20110925212127.GD27089@thunk.org> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> <4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org> <20110925014134.GC2606@thunk.org> <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca> <20110925212127.GD27089@thunk.org> Message-ID: <4E841E66.1090600@bk.ru> Ok. 
Here are bonnie results on fresh installed Debian with 200GB FREE ext3 /home partitition (4 disks in RAID5 on HP Proliant DL380 G4 server): Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP debian 2G 243 97 22555 10 8794 2 1810 97 120444 11 317.0 5 Latency 135ms 967ms 723ms 26526us 13143us 586ms Latency is also very bad according results. What is the reason? Hardware or ext3 itseld? Will try with xfs an ext4 and compare then. 26.09.2011 01:21, Ted Ts'o ?????: > On Sun, Sep 25, 2011 at 12:16:12AM -0600, Andreas Dilger wrote: >> >> It would be possible to do something like this in the ext4 readdir >> code to do dirent readahead, sort, and then prefetch the inodes >> in order (partially or completely, depending on the directory size), >> but as yet we aren't working on anything at the ext4 level. > > What we have in ext4 right now is if we need to do disk i/o to read > from the inode table, we will read in adjacent blocks from the inode > table, on the theory that the effort needed to read in 32k versus 4k > is pretty much the same. So if the inodes were allocated all at the > same time, they will be sequentially ordered, and so the inode table > readahead should help quite a lot. > > I'll note that with really large maildirs, especially on a mail server > with many other maildirs, over time the inodes for each individual > file will get scattered all over the place, and so pretty much any > scheme that uses a inode table separate from the blocks where the > directory entries are stored is going to get hammered by this use > case. > > Ultimately, the best way to solve this problem is a more intelligent > application that caches the contents of the key headers in a database, > so you don't need to scan the contents of the entire Maildir when > doing common IMAP operations. > > - Ted > > From tytso at mit.edu Thu Sep 29 13:08:32 2011 From: tytso at mit.edu (Ted Ts'o) Date: Thu, 29 Sep 2011 09:08:32 -0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E841E66.1090600@bk.ru> References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com> <4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org> <20110925014134.GC2606@thunk.org> <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca> <20110925212127.GD27089@thunk.org> <4E841E66.1090600@bk.ru> Message-ID: <20110929130832.GP19250@thunk.org> On Thu, Sep 29, 2011 at 11:29:42AM +0400, Andrey wrote: > Ok. Here are bonnie results on fresh installed Debian with 200GB > FREE ext3 /home partitition (4 disks in RAID5 on HP Proliant DL380 > G4 server): > > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- > --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec > %CP /sec %CP > debian 2G 243 97 22555 10 8794 2 1810 97 120444 > 11 317.0 5 > Latency 135ms 967ms 723ms 26526us 13143us > 586ms > > Latency is also very bad according results. What is the reason? > Hardware or ext3 itseld? Will try with xfs an ext4 and compare then. Ext4 will do better than ext3; but I can predict to you up front that xfs will do better than ext4, because it is better optimized for RAID arrays at the moment. Ext4 has superblock fields to store RAID 5 parameters, but the code to fully take advantage of those RAID parameters is not fully implemented. 
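
For what it's worth, those RAID parameters can be handed to ext4 at mkfs
time. A sketch, assuming a 4-disk RAID5 with a 64 KiB chunk size and 4 KiB
blocks (the chunk size and device name are only examples -- check the real
values on the Smart Array controller): stride = chunk / block = 16, and
stripe-width = stride * data disks (RAID5: N-1 = 3) = 48.

# mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/cciss/c0d0p2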
You should also take a look at your RAID parameters. If your RAID stripe size is too large, it will impact workloads such as mail servers which typically involve small writes. Have you considered using RAID 10 over RAID 5? It's not as efficient from a space perspective but if you are primarily concerned about throughput and latency, it's the way to go. - Ted From basketboy at bk.ru Thu Sep 29 14:36:06 2011 From: basketboy at bk.ru (Andrey) Date: Thu, 29 Sep 2011 18:36:06 +0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E7C35B8.70100@mikrobitti.fi> References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi> Message-ID: <4E848256.4090106@bk.ru> Let me to share some testing RAID 5 results with bonnie++: ext3 (defaults,noatime): Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP debian 2G 242 96 22458 10 8826 2 1854 98 120985 11 317.1 3 Latency 211ms 896ms 720ms 22258us 18733us 622ms Version 1.96 ------Sequential Create------ --------Random Create-------- debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 12857 33 +++++ +++ 15377 34 13585 33 +++++ +++ 15404 35 Latency 12284us 992us 1029us 432us 140us 76us ext3 (-T small,defaults,noatime): Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP debian 2G 229 98 4989 5 3862 1 1762 97 91111 9 266.6 6 Latency 79046us 22858ms 2577ms 19253us 12120us 767ms Version 1.96 ------Sequential Create------ --------Random Create-------- debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 6422 16 +++++ +++ 10319 25 8934 21 +++++ +++ 10347 26 Latency 9968us 977us 964us 482us 144us 178us ext3 (-T news,defaults,noatime): Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP debian 2G 237 95 22807 11 8807 2 1897 99 121893 11 324.6 5 Latency 223ms 808ms 523ms 13765us 11049us 831ms Version 1.96 ------Sequential Create------ --------Random Create-------- debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 12826 33 +++++ +++ 15900 35 14548 36 +++++ +++ 15460 35 Latency 417us 984us 1024us 430us 140us 175us ext4 (defaults,noatime): Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP debian 2G 256 98 21495 6 9896 2 1771 99 125775 11 349.7 5 Latency 37738us 992ms 3490ms 10811us 12045us 495ms Version 1.96 ------Sequential Create------ --------Random Create-------- debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 14766 43 +++++ +++ 18026 46 16094 46 +++++ +++ 17428 45 Latency 424us 982us 1023us 367us 139us 174us xfs(defaults,noatime,logbufs=8,logbsize=131072): Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- 
-Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP debian 2G 476 96 35129 9 12524 3 1417 99 124716 12 445.9 9 Latency 19798us 420ms 721ms 14122us 9394us 131ms Version 1.96 ------Sequential Create------ --------Random Create-------- debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 1552 8 +++++ +++ 1705 11 1675 9 +++++ +++ 1346 8 Latency 104ms 291us 48604us 109ms 45us 227ms It seems that latency is big in whole results, best is for XFS. It is tempting me to think that there are some RAID 5 issues here. It's really strange that block writing for SCSI server disks in RAID5 is no more than 30MB/sec(XFS). I guess I should consider XFS file system or different RAID configuration. May be someone can comment this strange benchmark result? Will very appreciate that. With regards, Andrey. 23.09.2011 11:31, Janne Pikkarainen ?????: > Hello, > > On 09/23/2011 08:51 AM, Andrey wrote: >> Hello, >> >> I have a production mail server with maildir++ structure and about >> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's >> mounted with noatime option. These mail server is responsible to local >> delivery and storing mail messages. >> >> System has Debian Squeeze installed and Exim as MDA + Dovecot as >> IMAP+POP3 server. >> >> Bonnie results are terrible. Sequential output for Block and Rewrite >> are 10722ms and 9232ms. So if there is a 1000 messages in the mail >> queue load is extremely high, delivery time is very big and server can >> hang. I did not see such problems with UFS on FreeBSD server. >> >> As I understand ext3 file system is really bad for such configurations >> with Maildir++ (many smaill files)? Is there a way to decrease disk >> latency on ext3 or speed up it? >> >> With regards, Andrey >> >> ___ > From sandeen at redhat.com Thu Sep 29 14:44:45 2011 From: sandeen at redhat.com (Eric Sandeen) Date: Thu, 29 Sep 2011 09:44:45 -0500 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E848256.4090106@bk.ru> References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi> <4E848256.4090106@bk.ru> Message-ID: <4E84845D.1040100@redhat.com> On 9/29/11 9:36 AM, Andrey wrote: > Let me to share some testing RAID 5 results with bonnie++: What kernel version was this tested on? 
Thanks, -Eric > ext3 (defaults,noatime): > > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > debian 2G 242 96 22458 10 8826 2 1854 98 120985 11 317.1 3 > Latency 211ms 896ms 720ms 22258us 18733us 622ms > Version 1.96 ------Sequential Create------ --------Random Create-------- > debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 12857 33 +++++ +++ 15377 34 13585 33 +++++ +++ 15404 35 > Latency 12284us 992us 1029us 432us 140us 76us > > ext3 (-T small,defaults,noatime): > > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > debian 2G 229 98 4989 5 3862 1 1762 97 91111 9 266.6 6 > Latency 79046us 22858ms 2577ms 19253us 12120us 767ms > Version 1.96 ------Sequential Create------ --------Random Create-------- > debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 6422 16 +++++ +++ 10319 25 8934 21 +++++ +++ 10347 26 > Latency 9968us 977us 964us 482us 144us 178us > > ext3 (-T news,defaults,noatime): > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > debian 2G 237 95 22807 11 8807 2 1897 99 121893 11 324.6 5 > Latency 223ms 808ms 523ms 13765us 11049us 831ms > Version 1.96 ------Sequential Create------ --------Random Create-------- > debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 12826 33 +++++ +++ 15900 35 14548 36 +++++ +++ 15460 35 > Latency 417us 984us 1024us 430us 140us 175us > > ext4 (defaults,noatime): > > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > debian 2G 256 98 21495 6 9896 2 1771 99 125775 11 349.7 5 > Latency 37738us 992ms 3490ms 10811us 12045us 495ms > Version 1.96 ------Sequential Create------ --------Random Create-------- > debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 14766 43 +++++ +++ 18026 46 16094 46 +++++ +++ 17428 45 > Latency 424us 982us 1023us 367us 139us 174us > > xfs(defaults,noatime,logbufs=8,logbsize=131072): > > Version 1.96 ------Sequential Output------ --Sequential Input- --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > debian 2G 476 96 35129 9 12524 3 1417 99 124716 12 445.9 9 > Latency 19798us 420ms 721ms 14122us 9394us 131ms > Version 1.96 ------Sequential Create------ --------Random Create-------- > debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 1552 8 +++++ +++ 1705 11 1675 9 +++++ +++ 1346 8 > Latency 104ms 291us 48604us 109ms 45us 227ms > > It seems that latency is big in whole results, best is for XFS. 
It is tempting me to think that there are some RAID 5 issues here. It's really strange that block writing for SCSI server disks in RAID5 is no more than 30MB/sec(XFS). I guess I should consider XFS file system or different RAID configuration. May be someone can comment this strange benchmark result? Will very appreciate that. > > With regards, Andrey. > > 23.09.2011 11:31, Janne Pikkarainen ?????: >> Hello, >> >> On 09/23/2011 08:51 AM, Andrey wrote: >>> Hello, >>> >>> I have a production mail server with maildir++ structure and about >>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's >>> mounted with noatime option. These mail server is responsible to local >>> delivery and storing mail messages. >>> >>> System has Debian Squeeze installed and Exim as MDA + Dovecot as >>> IMAP+POP3 server. >>> >>> Bonnie results are terrible. Sequential output for Block and Rewrite >>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail >>> queue load is extremely high, delivery time is very big and server can >>> hang. I did not see such problems with UFS on FreeBSD server. >>> >>> As I understand ext3 file system is really bad for such configurations >>> with Maildir++ (many smaill files)? Is there a way to decrease disk >>> latency on ext3 or speed up it? >>> >>> With regards, Andrey >>> >>> ___ >> > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From basketboy at bk.ru Thu Sep 29 15:13:48 2011 From: basketboy at bk.ru (Andrey) Date: Thu, 29 Sep 2011 19:13:48 +0400 Subject: ext3 with maildir++ = huge disk latency and high load In-Reply-To: <4E84845D.1040100@redhat.com> References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi> <4E848256.4090106@bk.ru> <4E84845D.1040100@redhat.com> Message-ID: <4E848B2C.6090209@bk.ru> Hello, This is standard Debian Squeeze(6.0.2) kernel: # uname -a Linux debian 2.6.32-5-686 #1 SMP Fri Sep 9 20:51:05 UTC 2011 i686 GNU/Linux 29.09.2011 18:44, Eric Sandeen ?????: > On 9/29/11 9:36 AM, Andrey wrote: >> Let me to share some testing RAID 5 results with bonnie++: > > What kernel version was this tested on? 
> > Thanks, > -Eric > >> ext3 (defaults,noatime): >> >> Version 1.96 ------Sequential Output------ --Sequential Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP >> debian 2G 242 96 22458 10 8826 2 1854 98 120985 11 317.1 3 >> Latency 211ms 896ms 720ms 22258us 18733us 622ms >> Version 1.96 ------Sequential Create------ --------Random Create-------- >> debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> 16 12857 33 +++++ +++ 15377 34 13585 33 +++++ +++ 15404 35 >> Latency 12284us 992us 1029us 432us 140us 76us >> >> ext3 (-T small,defaults,noatime): >> >> Version 1.96 ------Sequential Output------ --Sequential Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP >> debian 2G 229 98 4989 5 3862 1 1762 97 91111 9 266.6 6 >> Latency 79046us 22858ms 2577ms 19253us 12120us 767ms >> Version 1.96 ------Sequential Create------ --------Random Create-------- >> debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> 16 6422 16 +++++ +++ 10319 25 8934 21 +++++ +++ 10347 26 >> Latency 9968us 977us 964us 482us 144us 178us >> >> ext3 (-T news,defaults,noatime): >> Version 1.96 ------Sequential Output------ --Sequential Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP >> debian 2G 237 95 22807 11 8807 2 1897 99 121893 11 324.6 5 >> Latency 223ms 808ms 523ms 13765us 11049us 831ms >> Version 1.96 ------Sequential Create------ --------Random Create-------- >> debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> 16 12826 33 +++++ +++ 15900 35 14548 36 +++++ +++ 15460 35 >> Latency 417us 984us 1024us 430us 140us 175us >> >> ext4 (defaults,noatime): >> >> Version 1.96 ------Sequential Output------ --Sequential Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP >> debian 2G 256 98 21495 6 9896 2 1771 99 125775 11 349.7 5 >> Latency 37738us 992ms 3490ms 10811us 12045us 495ms >> Version 1.96 ------Sequential Create------ --------Random Create-------- >> debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> 16 14766 43 +++++ +++ 18026 46 16094 46 +++++ +++ 17428 45 >> Latency 424us 982us 1023us 367us 139us 174us >> >> xfs(defaults,noatime,logbufs=8,logbsize=131072): >> >> Version 1.96 ------Sequential Output------ --Sequential Input- --Random- >> Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- >> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP >> debian 2G 476 96 35129 9 12524 3 1417 99 124716 12 445.9 9 >> Latency 19798us 420ms 721ms 14122us 9394us 131ms >> Version 1.96 ------Sequential Create------ --------Random Create-------- >> debian -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- >> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP >> 16 1552 8 +++++ +++ 1705 11 1675 9 +++++ +++ 1346 8 >> Latency 104ms 291us 48604us 109ms 45us 227ms >> >> It seems that latency is 
big in whole results, best is for XFS. It is tempting me to think that there are some RAID 5 issues here. It's really strange that block writing for SCSI server disks in RAID5 is no more than 30MB/sec(XFS). I guess I should consider XFS file system or different RAID configuration. May be someone can comment this strange benchmark result? Will very appreciate that. >> >> With regards, Andrey. >> >> 23.09.2011 11:31, Janne Pikkarainen ?????: >>> Hello, >>> >>> On 09/23/2011 08:51 AM, Andrey wrote: >>>> Hello, >>>> >>>> I have a production mail server with maildir++ structure and about >>>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's >>>> mounted with noatime option. These mail server is responsible to local >>>> delivery and storing mail messages. >>>> >>>> System has Debian Squeeze installed and Exim as MDA + Dovecot as >>>> IMAP+POP3 server. >>>> >>>> Bonnie results are terrible. Sequential output for Block and Rewrite >>>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail >>>> queue load is extremely high, delivery time is very big and server can >>>> hang. I did not see such problems with UFS on FreeBSD server. >>>> >>>> As I understand ext3 file system is really bad for such configurations >>>> with Maildir++ (many smaill files)? Is there a way to decrease disk >>>> latency on ext3 or speed up it? >>>> >>>> With regards, Andrey >>>> >>>> ___ >>> >> >> _______________________________________________ >> Ext3-users mailing list >> Ext3-users at redhat.com >> https://www.redhat.com/mailman/listinfo/ext3-users > >