From feldmann_markus at gmx.de  Fri Sep 16 13:24:15 2011
From: feldmann_markus at gmx.de (Markus Feldmann)
Date: Fri, 16 Sep 2011 15:24:15 +0200
Subject: damaged partition
Message-ID: <j4viqf$lc2$1@dough.gmane.org>

Hi All,

since weeks i cant access the laptop from my girl-friend. This laptop
has a dual boot system (Windows and Debian Lenny). When i try to start
the Linux System he stops at a specific point, where he cant find some
devices under /dev/. Further i tried to boot a live-CD and mount the
Linux System manualy without success.

My partition-table is:
     Disk /dev/sda: 80.0 GB, 80026361856 bytes
     255 heads, 63 sectors/track, 9729 cylinders
     Units = cylinders of 16065 * 512 = 8225280 bytes
     Sector size (logical/physical): 512 bytes / 512 bytes
     I/O size (minimum/optimal): 512 bytes / 512 bytes
     Disk identifier: 0xb1c0b1c0

        Device Boot      Start         End      Blocks   Id  System
     /dev/sda1   *           1        8032    64517008+   7  HPFS/NTFS
     /dev/sda2            8415        9728    10554705    f  W95 Ext'd (LBA)
     /dev/sda3            8033        8414     3068415    c  W95 FAT32 (LBA)
     /dev/sda5   *        8415        8420       48163+  83  Linux
     /dev/sda6            8421        8547     1020096   82  Linux swap
/ Solaris
     /dev/sda7            8548        9728     9486351   83  Linux

     Partition table entries are not in disk order

Here are some further tests:
<fsck /dev/sda7>
     fsck from util-linux-ng 2.17.2
     e2fsck 1.41.12 (17-May-2010)
     /dev/sda7: clean, 232529/1186688 files, 2143683/2371587 blocks

<mount /dev/sda7 /mnt/sda7>
     mount: Stale NFS file handle

<fsck -b 32768 /dev/sda7>
     fsck from util-linux-ng 2.17.2
     e2fsck 1.41.12 (17-May-2010)
     fsck.ext2: Bad magic number in super-block while trying to open
/dev/sda7

     The superblock could not be read or does not describe a correct ext2
     filesystem.  If the device is valid and it really contains an ext2
     filesystem (and not swap or ufs or something else), then the superblock
     is corrupt, and you might try running e2fsck with an alternate
superblock:
         e2fsck -b 8193<device>

I tried all backup-superblocks without success.

<file -s /dev/sda7>
/dev/sda7: Linux rev 1.0 ext2 filesystem data,
UUID=96380d35-74fd-4c37-abc9-9dfc3d1bd43e

<blkid | grep sda7>
/dev/sda7: UUID="96380d35-74fd-4c37-abc9-9dfc3d1bd43e" TYPE="ext2"

<fsck.ext2 /dev/sda7>
e2fsck 1.41.12 (17-May-2010)
/dev/sda7: clean, 232529/1186688 files, 2143683/2371587 blocks

<mount -t ext2 -o ro /dev/sda7 /mnt/sda7>
mount: Stale NFS file handle

I only need the data under .thunderbird/, but i need to mount the ext
partition first? or not? Is there another way to get my data?

regards Markus




From bothie at gmx.de  Fri Sep 16 15:41:36 2011
From: bothie at gmx.de (Bodo Thiesen)
Date: Fri, 16 Sep 2011 17:41:36 +0200
Subject: damaged partition
In-Reply-To: <j4viqf$lc2$1@dough.gmane.org>
References: <j4viqf$lc2$1@dough.gmane.org>
Message-ID: <20110916174136.6df111d2@gmx.de>

* Markus Feldmann <feldmann_markus at gmx.de> hat geschrieben:

> <fsck /dev/sda7>
>      fsck from util-linux-ng 2.17.2
>      e2fsck 1.41.12 (17-May-2010)
>      /dev/sda7: clean, 232529/1186688 files, 2143683/2371587 blocks

Hallo Markus

# e2fsck $dev

First tests, wether the file system state is ?clean? and if, wether the
maximal mount count or maximal mount time has reched. It tests the file
system only, if it's state is not ?clean? (on an ext2 with journal this
is usually never the case, because after journal replay the file system
will be clean) or any of the bothe maximal values has reached.

So, to test a file system which is marked clean, you have to force it:

# e2fsck -f $dev

BTW: And if you want some progress bar, add a -C 0:

# e2fsck -f -C 0 $dev

That's the command I usually run.

> <mount -t ext2 -o ro /dev/sda7 /mnt/sda7>
> mount: Stale NFS file handle

What's ?dmesg | tail -n 20? saying immediatelly after running that command?

What's the result of ?grep sda7 /proc/mounts??
What's the result of ?grep sda7 /etc/mtab??

Liebe Gr??e

Bodo



From feldmann_markus at gmx.de  Fri Sep 16 20:33:29 2011
From: feldmann_markus at gmx.de (Markus Feldmann)
Date: Fri, 16 Sep 2011 22:33:29 +0200
Subject: damaged partition
In-Reply-To: <20110916174136.6df111d2@gmx.de>
References: <j4viqf$lc2$1@dough.gmane.org> <20110916174136.6df111d2@gmx.de>
Message-ID: <j50bqb$8t8$1@dough.gmane.org>

Am 16.09.2011 17:41, schrieb Bodo Thiesen:
> So, to test a file system which is marked clean, you have to force it:
>
> # e2fsck -f $dev
>
Hi Bodo,

here the Result of
<e2fsck -f /dev/sda7>
     e2fsck 1.41.12 (17-May-2010)
     Resize_inode not enabled, but the resize inode is non-zero. 
Clear<y>? cancelled!


Should i go further? This will be overide some Bits. I am frightened. 
Does this mean somebody tried to resize this partion without success? 
Maybe I?

regards Markus



From samuel at bcgreen.com  Fri Sep 16 21:22:07 2011
From: samuel at bcgreen.com (Stephen Samuel)
Date: Fri, 16 Sep 2011 14:22:07 -0700
Subject: damaged partition
In-Reply-To: <j50bqb$8t8$1@dough.gmane.org>
References: <j4viqf$lc2$1@dough.gmane.org> <20110916174136.6df111d2@gmx.de>
	<j50bqb$8t8$1@dough.gmane.org>
Message-ID: <CALp1NBgmp-eKkTQLd4mv_JTsr_oeHJXeYJFnRO0FL2hExOoFfg@mail.gmail.com>

You're working with a damaged partition.. Probably the first thing to do
would be to make a copy of the partition.
Get either a 16GB thumb drive, or an external drive that you can partition
appropriately.. then make a copy of the damaged partition --  This may be a
trial-and-error situation.

Once you have a good copy, then you can work on the copy.

If you get a laptop drive, then you can make multiple copies of the bad
partition(s) overnight and then try different recovery paths until you get
what you need.

If you only care about parts of the data, on the drive you can also try
mounting readonly -- and see if the data you want is available for copying
out without having to repair the entire partition.

mount -o ro /dev/sda7 /mnt/sda7

On Fri, Sep 16, 2011 at 1:33 PM, Markus Feldmann <feldmann_markus at gmx.de>wrote:

> Am 16.09.2011 17:41, schrieb Bodo Thiesen:
>
>  So, to test a file system which is marked clean, you have to force it:
>>
>> # e2fsck -f $dev
>>
>>  Hi Bodo,
>
> here the Result of
> <e2fsck -f /dev/sda7>
>
>    e2fsck 1.41.12 (17-May-2010)
>    Resize_inode not enabled, but the resize inode is non-zero. Clear<y>?
> cancelled!
>
>
> Should i go further? This will be overide some Bits. I am frightened. Does
> this mean somebody tried to resize this partion without success? Maybe I?
>
> regards Markus
>
>
> ______________________________**_________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/**mailman/listinfo/ext3-users<https://www.redhat.com/mailman/listinfo/ext3-users>
>



-- 
Stephen Samuel http://www.bcgreen.com  Software, like love,
778-861-7641                              grows when you give it away
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/ext3-users/attachments/20110916/ab6e2b8a/attachment.htm>

From feldmann_markus at gmx.de  Sat Sep 17 22:21:55 2011
From: feldmann_markus at gmx.de (Markus Feldmann)
Date: Sun, 18 Sep 2011 00:21:55 +0200
Subject: damaged partition
In-Reply-To: <CALp1NBgmp-eKkTQLd4mv_JTsr_oeHJXeYJFnRO0FL2hExOoFfg@mail.gmail.com>
References: <j4viqf$lc2$1@dough.gmane.org>
	<20110916174136.6df111d2@gmx.de>	<j50bqb$8t8$1@dough.gmane.org>
	<CALp1NBgmp-eKkTQLd4mv_JTsr_oeHJXeYJFnRO0FL2hExOoFfg@mail.gmail.com>
Message-ID: <j536hm$h2m$1@dough.gmane.org>

Am 16.09.2011 23:22, schrieb Stephen Samuel:
> You're working with a damaged partition.. Probably the first thing to do
> would be to make a copy of the partition.
> Get either a 16GB thumb drive, or an external drive that you can
> partition appropriately.. then make a copy of the damaged partition --
> This may be a trial-and-error situation.
>
> Once you have a good copy, then you can work on the copy.
>
> If you get a laptop drive, then you can make multiple copies of the bad
> partition(s) overnight and then try different recovery paths until you
> get what you need.
>
> If you only care about parts of the data, on the drive you can also try
> mounting readonly -- and see if the data you want is available for
> copying out without having to repair the entire partition.
>
> mount -o ro /dev/sda7 /mnt/sda7
Hi,

i bought an external 1GB Harddisk and saved this damaged partition. 
Furhter on i tried:
<e2fsck -f /mnt/AIRY-1TB/sda7.part>
http://pastebin.com/f1TRMNS2

The output is in german, sorry. However i dont know when i should to 
press "yes", so i only pressed "y".

Then i started <testdisk /mnt/AIRY-1TB/sda7.part> and he found ".", ".." 
and i copied "." to some place on my external disk. I could recover my 
.thunderbird :-)

regards Markus



From basketboy at bk.ru  Fri Sep 23 05:51:19 2011
From: basketboy at bk.ru (Andrey)
Date: Fri, 23 Sep 2011 09:51:19 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
Message-ID: <4E7C1E57.9040904@bk.ru>

Hello,

I have a production mail server with maildir++ structure and about 250GB 
(~10 millions) of files on the ext3 partition on RAID5. It's mounted 
with noatime option. These mail server is responsible to local delivery 
and storing mail messages.

System has Debian Squeeze installed and Exim as MDA + Dovecot as 
IMAP+POP3 server.

Bonnie results are terrible. Sequential output for Block and Rewrite are 
10722ms and 9232ms. So if there is a 1000 messages in the mail queue 
load is extremely high, delivery time is very big and server can hang. I 
did not see such problems with UFS on FreeBSD server.

As I understand ext3 file system is really bad for such configurations 
with Maildir++ (many smaill files)? Is there a way to decrease disk 
latency on ext3 or speed up it?

With regards, Andrey



From basketboy at bk.ru  Fri Sep 23 08:14:57 2011
From: basketboy at bk.ru (Andrey)
Date: Fri, 23 Sep 2011 12:14:57 +0400
Subject: Fwd: Re: ext3 with maildir++ = huge disk latency and high load
Message-ID: <4E7C4001.2070003@bk.ru>

23.09.2011 11:31, Janne Pikkarainen ?????:
Thank you for reply,

BTW, other webserver has almost the same bonnie results (10283ms and
5884ms) on ext3 partition with 45GB of data (1.5 millions of files)?!

Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4
with SmartArray 6i controller (as I see it comes with 128MB BBWC enabler
but not kit).

I did not tried to mount fs with barriers disabled. Does it have any
crititcal risks?

Bonnie tests was performed in the morning when we have a mininmal user load.

With regards, Andrey.

> Hello,
>
> On 09/23/2011 08:51 AM, Andrey wrote:
>> Hello,
>>
>> I have a production mail server with maildir++ structure and about
>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>> mounted with noatime option. These mail server is responsible to local
>> delivery and storing mail messages.
>>
>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>> IMAP+POP3 server.
>>
>> Bonnie results are terrible. Sequential output for Block and Rewrite
>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>> queue load is extremely high, delivery time is very big and server can
>> hang. I did not see such problems with UFS on FreeBSD server.
>>
>> As I understand ext3 file system is really bad for such configurations
>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>> latency on ext3 or speed up it?
>>
>> With regards, Andrey
>>
>> ___
>
> (replying off-list, so the ext3 developers will not start a flamewar)
>
> In my opinion ext3 is a terrible file system for your kind of workload,
> especially if you have lots of concurrent clients accessing their
> mailboxes. Even though ext3 has evolved over the years and has gained
> features such as directory indexes, it still is not good for tens of
> million of frequently changing small files with lots of concurrency.
> Been there, done that, not gonna do it again. I administer servers with
> 50 000 - 100 000 user accounts, with couple of thousands active IMAP
> connections.
>
> Personally I switched from ext3 to ReiserFS many years ago and happily
> used it between 2004-2008, then after things went downhill from ReiserFS
> development point of view, I switched to XFS during a server hardware
> refresh. ReiserFS was excellent, but it really started to slow down if
> file system was more than 85% full and it also got fragmented over time.
>
> XFS has been rock-solid and fast since 2008 for me, but it has an
> achilles heel of its own: if I need to remove lots of files from a huge
> directory tree, the delete performance is quite sucky compared to other
> file systems. This has been improved in the later kernel versions with
> the new delaylog parameter, but how much, I've not yet tested.
>
> All this said, the performance of ext3 should not be THAT bad you are
> describing. Is the bonnie result done while the server is idle or while
> it has mail clients accessing it all the time? If you have hardware
> RAID, is there a battery-backed up write cache and are you sure it's
> enabled? Also, have you tried to mount your file system with barriers
> disabled? What kind of server setup you have?
>
> Something is very wrong.
>
> Best regards,
>
> Janne Pikkarainen
>
>



From basketboy at bk.ru  Fri Sep 23 09:52:52 2011
From: basketboy at bk.ru (Andrey)
Date: Fri, 23 Sep 2011 13:52:52 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7C35B8.70100@mikrobitti.fi>
References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi>
Message-ID: <4E7C56F4.6080900@bk.ru>

Thank you for reply,

BTW, other webserver has almost the same bonnie results (10283ms and 
5884ms) on ext3 partition with 45GB of data (1.5 millions of files)?!

Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4 
with SmartArray 6i controller (as I see it comes with 128MB BBWC enabler 
but not kit).

I did not tried to mount fs with barriers disabled. Does it have any 
crititcal risks?

Bonnie tests was performed in the morning when we have a mininmal user load.

But why the same server with the same RAID(4 disks) but with FreeBSD+UFS 
was much better? I guess problem is in ext3 then?

With regards, Andrey.

23.09.2011 11:31, Janne Pikkarainen ?????:
> Hello,
>
> On 09/23/2011 08:51 AM, Andrey wrote:
>> Hello,
>>
>> I have a production mail server with maildir++ structure and about
>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>> mounted with noatime option. These mail server is responsible to local
>> delivery and storing mail messages.
>>
>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>> IMAP+POP3 server.
>>
>> Bonnie results are terrible. Sequential output for Block and Rewrite
>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>> queue load is extremely high, delivery time is very big and server can
>> hang. I did not see such problems with UFS on FreeBSD server.
>>
>> As I understand ext3 file system is really bad for such configurations
>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>> latency on ext3 or speed up it?
>>
>> With regards, Andrey
>>
>> ___
>
> (replying off-list, so the ext3 developers will not start a flamewar)
>
> In my opinion ext3 is a terrible file system for your kind of workload,
> especially if you have lots of concurrent clients accessing their
> mailboxes. Even though ext3 has evolved over the years and has gained
> features such as directory indexes, it still is not good for tens of
> million of frequently changing small files with lots of concurrency.
> Been there, done that, not gonna do it again. I administer servers with
> 50 000 - 100 000 user accounts, with couple of thousands active IMAP
> connections.
>
> Personally I switched from ext3 to ReiserFS many years ago and happily
> used it between 2004-2008, then after things went downhill from ReiserFS
> development point of view, I switched to XFS during a server hardware
> refresh. ReiserFS was excellent, but it really started to slow down if
> file system was more than 85% full and it also got fragmented over time.
>
> XFS has been rock-solid and fast since 2008 for me, but it has an
> achilles heel of its own: if I need to remove lots of files from a huge
> directory tree, the delete performance is quite sucky compared to other
> file systems. This has been improved in the later kernel versions with
> the new delaylog parameter, but how much, I've not yet tested.
>
> All this said, the performance of ext3 should not be THAT bad you are
> describing. Is the bonnie result done while the server is idle or while
> it has mail clients accessing it all the time? If you have hardware
> RAID, is there a battery-backed up write cache and are you sure it's
> enabled? Also, have you tried to mount your file system with barriers
> disabled? What kind of server setup you have?
>
> Something is very wrong.
>
> Best regards,
>
> Janne Pikkarainen
>
>



From sandeen at redhat.com  Fri Sep 23 14:43:53 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 23 Sep 2011 09:43:53 -0500
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7C56F4.6080900@bk.ru>
References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi>
	<4E7C56F4.6080900@bk.ru>
Message-ID: <4E7C9B29.3080508@redhat.com>

On 9/23/11 4:52 AM, Andrey wrote:
> Thank you for reply,
> 
> BTW, other webserver has almost the same bonnie results (10283ms and
> 5884ms) on ext3 partition with 45GB of data (1.5 millions of
> files)?!
> 
> Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4
> with SmartArray 6i controller (as I see it comes with 128MB BBWC
> enabler but not kit).
> 
> I did not tried to mount fs with barriers disabled. Does it have any
> crititcal risks?

Yes.  If you have write caches on either the raid controller or on
the disks behind it which can be lost on a power outage, running
without barriers will potentially corrupt your filesystem if you lose
power, even though you have ext3's journaling.

Journaling depends on write guarantees which are lost if drive
write caches evaporate.

-Eric

> Bonnie tests was performed in the morning when we have a mininmal user load.
> 
> But why the same server with the same RAID(4 disks) but with FreeBSD+UFS was much better? I guess problem is in ext3 then?
> 
> With regards, Andrey.
> 
> 23.09.2011 11:31, Janne Pikkarainen ?????:
>> Hello,
>>
>> On 09/23/2011 08:51 AM, Andrey wrote:
>>> Hello,
>>>
>>> I have a production mail server with maildir++ structure and about
>>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>>> mounted with noatime option. These mail server is responsible to local
>>> delivery and storing mail messages.
>>>
>>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>>> IMAP+POP3 server.
>>>
>>> Bonnie results are terrible. Sequential output for Block and Rewrite
>>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>>> queue load is extremely high, delivery time is very big and server can
>>> hang. I did not see such problems with UFS on FreeBSD server.
>>>
>>> As I understand ext3 file system is really bad for such configurations
>>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>>> latency on ext3 or speed up it?
>>>
>>> With regards, Andrey
>>>
>>> ___
>>
>> (replying off-list, so the ext3 developers will not start a flamewar)
>>
>> In my opinion ext3 is a terrible file system for your kind of workload,
>> especially if you have lots of concurrent clients accessing their
>> mailboxes. Even though ext3 has evolved over the years and has gained
>> features such as directory indexes, it still is not good for tens of
>> million of frequently changing small files with lots of concurrency.
>> Been there, done that, not gonna do it again. I administer servers with
>> 50 000 - 100 000 user accounts, with couple of thousands active IMAP
>> connections.
>>
>> Personally I switched from ext3 to ReiserFS many years ago and happily
>> used it between 2004-2008, then after things went downhill from ReiserFS
>> development point of view, I switched to XFS during a server hardware
>> refresh. ReiserFS was excellent, but it really started to slow down if
>> file system was more than 85% full and it also got fragmented over time.
>>
>> XFS has been rock-solid and fast since 2008 for me, but it has an
>> achilles heel of its own: if I need to remove lots of files from a huge
>> directory tree, the delete performance is quite sucky compared to other
>> file systems. This has been improved in the later kernel versions with
>> the new delaylog parameter, but how much, I've not yet tested.
>>
>> All this said, the performance of ext3 should not be THAT bad you are
>> describing. Is the bonnie result done while the server is idle or while
>> it has mail clients accessing it all the time? If you have hardware
>> RAID, is there a battery-backed up write cache and are you sure it's
>> enabled? Also, have you tried to mount your file system with barriers
>> disabled? What kind of server setup you have?
>>
>> Something is very wrong.
>>
>> Best regards,
>>
>> Janne Pikkarainen
>>
>>
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users



From sandeen at redhat.com  Fri Sep 23 14:48:39 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 23 Sep 2011 09:48:39 -0500
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7C9B29.3080508@redhat.com>
References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi>
	<4E7C56F4.6080900@bk.ru> <4E7C9B29.3080508@redhat.com>
Message-ID: <4E7C9C47.7070907@redhat.com>

On 9/23/11 9:43 AM, Eric Sandeen wrote:
> On 9/23/11 4:52 AM, Andrey wrote:
>> Thank you for reply,
>>
>> BTW, other webserver has almost the same bonnie results (10283ms and
>> 5884ms) on ext3 partition with 45GB of data (1.5 millions of
>> files)?!
>>
>> Hardware and RAID5(also hardware) are the same: HP Proliant DL380 G4
>> with SmartArray 6i controller (as I see it comes with 128MB BBWC
>> enabler but not kit).
>>
>> I did not tried to mount fs with barriers disabled. Does it have any
>> crititcal risks?
> 
> Yes.  If you have write caches on either the raid controller or on
> the disks behind it which can be lost on a power outage, running
> without barriers will potentially corrupt your filesystem if you lose
> power, even though you have ext3's journaling.
> 
> Journaling depends on write guarantees which are lost if drive
> write caches evaporate.

... evaporate unexpectedly that is.  barriers manage that cache.

If write caches are battery-backed (or off), then nobarrier is safe.

-Eric
 
> -Eric




From kwijibo at zianet.com  Fri Sep 23 17:19:30 2011
From: kwijibo at zianet.com (Bob)
Date: Fri, 23 Sep 2011 11:19:30 -0600
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7C1E57.9040904@bk.ru>
References: <4E7C1E57.9040904@bk.ru>
Message-ID: <4E7CBFA2.8000300@zianet.com>

On 09/22/2011 11:51 PM, Andrey wrote:
> Hello,
>
> I have a production mail server with maildir++ structure and about 
> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's 
> mounted with noatime option. These mail server is responsible to local 
> delivery and storing mail messages.
>
> System has Debian Squeeze installed and Exim as MDA + Dovecot as 
> IMAP+POP3 server.
>
> Bonnie results are terrible. Sequential output for Block and Rewrite 
> are 10722ms and 9232ms. So if there is a 1000 messages in the mail 
> queue load is extremely high, delivery time is very big and server can 
> hang. I did not see such problems with UFS on FreeBSD server.
>
> As I understand ext3 file system is really bad for such configurations 
> with Maildir++ (many smaill files)? Is there a way to decrease disk 
> latency on ext3 or speed up it?
>

My guess is that your problem is many files in one directory not necessarily
having many files on the whole file system.  In my experience large 
directories
eat ext3's lunch.  The introduction of indexing did help but it still 
fell behind
on performance when compared to some other file systems.  You may want
to make sure your file system has indexing turned on but with the vintage of
your Debian I would assume it is on by default.

I ran into this problem many years ago (before indexing was an ext3 
option).  It
was even worse as the Maildir storage was being accessed over NFS.  Ended
up eventually biting the bullet and moving to WAFL (NetApp).

My guess is that users trying to access these large directories via IMAP 
and POP
are also facing large delays and possibly even time outs.

Steven



From basketboy at bk.ru  Sat Sep 24 17:46:49 2011
From: basketboy at bk.ru (Andrey)
Date: Sat, 24 Sep 2011 21:46:49 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7CBFA2.8000300@zianet.com>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
Message-ID: <4E7E1789.8090801@bk.ru>

Sure, indexing is on by default on Debian ext3. I think I'll try to test 
some cases an run bonnie++ on freesh HP server with the same configuration.

Also I have maildir with more than 10000 messages an don't have 
timesouts and access problesm via IMAP to it, that's strange. Sometimes 
I notice that copying message to Sent folder can wait a little but it's 
a seldom issue but can corellate with it, I agree. Also I see in Exim 
logs that DT (delivery time) is equal to more than 2 seconds although 
user's maildir is almost empty, so I intend to that it is a primary 
problem of whole ext3 system or RAID5 hardware.

23.09.2011 21:19, Bob ?????:
> On 09/22/2011 11:51 PM, Andrey wrote:
>> Hello,
>>
>> I have a production mail server with maildir++ structure and about
>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>> mounted with noatime option. These mail server is responsible to local
>> delivery and storing mail messages.
>>
>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>> IMAP+POP3 server.
>>
>> Bonnie results are terrible. Sequential output for Block and Rewrite
>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>> queue load is extremely high, delivery time is very big and server can
>> hang. I did not see such problems with UFS on FreeBSD server.
>>
>> As I understand ext3 file system is really bad for such configurations
>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>> latency on ext3 or speed up it?
>>
>
> My guess is that your problem is many files in one directory not
> necessarily
> having many files on the whole file system. In my experience large
> directories
> eat ext3's lunch. The introduction of indexing did help but it still
> fell behind
> on performance when compared to some other file systems. You may want
> to make sure your file system has indexing turned on but with the
> vintage of
> your Debian I would assume it is on by default.
>
> I ran into this problem many years ago (before indexing was an ext3
> option). It
> was even worse as the Maildir storage was being accessed over NFS. Ended
> up eventually biting the bullet and moving to WAFL (NetApp).
>
> My guess is that users trying to access these large directories via IMAP
> and POP
> are also facing large delays and possibly even time outs.
>
> Steven
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>
>



From tytso at mit.edu  Sat Sep 24 19:04:47 2011
From: tytso at mit.edu (Ted Ts'o)
Date: Sat, 24 Sep 2011 15:04:47 -0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7E1789.8090801@bk.ru>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
	<4E7E1789.8090801@bk.ru>
Message-ID: <20110924190447.GH2779@thunk.org>

On Sat, Sep 24, 2011 at 09:46:49PM +0400, Andrey wrote:
> Sure, indexing is on by default on Debian ext3. I think I'll try to
> test some cases an run bonnie++ on freesh HP server with the same
> configuration.

For really gargantuan directories, indexing definitely hurts when you
do a readdir+stat (i.e. /bin/ls -sF) or readdir+unlink (i.e., rm -rf)/

> Also I have maildir with more than 10000 messages an don't have
> timesouts and access problesm via IMAP to it, that's strange.

That's probably because this problem can be worked around by doing a
readdir, then sorting by the inode number (d_ino), and the doing the
stat or unlink.  Some programs, especially those that expressly deal
with Maildir directories, have this optimization already there.

I also have a LD_PRELOAD hack that can be used to demonstrate why
putting this is a good idea.  You can google for spd_readdir and find
it.  I'll also put the latest version of it in the contrib directory
in e2fsprogs for the next release.

    	     		       	    		 	 - Ted



From tytso at mit.edu  Sun Sep 25 01:41:34 2011
From: tytso at mit.edu (Ted Ts'o)
Date: Sat, 24 Sep 2011 21:41:34 -0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <20110924190447.GH2779@thunk.org>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
	<4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org>
Message-ID: <20110925014134.GC2606@thunk.org>

On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have a LD_PRELOAD hack that can be used to demonstrate why
> putting this is a good idea.  You can google for spd_readdir and find
> it.  I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.

While I was looking at spd_readdir.c before including it in
e2fsprogs's contrib directory, I realized the last version I released
was pretty incomplete, and didn't work with modern-day coreutils.

So I'll be including this version into the e2fsprogs git tree, but
since in the past I've distributing by sending it to folks via e-mail,
here's an updated version of spd_readdir.c.  Please try this to any
older versions that you might find in mailing list archives.

Note that this preload is not going to work for all programs.  In
particular, although it does supply readdir_r(), it is *not* thread
safe.  So I can't recommend this as something to be dropped in
/etc/ld.so.preload.

	      	  	      	      	      - Ted

/*
 * readdir accelerator
 *
 * (C) Copyright 2003, 2004 by Theodore Ts'o.
 *
 * Compile using the command:
 *
 * gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
 *
 * Use it by setting the LD_PRELOAD environment variable:
 * 
 * export LD_PRELOAD=/usr/local/sbin/spd_readdir.so
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 * %End-Header%
 * 
 */

#define ALLOC_STEPSIZE	100
#define MAX_DIRSIZE	0

#define DEBUG

#ifdef DEBUG
#define DEBUG_DIR(x)	{if (do_debug) { x; }}
#else
#define DEBUG_DIR(x)
#endif

#define _GNU_SOURCE
#define __USE_LARGEFILE64

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <errno.h>
#include <dlfcn.h>

struct dirent_s {
	unsigned long long d_ino;
	long long d_off;
	unsigned short int d_reclen;
	unsigned char d_type;
	char *d_name;
};

struct dir_s {
	DIR	*dir;
	int	num;
	int	max;
	struct dirent_s *dp;
	int	pos;
	int	direct;
	struct dirent ret_dir;
	struct dirent64 ret_dir64;
};

static int (*real_closedir)(DIR *dir) = 0;
static DIR *(*real_opendir)(const char *name) = 0;
static DIR *(*real_fdopendir)(int fd) = 0;
static void *(*real_rewinddir)(DIR *dirp) = 0;
static struct dirent *(*real_readdir)(DIR *dir) = 0;
static int (*real_readdir_r)(DIR *dir, struct dirent *entry,
			     struct dirent **result) = 0;
static struct dirent64 *(*real_readdir64)(DIR *dir) = 0;
static off_t (*real_telldir)(DIR *dir) = 0;
static void (*real_seekdir)(DIR *dir, off_t offset) = 0;
static int (*real_dirfd)(DIR *dir) = 0;
static unsigned long max_dirsize = MAX_DIRSIZE;
static int num_open = 0;
#ifdef DEBUG
static int do_debug = 0;
#endif

static void setup_ptr()
{
	char *cp;

	real_opendir = dlsym(RTLD_NEXT, "opendir");
	real_fdopendir = dlsym(RTLD_NEXT, "fdopendir");
	real_closedir = dlsym(RTLD_NEXT, "closedir");
	real_rewinddir = dlsym(RTLD_NEXT, "rewinddir");
	real_readdir = dlsym(RTLD_NEXT, "readdir");
	real_readdir_r = dlsym(RTLD_NEXT, "readdir_r");
	real_readdir64 = dlsym(RTLD_NEXT, "readdir64");
	real_telldir = dlsym(RTLD_NEXT, "telldir");
	real_seekdir = dlsym(RTLD_NEXT, "seekdir");
	real_dirfd = dlsym(RTLD_NEXT, "dirfd");
	if ((cp = getenv("SPD_READDIR_MAX_SIZE")) != NULL) {
		max_dirsize = atol(cp);
	}
#ifdef DEBUG
	if (getenv("SPD_READDIR_DEBUG")) {
		printf("initialized!\n");
		do_debug++;
	}
#endif
}

static void free_cached_dir(struct dir_s *dirstruct)
{
	int i;

	if (!dirstruct->dp)
		return;

	for (i=0; i < dirstruct->num; i++) {
		free(dirstruct->dp[i].d_name);
	}
	free(dirstruct->dp);
	dirstruct->dp = 0;
	dirstruct->max = dirstruct->num = 0;
}	

static int ino_cmp(const void *a, const void *b)
{
	const struct dirent_s *ds_a = (const struct dirent_s *) a;
	const struct dirent_s *ds_b = (const struct dirent_s *) b;
	ino_t i_a, i_b;
	
	i_a = ds_a->d_ino;
	i_b = ds_b->d_ino;

	if (ds_a->d_name[0] == '.') {
		if (ds_a->d_name[1] == 0)
			i_a = 0;
		else if ((ds_a->d_name[1] == '.') && (ds_a->d_name[2] == 0))
			i_a = 1;
	}
	if (ds_b->d_name[0] == '.') {
		if (ds_b->d_name[1] == 0)
			i_b = 0;
		else if ((ds_b->d_name[1] == '.') && (ds_b->d_name[2] == 0))
			i_b = 1;
	}

	return (i_a - i_b);
}

struct dir_s *alloc_dirstruct(DIR *dir)
{
	struct dir_s	*dirstruct;

	dirstruct = malloc(sizeof(struct dir_s));
	if (dirstruct)
		memset(dirstruct, 0, sizeof(struct dir_s));
	dirstruct->dir = dir;
	return dirstruct;
}

void cache_dirstruct(struct dir_s *dirstruct)
{
	struct dirent_s *ds, *dnew;
	struct dirent64 *d;

	while ((d = (*real_readdir64)(dirstruct->dir)) != NULL) {
		if (dirstruct->num >= dirstruct->max) {
			dirstruct->max += ALLOC_STEPSIZE;
			DEBUG_DIR(printf("Reallocating to size %d\n", 
					 dirstruct->max));
			dnew = realloc(dirstruct->dp, 
				       dirstruct->max * sizeof(struct dir_s));
			if (!dnew)
				goto nomem;
			dirstruct->dp = dnew;
		}
		ds = &dirstruct->dp[dirstruct->num++];
		ds->d_ino = d->d_ino;
		ds->d_off = d->d_off;
		ds->d_reclen = d->d_reclen;
		ds->d_type = d->d_type;
		if ((ds->d_name = malloc(strlen(d->d_name)+1)) == NULL) {
			dirstruct->num--;
			goto nomem;
		}
		strcpy(ds->d_name, d->d_name);
		DEBUG_DIR(printf("readdir: %lu %s\n", 
				 (unsigned long) d->d_ino, d->d_name));
	}
	qsort(dirstruct->dp, dirstruct->num, sizeof(struct dirent_s), ino_cmp);
	return;
nomem:
	DEBUG_DIR(printf("No memory, backing off to direct readdir\n"));
	free_cached_dir(dirstruct);
	dirstruct->direct = 1;
}

DIR *opendir(const char *name)
{
	DIR *dir;
	struct dir_s	*dirstruct;
	struct stat st;

	if (!real_opendir)
		setup_ptr();

	DEBUG_DIR(printf("Opendir(%s) (%d open)\n", name, num_open++));
	dir = (*real_opendir)(name);
	if (!dir)
		return NULL;

	dirstruct = alloc_dirstruct(dir);
	if (!dirstruct) {
		(*real_closedir)(dir);
		errno = -ENOMEM;
		return NULL;
	}

	if (max_dirsize && (stat(name, &st) == 0) && 
	    (st.st_size > max_dirsize)) {
		DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
				 st.st_size));
		dirstruct->direct = 1;
		return (DIR *) dirstruct;
	}

	cache_dirstruct(dirstruct);
	return ((DIR *) dirstruct);
}

DIR *fdopendir(int fd)
{
	DIR *dir;
	struct dir_s	*dirstruct;
	struct stat st;

	if (!real_fdopendir)
		setup_ptr();

	DEBUG_DIR(printf("fdpendir(%d) (%d open)\n", fd, num_open++));
	dir = (*real_fdopendir)(fd);
	if (!dir)
		return NULL;

	dirstruct = alloc_dirstruct(dir);
	if (!dirstruct) {
		(*real_closedir)(dir);
		errno = -ENOMEM;
		return NULL;
	}

	if (max_dirsize && (fstat(fd, &st) == 0) && 
	    (st.st_size > max_dirsize)) {
		DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
				 st.st_size));
		dirstruct->dir = dir;
		dirstruct->direct = 1;
		return (DIR *) dirstruct;
	}

	cache_dirstruct(dirstruct);
	return ((DIR *) dirstruct);
}

int closedir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	DEBUG_DIR(printf("Closedir (%d open)\n", --num_open));
	if (dirstruct->dir)
		(*real_closedir)(dirstruct->dir);

	free_cached_dir(dirstruct);
	free(dirstruct);
	return 0;
}

struct dirent *readdir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->direct)
		return (*real_readdir)(dirstruct->dir);

	if (dirstruct->pos >= dirstruct->num)
		return NULL;

	ds = &dirstruct->dp[dirstruct->pos++];
	dirstruct->ret_dir.d_ino = ds->d_ino;
	dirstruct->ret_dir.d_off = ds->d_off;
	dirstruct->ret_dir.d_reclen = ds->d_reclen;
	dirstruct->ret_dir.d_type = ds->d_type;
	strncpy(dirstruct->ret_dir.d_name, ds->d_name,
		sizeof(dirstruct->ret_dir.d_name));

	return (&dirstruct->ret_dir);
}

int readdir_r(DIR *dir, struct dirent *entry, struct dirent **result)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->direct)
		return (*real_readdir_r)(dirstruct->dir, entry, result);

	if (dirstruct->pos >= dirstruct->num) {
		*result = NULL;
		return 0;
	}

	ds = &dirstruct->dp[dirstruct->pos++];
	entry->d_ino = ds->d_ino;
	entry->d_off = ds->d_off;
	entry->d_reclen = ds->d_reclen;
	entry->d_type = ds->d_type;
	strncpy(entry->d_name, ds->d_name, sizeof(entry->d_name));
	*result = entry;

	return 0;
}

struct dirent64 *readdir64(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->direct)
		return (*real_readdir64)(dirstruct->dir);

	if (dirstruct->pos >= dirstruct->num)
		return NULL;

	ds = &dirstruct->dp[dirstruct->pos++];
	dirstruct->ret_dir64.d_ino = ds->d_ino;
	dirstruct->ret_dir64.d_off = ds->d_off;
	dirstruct->ret_dir64.d_reclen = ds->d_reclen;
	dirstruct->ret_dir64.d_type = ds->d_type;
	strncpy(dirstruct->ret_dir64.d_name, ds->d_name,
		sizeof(dirstruct->ret_dir64.d_name));

	return (&dirstruct->ret_dir64);
}

off_t telldir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	if (dirstruct->direct)
		return (*real_telldir)(dirstruct->dir);

	return ((off_t) dirstruct->pos);
}

void seekdir(DIR *dir, off_t offset)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	if (dirstruct->direct) {
		(*real_seekdir)(dirstruct->dir, offset);
		return;
	}

	dirstruct->pos = offset;
}

void rewinddir(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;

	(*real_rewinddir)(dirstruct->dir);
	if (dirstruct->direct)
		return;
	
	dirstruct->pos = 0;
	free_cached_dir(dirstruct);
	cache_dirstruct(dirstruct);
}

int dirfd(DIR *dir)
{
	struct dir_s	*dirstruct = (struct dir_s *) dir;
	int fd = (*real_dirfd)(dirstruct->dir);

	DEBUG_DIR(printf("dirfd %d, %p\n", fd, real_dirfd));
	return fd;
}



From adilger at dilger.ca  Sun Sep 25 06:16:12 2011
From: adilger at dilger.ca (Andreas Dilger)
Date: Sun, 25 Sep 2011 00:16:12 -0600
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <20110925014134.GC2606@thunk.org>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
	<4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org>
	<20110925014134.GC2606@thunk.org>
Message-ID: <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca>

On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have a LD_PRELOAD hack that can be used to demonstrate why
> putting this is a good idea.  You can google for spd_readdir and find
> it.  I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.


What we've started doing in Lustre (which has to deal with network
latency, but the same problem in terms of htree vs. inode ordering)
is to detect if the application is doing readdir+stat on the dirents
in readdir order, and then fork a thread to statahead the entries
in the kernel.

It would be possible to do something like this in the ext4 readdir
code to do dirent readahead, sort, and then prefetch the inodes
in order (partially or completely, depending on the directory size),
but as yet we aren't working on anything at the ext4 level.

There was a patch to do something similar to this for btrfs as well,
with the DCACHE_NEED_LOOKUP flag.  That avoids a lot of the complexity
between instantiating dcache entries from readdir without yet having
read the inode from disk.

The other proposal I've made in the past is to try and allocate inodes
from the inode table in roughly hash order, so that when it comes time
to do readdir+stat that the dirents and inodes are already partially in
the same order.  That breaks down in case of renames, but works well
for normal usage.

Cheers, Andreas







From tytso at mit.edu  Sun Sep 25 21:21:27 2011
From: tytso at mit.edu (Ted Ts'o)
Date: Sun, 25 Sep 2011 17:21:27 -0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
	<4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org>
	<20110925014134.GC2606@thunk.org>
	<1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca>
Message-ID: <20110925212127.GD27089@thunk.org>

On Sun, Sep 25, 2011 at 12:16:12AM -0600, Andreas Dilger wrote:
> 
> It would be possible to do something like this in the ext4 readdir
> code to do dirent readahead, sort, and then prefetch the inodes
> in order (partially or completely, depending on the directory size),
> but as yet we aren't working on anything at the ext4 level.

What we have in ext4 right now is if we need to do disk i/o to read
from the inode table, we will read in adjacent blocks from the inode
table, on the theory that the effort needed to read in 32k versus 4k
is pretty much the same.  So if the inodes were allocated all at the
same time, they will be sequentially ordered, and so the inode table
readahead should help quite a lot.

I'll note that with really large maildirs, especially on a mail server
with many other maildirs, over time the inodes for each individual
file will get scattered all over the place, and so pretty much any
scheme that uses a inode table separate from the blocks where the
directory entries are stored is going to get hammered by this use
case.

Ultimately, the best way to solve this problem is a more intelligent
application that caches the contents of the key headers in a database,
so you don't need to scan the contents of the entire Maildir when
doing common IMAP operations.

						- Ted



From basketboy at bk.ru  Thu Sep 29 07:29:42 2011
From: basketboy at bk.ru (Andrey)
Date: Thu, 29 Sep 2011 11:29:42 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <20110925212127.GD27089@thunk.org>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
	<4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org>
	<20110925014134.GC2606@thunk.org>
	<1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca>
	<20110925212127.GD27089@thunk.org>
Message-ID: <4E841E66.1090600@bk.ru>

Ok. Here are bonnie results on fresh installed Debian with 200GB FREE 
ext3 /home partitition (4 disks in RAID5 on HP Proliant DL380 G4 server):

Version  1.96  ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
debian           2G   243  97 22555  10  8794   2  1810  97 120444  11 
317.0   5
Latency               135ms     967ms     723ms   26526us   13143us 
586ms

Latency is also very bad according results. What is the reason? Hardware 
or ext3 itseld? Will try with xfs an ext4 and compare then.


26.09.2011 01:21, Ted Ts'o ?????:
> On Sun, Sep 25, 2011 at 12:16:12AM -0600, Andreas Dilger wrote:
>>
>> It would be possible to do something like this in the ext4 readdir
>> code to do dirent readahead, sort, and then prefetch the inodes
>> in order (partially or completely, depending on the directory size),
>> but as yet we aren't working on anything at the ext4 level.
>
> What we have in ext4 right now is if we need to do disk i/o to read
> from the inode table, we will read in adjacent blocks from the inode
> table, on the theory that the effort needed to read in 32k versus 4k
> is pretty much the same.  So if the inodes were allocated all at the
> same time, they will be sequentially ordered, and so the inode table
> readahead should help quite a lot.
>
> I'll note that with really large maildirs, especially on a mail server
> with many other maildirs, over time the inodes for each individual
> file will get scattered all over the place, and so pretty much any
> scheme that uses a inode table separate from the blocks where the
> directory entries are stored is going to get hammered by this use
> case.
>
> Ultimately, the best way to solve this problem is a more intelligent
> application that caches the contents of the key headers in a database,
> so you don't need to scan the contents of the entire Maildir when
> doing common IMAP operations.
>
> 						- Ted
>
>



From tytso at mit.edu  Thu Sep 29 13:08:32 2011
From: tytso at mit.edu (Ted Ts'o)
Date: Thu, 29 Sep 2011 09:08:32 -0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E841E66.1090600@bk.ru>
References: <4E7C1E57.9040904@bk.ru> <4E7CBFA2.8000300@zianet.com>
	<4E7E1789.8090801@bk.ru> <20110924190447.GH2779@thunk.org>
	<20110925014134.GC2606@thunk.org>
	<1AE9D983-6DD1-4AC9-A8B0-76000C004B1F@dilger.ca>
	<20110925212127.GD27089@thunk.org> <4E841E66.1090600@bk.ru>
Message-ID: <20110929130832.GP19250@thunk.org>

On Thu, Sep 29, 2011 at 11:29:42AM +0400, Andrey wrote:
> Ok. Here are bonnie results on fresh installed Debian with 200GB
> FREE ext3 /home partitition (4 disks in RAID5 on HP Proliant DL380
> G4 server):
> 
> Version  1.96  ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr-
> --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec
> %CP /sec %CP
> debian           2G   243  97 22555  10  8794   2  1810  97 120444
> 11 317.0   5
> Latency               135ms     967ms     723ms   26526us   13143us
> 586ms
> 
> Latency is also very bad according results. What is the reason?
> Hardware or ext3 itseld? Will try with xfs an ext4 and compare then.

Ext4 will do better than ext3; but I can predict to you up front that
xfs will do better than ext4, because it is better optimized for RAID
arrays at the moment.  Ext4 has superblock fields to store RAID 5
parameters, but the code to fully take advantage of those RAID
parameters is not fully implemented.

You should also take a look at your RAID parameters.  If your RAID
stripe size is too large, it will impact workloads such as mail
servers which typically involve small writes.  Have you considered
using RAID 10 over RAID 5?  It's not as efficient from a space
perspective but if you are primarily concerned about throughput and
latency, it's the way to go.

						- Ted



From basketboy at bk.ru  Thu Sep 29 14:36:06 2011
From: basketboy at bk.ru (Andrey)
Date: Thu, 29 Sep 2011 18:36:06 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E7C35B8.70100@mikrobitti.fi>
References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi>
Message-ID: <4E848256.4090106@bk.ru>

Let me to share some testing RAID 5 results with bonnie++:

ext3 (defaults,noatime):

Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
debian           2G   242  96 22458  10  8826   2  1854  98 120985  11 
317.1   3
Latency               211ms     896ms     720ms   22258us   18733us 
622ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16 12857  33 +++++ +++ 15377  34 13585  33 +++++ +++ 
15404  35
Latency             12284us     992us    1029us     432us     140us 
  76us

ext3 (-T small,defaults,noatime):

Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
debian           2G   229  98  4989   5  3862   1  1762  97 91111   9 
266.6   6
Latency             79046us   22858ms    2577ms   19253us   12120us 
767ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16  6422  16 +++++ +++ 10319  25  8934  21 +++++ +++ 
10347  26
Latency              9968us     977us     964us     482us     144us 
178us

ext3 (-T news,defaults,noatime):
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
debian           2G   237  95 22807  11  8807   2  1897  99 121893  11 
324.6   5
Latency               223ms     808ms     523ms   13765us   11049us 
831ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16 12826  33 +++++ +++ 15900  35 14548  36 +++++ +++ 
15460  35
Latency               417us     984us    1024us     430us     140us 
175us

ext4 (defaults,noatime):

Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
debian           2G   256  98 21495   6  9896   2  1771  99 125775  11 
349.7   5
Latency             37738us     992ms    3490ms   10811us   12045us 
495ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16 14766  43 +++++ +++ 18026  46 16094  46 +++++ +++ 
17428  45
Latency               424us     982us    1023us     367us     139us 
174us

xfs(defaults,noatime,logbufs=8,logbsize=131072):

Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP 
/sec %CP
debian           2G   476  96 35129   9 12524   3  1417  99 124716  12 
445.9   9
Latency             19798us     420ms     721ms   14122us    9394us 
131ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
                  16  1552   8 +++++ +++  1705  11  1675   9 +++++ +++ 
1346   8
Latency               104ms     291us   48604us     109ms      45us 
227ms

It seems that latency is big in whole results, best is for XFS. It is 
tempting me to think that there are some RAID 5 issues here. It's really 
strange that block writing for SCSI server disks in RAID5 is no more 
than 30MB/sec(XFS). I guess I should consider XFS file system or 
different RAID configuration. May be someone can comment this strange 
benchmark result? Will very appreciate that.

With regards, Andrey.

23.09.2011 11:31, Janne Pikkarainen ?????:
> Hello,
>
> On 09/23/2011 08:51 AM, Andrey wrote:
>> Hello,
>>
>> I have a production mail server with maildir++ structure and about
>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>> mounted with noatime option. These mail server is responsible to local
>> delivery and storing mail messages.
>>
>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>> IMAP+POP3 server.
>>
>> Bonnie results are terrible. Sequential output for Block and Rewrite
>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>> queue load is extremely high, delivery time is very big and server can
>> hang. I did not see such problems with UFS on FreeBSD server.
>>
>> As I understand ext3 file system is really bad for such configurations
>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>> latency on ext3 or speed up it?
>>
>> With regards, Andrey
>>
>> ___
>



From sandeen at redhat.com  Thu Sep 29 14:44:45 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Thu, 29 Sep 2011 09:44:45 -0500
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E848256.4090106@bk.ru>
References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi>
	<4E848256.4090106@bk.ru>
Message-ID: <4E84845D.1040100@redhat.com>

On 9/29/11 9:36 AM, Andrey wrote:
> Let me to share some testing RAID 5 results with bonnie++:

What kernel version was this tested on?

Thanks,
-Eric

> ext3 (defaults,noatime):
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> debian           2G   242  96 22458  10  8826   2  1854  98 120985  11 317.1   3
> Latency               211ms     896ms     720ms   22258us   18733us 622ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>                  16 12857  33 +++++ +++ 15377  34 13585  33 +++++ +++ 15404  35
> Latency             12284us     992us    1029us     432us     140us  76us
> 
> ext3 (-T small,defaults,noatime):
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> debian           2G   229  98  4989   5  3862   1  1762  97 91111   9 266.6   6
> Latency             79046us   22858ms    2577ms   19253us   12120us 767ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>                  16  6422  16 +++++ +++ 10319  25  8934  21 +++++ +++ 10347  26
> Latency              9968us     977us     964us     482us     144us 178us
> 
> ext3 (-T news,defaults,noatime):
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> debian           2G   237  95 22807  11  8807   2  1897  99 121893  11 324.6   5
> Latency               223ms     808ms     523ms   13765us   11049us 831ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>                  16 12826  33 +++++ +++ 15900  35 14548  36 +++++ +++ 15460  35
> Latency               417us     984us    1024us     430us     140us 175us
> 
> ext4 (defaults,noatime):
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> debian           2G   256  98 21495   6  9896   2  1771  99 125775  11 349.7   5
> Latency             37738us     992ms    3490ms   10811us   12045us 495ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>                  16 14766  43 +++++ +++ 18026  46 16094  46 +++++ +++ 17428  45
> Latency               424us     982us    1023us     367us     139us 174us
> 
> xfs(defaults,noatime,logbufs=8,logbsize=131072):
> 
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> debian           2G   476  96 35129   9 12524   3  1417  99 124716  12 445.9   9
> Latency             19798us     420ms     721ms   14122us    9394us 131ms
> Version  1.96       ------Sequential Create------ --------Random Create--------
> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>                  16  1552   8 +++++ +++  1705  11  1675   9 +++++ +++ 1346   8
> Latency               104ms     291us   48604us     109ms      45us 227ms
> 
> It seems that latency is big in whole results, best is for XFS. It is tempting me to think that there are some RAID 5 issues here. It's really strange that block writing for SCSI server disks in RAID5 is no more than 30MB/sec(XFS). I guess I should consider XFS file system or different RAID configuration. May be someone can comment this strange benchmark result? Will very appreciate that.
> 
> With regards, Andrey.
> 
> 23.09.2011 11:31, Janne Pikkarainen ?????:
>> Hello,
>>
>> On 09/23/2011 08:51 AM, Andrey wrote:
>>> Hello,
>>>
>>> I have a production mail server with maildir++ structure and about
>>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>>> mounted with noatime option. These mail server is responsible to local
>>> delivery and storing mail messages.
>>>
>>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>>> IMAP+POP3 server.
>>>
>>> Bonnie results are terrible. Sequential output for Block and Rewrite
>>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>>> queue load is extremely high, delivery time is very big and server can
>>> hang. I did not see such problems with UFS on FreeBSD server.
>>>
>>> As I understand ext3 file system is really bad for such configurations
>>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>>> latency on ext3 or speed up it?
>>>
>>> With regards, Andrey
>>>
>>> ___
>>
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users



From basketboy at bk.ru  Thu Sep 29 15:13:48 2011
From: basketboy at bk.ru (Andrey)
Date: Thu, 29 Sep 2011 19:13:48 +0400
Subject: ext3 with maildir++ = huge disk latency and high load
In-Reply-To: <4E84845D.1040100@redhat.com>
References: <4E7C1E57.9040904@bk.ru> <4E7C35B8.70100@mikrobitti.fi>
	<4E848256.4090106@bk.ru> <4E84845D.1040100@redhat.com>
Message-ID: <4E848B2C.6090209@bk.ru>

Hello,

This is standard Debian Squeeze(6.0.2) kernel:

# uname -a
Linux debian 2.6.32-5-686 #1 SMP Fri Sep 9 20:51:05 UTC 2011 i686 GNU/Linux

29.09.2011 18:44, Eric Sandeen ?????:
> On 9/29/11 9:36 AM, Andrey wrote:
>> Let me to share some testing RAID 5 results with bonnie++:
>
> What kernel version was this tested on?
>
> Thanks,
> -Eric
>
>> ext3 (defaults,noatime):
>>
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> debian           2G   242  96 22458  10  8826   2  1854  98 120985  11 317.1   3
>> Latency               211ms     896ms     720ms   22258us   18733us 622ms
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>>                   16 12857  33 +++++ +++ 15377  34 13585  33 +++++ +++ 15404  35
>> Latency             12284us     992us    1029us     432us     140us  76us
>>
>> ext3 (-T small,defaults,noatime):
>>
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> debian           2G   229  98  4989   5  3862   1  1762  97 91111   9 266.6   6
>> Latency             79046us   22858ms    2577ms   19253us   12120us 767ms
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>>                   16  6422  16 +++++ +++ 10319  25  8934  21 +++++ +++ 10347  26
>> Latency              9968us     977us     964us     482us     144us 178us
>>
>> ext3 (-T news,defaults,noatime):
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> debian           2G   237  95 22807  11  8807   2  1897  99 121893  11 324.6   5
>> Latency               223ms     808ms     523ms   13765us   11049us 831ms
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>>                   16 12826  33 +++++ +++ 15900  35 14548  36 +++++ +++ 15460  35
>> Latency               417us     984us    1024us     430us     140us 175us
>>
>> ext4 (defaults,noatime):
>>
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> debian           2G   256  98 21495   6  9896   2  1771  99 125775  11 349.7   5
>> Latency             37738us     992ms    3490ms   10811us   12045us 495ms
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>>                   16 14766  43 +++++ +++ 18026  46 16094  46 +++++ +++ 17428  45
>> Latency               424us     982us    1023us     367us     139us 174us
>>
>> xfs(defaults,noatime,logbufs=8,logbsize=131072):
>>
>> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
>> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
>> debian           2G   476  96 35129   9 12524   3  1417  99 124716  12 445.9   9
>> Latency             19798us     420ms     721ms   14122us    9394us 131ms
>> Version  1.96       ------Sequential Create------ --------Random Create--------
>> debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP /sec %CP
>>                   16  1552   8 +++++ +++  1705  11  1675   9 +++++ +++ 1346   8
>> Latency               104ms     291us   48604us     109ms      45us 227ms
>>
>> It seems that latency is big in whole results, best is for XFS. It is tempting me to think that there are some RAID 5 issues here. It's really strange that block writing for SCSI server disks in RAID5 is no more than 30MB/sec(XFS). I guess I should consider XFS file system or different RAID configuration. May be someone can comment this strange benchmark result? Will very appreciate that.
>>
>> With regards, Andrey.
>>
>> 23.09.2011 11:31, Janne Pikkarainen ?????:
>>> Hello,
>>>
>>> On 09/23/2011 08:51 AM, Andrey wrote:
>>>> Hello,
>>>>
>>>> I have a production mail server with maildir++ structure and about
>>>> 250GB (~10 millions) of files on the ext3 partition on RAID5. It's
>>>> mounted with noatime option. These mail server is responsible to local
>>>> delivery and storing mail messages.
>>>>
>>>> System has Debian Squeeze installed and Exim as MDA + Dovecot as
>>>> IMAP+POP3 server.
>>>>
>>>> Bonnie results are terrible. Sequential output for Block and Rewrite
>>>> are 10722ms and 9232ms. So if there is a 1000 messages in the mail
>>>> queue load is extremely high, delivery time is very big and server can
>>>> hang. I did not see such problems with UFS on FreeBSD server.
>>>>
>>>> As I understand ext3 file system is really bad for such configurations
>>>> with Maildir++ (many smaill files)? Is there a way to decrease disk
>>>> latency on ext3 or speed up it?
>>>>
>>>> With regards, Andrey
>>>>
>>>> ___
>>>
>>
>> _______________________________________________
>> Ext3-users mailing list
>> Ext3-users at redhat.com
>> https://www.redhat.com/mailman/listinfo/ext3-users
>
>