ext3 with maildir++ = huge disk latency and high load

Andreas Dilger adilger at dilger.ca
Sun Sep 25 06:16:12 UTC 2011

On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have a LD_PRELOAD hack that can be used to demonstrate why
> putting this is a good idea.  You can google for spd_readdir and find
> it.  I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.

What we've started doing in Lustre (which has to deal with network
latency, but the same problem in terms of htree vs. inode ordering)
is to detect if the application is doing readdir+stat on the dirents
in readdir order, and then fork a thread to statahead the entries
in the kernel.

It would be possible to do something like this in the ext4 readdir
code to do dirent readahead, sort, and then prefetch the inodes
in order (partially or completely, depending on the directory size),
but as yet we aren't working on anything at the ext4 level.
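
As a rough userspace illustration of the same idea (read the dirents,
sort, then touch the inodes in order), here is a minimal Python sketch.
It is not kernel code and not the spd_readdir source; the function name
stat_in_inode_order is made up for this example, and it assumes the
directory fits in memory and that d_ino is a useful proxy for on-disk
inode table position, which holds for ext3/ext4.

```python
import os


def stat_in_inode_order(path):
    """Return {name: stat_result} for the entries of `path`,
    calling lstat() in ascending inode-number order."""
    # Pass 1: read all dirents up front.  With dir_index enabled, ext3/ext4
    # return them in htree (hash) order, which is unrelated to where the
    # inodes sit in the inode table.
    with os.scandir(path) as it:
        # entry.inode() uses d_ino from the dirent, so no stat() happens yet.
        entries = [(entry.inode(), entry.name) for entry in it]

    # Sort by inode number so the stat pass walks the inode table
    # roughly sequentially instead of seeking at random.
    entries.sort()

    # Pass 2: stat in inode order.  Dicts preserve insertion order, so
    # iterating the result also yields entries in inode order.
    results = {}
    for _ino, name in entries:
        results[name] = os.lstat(os.path.join(path, name))
    return results
```

Calling stat_in_inode_order("/path/to/Maildir/cur") on a cold cache,
versus a plain readdir+stat loop, is one way to see how much of the
latency comes from the ordering mismatch.  For very large directories
the sort could be done in batches rather than over the whole directory.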

There was a patch to do something similar to this for btrfs as well,
with the DCACHE_NEED_LOOKUP flag.  That avoids a lot of the complexity
of instantiating dcache entries from readdir before the inode has
been read from disk.

The other proposal I've made in the past is to try to allocate inodes
from the inode table in roughly hash order, so that when it comes time
to do readdir+stat, the dirents and inodes are already partially in
the same order.  That breaks down in the case of renames, but works
well for normal usage.

Cheers, Andreas
