[RFC] mke2fs with DIR_INDEX, RESIZE_INODE by default

Sat Mar 18 22:54:33 UTC 2006

On Fri, Mar 17, 2006 at 04:32:34PM -0800, Andrew Morton wrote:
> 
> btw, I have some directory readahead rework queued for 2.6.17
> (ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc6/2.6.16-rc6-mm1/broken-out/ext3_readdir-use-generic-readahead.patch).
> 
> That's non-htree-only.  Is there any sane way of doing htree directory
> readahead?

Not really, since the htree readdir() accesses directory blocks in
hash tree order.  We could do a speculative read if we had kernel
infrastructure to determine whether or not we had spare disk
bandwidth, and only do the speculative readahead if we didn't have
more important blocks to read, but it's not going to buy us much in
terms of sequential readahead.
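
To illustrate why hash order defeats sequential readahead, here is a toy
model, not the ext3 code. It assumes that a full leaf block is split at
its median hash and that the upper half moves to a new block appended at
the end of the directory. The capacity of 4 entries, the use of crc32 as
the hash, and the name build_leaves are all made up for the example.

```python
import bisect
import zlib


def build_leaves(names, capacity=4):
    """Return leaf block numbers in the order a hash-order walk visits them.

    Toy model only: 'capacity' and crc32 are stand-ins, not ext3 values.
    """
    lows = [0]      # lowest hash covered by each leaf, kept sorted
    blocks = [0]    # logical block number of each leaf, in hash order
    entries = [[]]  # (hash, name) pairs stored in each leaf
    for name in names:
        h = zlib.crc32(name.encode())
        i = bisect.bisect_right(lows, h) - 1  # leaf whose range covers h
        entries[i].append((h, name))
        if len(entries[i]) > capacity:
            # Split at the median hash; the upper half moves to a new
            # block allocated at the end of the directory file.
            entries[i].sort()
            mid = len(entries[i]) // 2
            upper = entries[i][mid:]
            entries[i] = entries[i][:mid]
            lows.insert(i + 1, upper[0][0])
            blocks.insert(i + 1, len(blocks))
            entries.insert(i + 1, upper)
    return blocks


if __name__ == "__main__":
    order = build_leaves([f"file{i}" for i in range(200)])
    print(order)  # block numbers in hash-walk order
```

Under these assumptions every block is visited exactly once, but the
block numbers generally do not come out in ascending order, so reading
ahead to "the next block" rarely fetches what readdir() needs next.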

In general, I doubt directory readahead actually buys you *that* much,
because most workloads follow up the readdir with a stat() or an
open() call for each file returned, or something which requires
reading in the inode.  In addition, it's rare that the directory will
be contiguously allocated, which also cuts down on the value of the
readahead.

What we could do that would accelerate readdir() for htree would be to
build an entirely separate tree keyed by inode number, and let readdir
iterate on that structure.  That would return files sorted by inode
number, which would speed up the readdir/stat or readdir/open
workload.  That could be done as a COMPAT extension, at the cost of
doubling the amount of space required to store a directory, and
doubling the cost of adding or deleting an entry to that directory.
It's not something that you would want to do for all directories, in
all likelihood, but for certain application-specific directory
structures, it would definitely speed things up.
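
The effect of inode-ordered iteration can be approximated from userspace
today by sorting the readdir results before touching the inodes.  A
minimal sketch, assuming the whole directory listing fits in memory; the
function name stat_in_inode_order is made up for the example:

```python
import os


def stat_in_inode_order(path):
    """Return (name, stat_result) pairs, calling stat in inode-number order."""
    # Read the whole directory first; scandir exposes each entry's inode
    # number without a separate stat call.
    with os.scandir(path) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # Sort by inode number so that the stat calls tend to walk the inode
    # table in one direction instead of seeking back and forth.
    entries.sort()

    results = []
    for _ino, name in entries:
        full = os.path.join(path, name)
        results.append((name, os.stat(full, follow_symlinks=False)))
    return results


if __name__ == "__main__":
    for name, st in stat_in_inode_order("."):
        print(st.st_ino, name)
```

This costs a sort per directory in every application that wants it,
which is the work an inode-keyed tree would move into the filesystem.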

						- Ted