Very slow directory traversal

Andreas Dilger adilger at clusterfs.com
Wed Oct 10 15:59:20 UTC 2007


On Oct 06, 2007  00:10 -0700, Ross Boylan wrote:
> My last full backup of my Cyrus mail spool had 1,393,569 files and
> consumed about 4G after compression. It took over 13 hours.  Some
> investigation led to the following test:
>  time tar cf /dev/null /var/spool/cyrus/mail/r/user/ross/debian/user/

FYI - "tar cf /dev/null" actually skips reading any file data.  GNU tar
special-cases /dev/null as the archive name and omits the reads entirely,
so this test measures only directory traversal and stat time, not read
throughput.
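A quick way to see the difference (the test directory below is a
throwaway created for the demonstration, not your spool): naming
/dev/null as the archive lets tar skip the data reads, while piping
the archive into /dev/null forces every byte to be read.

```shell
# Build a small throwaway directory with one large file.
mkdir -p /tmp/tartest
dd if=/dev/zero of=/tmp/tartest/big bs=1M count=32 2>/dev/null

time tar cf /dev/null /tmp/tartest     # file data is never read
time tar cf - /tmp/tartest >/dev/null  # file data is read and archived
```

On a cold cache the second form should be noticeably slower on a mail
spool, since it actually touches every data block.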

> That took 15 minutes the first time it ran, and 32 seconds when run
> immediately thereafter.  There were 355,746 files. This is typical of
> what I've been seeing: initial run is slow; later runs are much faster.

I'd expect this is because on the initial run the mismatch between the
directory (readdir) order and the on-disk inode order causes a lot of
seeks, while later runs are served straight from the inode and page
caches.  Probably not a lot you can do directly, but e.g. pre-reading
the inode table, or processing files in inode order, would be a good
start.
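A minimal sketch of that idea: stat the entries, sort them by inode
number, and read them in that order, so the disk sweeps the inode table
and data blocks mostly sequentially instead of seeking in hash order.
This assumes a flat directory of regular files; the directory below is
a small throwaway built for the demo, not a real spool path.

```shell
# Build a throwaway directory of small files.
SPOOL=$(mktemp -d)
for i in 1 2 3; do echo "message $i" > "$SPOOL/msg$i"; done

# ls -i prints "inode name"; sort -n orders by inode number,
# then each file is read in that (approximately on-disk) order.
ls -i "$SPOOL" | sort -n | while read inum name; do
    cat "$SPOOL/$name" >/dev/null   # forces the actual data read
done
```

This is essentially what the LD_PRELOAD approach does transparently
for unmodified programs, by sorting the results of readdir() itself.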


> I found some earlier posts on similar issues, although they mostly
> concerned apparently empty directories that took a long time.  Theodore
> Tso had a comment that seemed to indicate that hashing conflicts with
> Unix requirements.  I think the implication was that you could end up
> with linearized, or partly linearized searches under some scenarios.
> Since this is a mail spool, I think it gets lots of sync()'s.

There was an LD_PRELOAD library that Ted wrote that may also help:
http://marc.info/?l=mutt-dev&m=107226330912347&w=2

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

More information about the Ext3-users mailing list