[Fedora] Re: Alternatives to du

Tue Apr 10 18:53:24 UTC 2007

Alan Cox wrote:
>> Unfortunately, in this case, the directory structure is what it is.  I 
>> can't change it without a massive amount of pain.
> 
> Then you won't get sane performance. The ext3 directory hash does what it
> can to improve performance in this case but you'll probably find most of
> your extra overhead (compared to users with sane directory structures) is
> simply down to filename lookup and directory scanning overhead.

I can see why a filename lookup would be slow in a large directory, and
why keeping it locked between the initial check for a name's existence
and the write to create a new name becomes problematic, but doesn't du just
do a linear walk anyway?  I don't see why making the tree deeper and 
less wide would make a lot of difference there.  What might make a 
difference would be sorting the list and doing the stat()s in inode 
order.  A variation of this question just came up on the backuppc list.
BackupPC does hash things into a directory tree, but since it keeps an
online backup containing a history of many other machines you end up
with millions of files with all duplicates hardlinked.  The issue there
is that because most of the directory entries are hardlinks to existing
files, the inodes are wildly out of order and you end up waiting for a
lot of seeks if you try to stat them in directory scan order.  I've been
using reiserfs for a long time for my backuppc partition but would be 
interested to know if any of the other filesystems might have any 
advantage in handling huge numbers of directory entries (overall, not in 
a single directory).
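
To make the "stat in inode order" idea concrete, here is a rough sketch
in Python.  It is not how du itself works; the function names are made
up for illustration.  It assumes a Linux-like system where the directory
read already reports each entry's inode number, so collecting them costs
no extra stat(), and it assumes the root is a readable directory (no
error handling).

```python
import os
import stat


def stat_in_inode_order(directory):
    """lstat() every entry of one directory, visiting them in inode order."""
    with os.scandir(directory) as it:
        # On Linux, entry.inode() comes from the directory read itself (d_ino),
        # so collecting these numbers does not cost a stat() per entry.
        entries = [(entry.inode(), entry.path) for entry in it]
    entries.sort()  # ascending inode number instead of directory scan order
    return [(path, os.lstat(path)) for _, path in entries]


def disk_usage(root):
    """Rough du-like total in bytes, counting each hardlinked inode once."""
    root_st = os.lstat(root)
    seen = {(root_st.st_dev, root_st.st_ino)}
    total = root_st.st_blocks * 512  # st_blocks is in 512-byte units
    pending = [root]
    while pending:
        for path, st in stat_in_inode_order(pending.pop()):
            if stat.S_ISDIR(st.st_mode):
                pending.append(path)  # lstat() means symlinks are not followed
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_blocks * 512
    return total
```

Something like disk_usage("/some/backup/pool") would then walk the tree
with the stat()s sorted per directory.  Whether that actually saves
seeks depends on the filesystem: it only helps where the inode number
roughly tracks the on-disk location of the inode, and it only sorts
within one directory at a time, not across the whole tree.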

-- 
   Les Mikesell
    lesmikesell at gmail.com