Huge number of files in a directory
Giovanni P. Tirloni
tirloni at gmail.com
Thu Oct 8 10:24:54 UTC 2009
On Oct 8, 2009, at 1:30 AM, Cameron Simpson wrote:
> On 07Oct2009 16:57, Miner, Jonathan W (US SSA)
> <jonathan.w.miner at baesystems.com> wrote:
> | The issue with 'ls' is that it wants to sort the output. You may
> | want to try using "-f", which says "do not sort".
>
> No, sorting is actually pretty cheap.
>
> The issue with ls and large directories is usually the fact that ls
> stat()s all the names. Plenty of other things need to stat() everything
> too; backups of all kinds, for example. A stat() requires the OS to
> search the directory to map the stat()ed name to an inode, and that's a
> linear operation on ext3 if you haven't turned on directory hashing. In
> consequence, the 'ls' cost goes as the square of the number of directory
> entries (n names, each asking for a stat() whose cost is O(n), so
> O(n^2) for the whole thing).
>
> The usual approach is to make a tree of subdirectories to mitigate the
> per-directory cost (keeping the n in n^2 small).
Just out of curiosity I did the following:
1) Created directories with sets of empty files
2) Created a script to time /bin/ls and /bin/ls -f
3) Ran the script 10 times until the numbers stabilized
4) Disabled dir_index, rebooted, and tried it all again
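A minimal version of such a timing script might look like the following; the file count and paths are my own placeholders, since the original script wasn't posted:

```shell
#!/bin/sh
# Recreate one data point of the benchmark: N empty files in a single
# directory, then time a sorted vs. an unsorted listing. N and DIR are
# assumptions; the original script was not included in the post.
N=1000
DIR=/tmp/ls-bench
mkdir -p "$DIR"
cd "$DIR"
i=1
while [ "$i" -le "$N" ]; do
  : > "file$i"
  i=$((i + 1))
done
# Output is discarded so only directory traversal and sorting are timed.
time /bin/ls    > /dev/null
time /bin/ls -f > /dev/null
```

Running it repeatedly (step 3) keeps the directory blocks in the page cache, which is why the numbers stabilize after a few runs.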
This was on CentOS 5.3 with 512MB RAM, running on a VMware ESXi
hypervisor with no other VMs running at the same time. I rebooted
between the tests with and without dir_index and waited for the load to
settle.
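For reference, dir_index can be toggled with tune2fs (step 4). This sketch exercises the same flags on a small loopback image so no real disk is touched; the image path and size are my own choices, not from the post:

```shell
# Toggle the dir_index feature with e2fsprogs. A 16MB image file stands
# in for the real filesystem (path and size are assumptions).
dd if=/dev/zero of=/tmp/ext3.img bs=1M count=16 2>/dev/null
mkfs.ext3 -q -F -O ^dir_index /tmp/ext3.img   # create with hashing off
tune2fs -l /tmp/ext3.img | grep 'features'
tune2fs -O dir_index /tmp/ext3.img            # turn dir_index back on
tune2fs -l /tmp/ext3.img | grep 'features'
# On a real disk: unmount first, and run "e2fsck -fD" on the device
# afterwards so existing directories get (re)indexed.
```

Note that enabling dir_index only affects directories created (or reindexed with e2fsck -D) afterwards, which matters when comparing runs.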
Linux vm-centos.gtirloni 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux
WITH DIR_INDEX:
Files         /     ls /  ls -f
1000 files    /  0.00s /  0.00s
2500 files    /  0.01s /  0.00s
5000 files    /  0.03s /  0.00s
10000 files   /  0.07s /  0.01s
25000 files   /  0.21s /  0.02s
50000 files   /  0.45s /  0.05s
100000 files  /  0.99s /  0.10s
250000 files  /  2.83s /  0.25s
500000 files  /  6.04s /  0.50s
1000000 files / 12.82s /  0.99s
WITHOUT DIR_INDEX:
Files         /     ls /  ls -f
1000 files    /  0.00s /  0.00s
2500 files    /  0.01s /  0.00s
5000 files    /  0.03s /  0.00s
10000 files   /  0.06s /  0.00s
25000 files   /  0.18s /  0.01s
50000 files   /  0.41s /  0.03s
100000 files  /  0.88s /  0.05s
250000 files  /  2.62s /  0.14s
500000 files  /  5.55s /  0.28s
1000000 files / 11.77s /  0.56s
I can't explain why it took longer to finish with dir_index.
-Giovanni
More information about the redhat-list mailing list