Huge number of files in a directory
Giovanni P. Tirloni
tirloni at gmail.com
Thu Oct 8 10:24:54 UTC 2009
On Oct 8, 2009, at 1:30 AM, Cameron Simpson wrote:
> On 07Oct2009 16:57, Miner, Jonathan W (US SSA)
> <jonathan.w.miner at baesystems.com> wrote:
> | The issue with 'ls' is that it wants to sort the output. You may
> | want to try using "-f", which says "do not sort".
>
> No, sorting is actually pretty cheap.
>
> The issue with ls and large directories is usually the fact that ls
> stat()s all the names. Plenty of other things need to stat() everything
> too; backups of all kinds, for example. A stat() requires the OS to
> search the directory to map the stat()ed name to an inode, and that's a
> linear operation on ext3 if you haven't turned on directory hashing. In
> consequence, the 'ls' cost goes as the square of the number of directory
> entries (n names, each asking for a stat() whose cost is O(n), so
> O(n^2) for the whole thing).
>
> The usual approach is to make a tree of subdirectories to mitigate the
> per-directory cost (keeping the n in n^2 small).
Just out of curiosity I did the following:
1) Created directories with sets of empty files
2) Created a script to time /bin/ls and /bin/ls -f
3) Ran the script 10 times until the numbers stabilized
4) Disabled dir_index, rebooted, and tried it all again
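A minimal version of such a timing script might look like the following; the file count and paths are my own placeholders, since the original script wasn't posted:

```shell
#!/bin/sh
# Recreate one data point of the benchmark: N empty files in a single
# directory, then time a sorted vs. an unsorted listing. N and DIR are
# assumptions; the original script was not included in the post.
N=1000
DIR=/tmp/ls-bench
mkdir -p "$DIR"
cd "$DIR"
i=1
while [ "$i" -le "$N" ]; do
  : > "file$i"
  i=$((i + 1))
done
# Output is discarded so only directory traversal and sorting are timed.
time /bin/ls    > /dev/null
time /bin/ls -f > /dev/null
```

Running it repeatedly (step 3) keeps the directory blocks in the page cache, which is why the numbers stabilize after a few runs.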
This was on CentOS 5.3 with 512MB RAM, running on a VMware ESXi
hypervisor with no other VMs running at the same time. I rebooted
between the tests with and without dir_index and waited for the load to
settle.
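For reference, dir_index can be toggled with tune2fs (step 4). This sketch exercises the same flags on a small loopback image so no real disk is touched; the image path and size are my own choices, not from the post:

```shell
# Toggle the dir_index feature with e2fsprogs. A 16MB image file stands
# in for the real filesystem (path and size are assumptions).
dd if=/dev/zero of=/tmp/ext3.img bs=1M count=16 2>/dev/null
mkfs.ext3 -q -F -O ^dir_index /tmp/ext3.img   # create with hashing off
tune2fs -l /tmp/ext3.img | grep 'features'
tune2fs -O dir_index /tmp/ext3.img            # turn dir_index back on
tune2fs -l /tmp/ext3.img | grep 'features'
# On a real disk: unmount first, and run "e2fsck -fD" on the device
# afterwards so existing directories get (re)indexed.
```

Note that enabling dir_index only affects directories created (or reindexed with e2fsck -D) afterwards, which matters when comparing runs.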
Linux vm-centos.gtirloni 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux
WITH DIR_INDEX:
Files         /     ls /  ls -f
1000 files    /  0.00s /  0.00s
2500 files    /  0.01s /  0.00s
5000 files    /  0.03s /  0.00s
10000 files   /  0.07s /  0.01s
25000 files   /  0.21s /  0.02s
50000 files   /  0.45s /  0.05s
100000 files  /  0.99s /  0.10s
250000 files  /  2.83s /  0.25s
500000 files  /  6.04s /  0.50s
1000000 files / 12.82s /  0.99s
WITHOUT DIR_INDEX:
Files         /     ls /  ls -f
1000 files    /  0.00s /  0.00s
2500 files    /  0.01s /  0.00s
5000 files    /  0.03s /  0.00s
10000 files   /  0.06s /  0.00s
25000 files   /  0.18s /  0.01s
50000 files   /  0.41s /  0.03s
100000 files  /  0.88s /  0.05s
250000 files  /  2.62s /  0.14s
500000 files  /  5.55s /  0.28s
1000000 files / 11.77s /  0.56s
I can't explain why it took longer to finish with dir_index.
-Giovanni
More information about the redhat-list mailing list