Poor Performance When Number of Files > 1M

John Kalucki ext3 at kalucki.com
Wed Jun 11 05:18:46 UTC 2008


I am seeing problems similar to those Sean McCauliff reported
(2007-08-02) with ext3. I have a simple test that times file creation
in a hashed directory structure. File creation time increases
inexorably as the number of files in the filesystem grows. Altering
variables changes the absolute performance, but I always see the same
steady degradation.
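
For reference, a minimal sketch of the kind of test I'm running looks
like the following (not the actual test program; the tree layout,
fanout, and file size are illustrative):

/*
 * Sketch: create small files across a two-level hashed directory
 * tree and report files/sec for every batch of 5,000 creations.
 * FANOUT, BATCH, and the file size are illustrative assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

#define FANOUT 100      /* subdirectories per level (assumed) */
#define BATCH  5000     /* reporting interval */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    long nfiles = argc > 1 ? atol(argv[1]) : 1000000;
    char path[128], buf[4096];
    double t0;

    /* pre-create the hashed tree (test/00..99/00..99) so that only
     * file creation is timed below */
    mkdir("test", 0755);
    for (int a = 0; a < FANOUT; a++) {
        snprintf(path, sizeof(path), "test/%02d", a);
        mkdir(path, 0755);
        for (int b = 0; b < FANOUT; b++) {
            snprintf(path, sizeof(path), "test/%02d/%02d", a, b);
            mkdir(path, 0755);
        }
    }

    memset(buf, 'x', sizeof(buf));
    t0 = now();
    for (long i = 0; i < nfiles; i++) {
        /* hash the file number into a leaf directory */
        snprintf(path, sizeof(path), "test/%02ld/%02ld/f%ld",
                 i % FANOUT, (i / FANOUT) % FANOUT, i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror(path); return 1; }
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            perror("write");
        close(fd);

        if ((i + 1) % BATCH == 0) {
            double t1 = now();
            printf("%ld files: %.0f files/sec\n",
                   i + 1, BATCH / (t1 - t0));
            t0 = t1;
        }
    }
    return 0;
}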

All of the following have no material effect on the steady drop in 
performance:

File size (1 KB, 4 KB, 16 KB)
Directory depth (5, 10, 15)
Average & Max files per directory (10, 20, 100)
Single or multi-threaded test
Moving the test directory to a new name on the same filesystem and restarting the test
Directory hash
RAID10 vs. simple disk
Linux distribution (RHEL, Ubuntu)
System memory (32 GB, 2 GB)
Syncing after each close
Free space
Partition age (an old, perhaps fragmented, somewhat dirty fs vs. a new fs)

Performance seems always to map directly to the total number of files
in the ext3 filesystem.

After some initial run-fast time, perhaps once dirty pages begin to be
written out aggressively, the creation rate drops by about one file
per second for every 5,000 files added. So, depending on the
variables, say with 6 RAID10 spindles, I might start at ~700
files/sec, drop quickly, then decline more slowly to ~300 files/sec at
perhaps 1 million files, then see 299 files/sec over the next 5,000
creations, 298 files/sec over the 5,000 after that, and so on.
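
If that linear decay were to continue, the rate would reach zero after
roughly another 300 * 5,000 = 1.5 million creations, i.e. somewhere
around 2.5 million files total.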

As you'd expect, there isn't much CPU utilization other than iowait
and some kjournald activity.

Is this a known limitation of ext3? Is writing O(10^6) to O(10^7)
files in something approaching constant time too much to expect from a
filesystem? What, exactly, am I stressing to cause this unbounded
performance degradation?

Thanks,
-John Kalucki
ext3 at kalucki.com




----

    Hi all,

    I plan on having about 100M files totaling about 8.5 TiB.  To see
    how ext3 would perform with large numbers of files, I've written a
    test program which creates a configurable number of files across a
    configurable number of directories, reads from those files, lists
    them, and then deletes them.  Up to 1M files, ext3 seems to perform
    well and scale linearly: the time to execute the program on 1M
    files is about double the time it takes on 0.5M files.  But past 1M
    files it seems to show n^2 scaling.  Test details appear below.
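
    Roughly, the test does the following (a sketch, not the real
    program; NDIRS and NFILES are illustrative, and error handling is
    mostly omitted for brevity):

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define NDIRS  1000L       /* illustrative */
    #define NFILES 1000000L    /* illustrative */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void path_of(char *p, size_t n, long i)
    {
        snprintf(p, n, "bench/%04ld/f%ld", i % NDIRS, i);
    }

    int main(void)
    {
        char p[128], buf[4096];
        double t;

        /* setup: one level of directories */
        mkdir("bench", 0755);
        for (long d = 0; d < NDIRS; d++) {
            snprintf(p, sizeof(p), "bench/%04ld", d);
            mkdir(p, 0755);
        }

        /* phase 1: create */
        memset(buf, 'x', sizeof(buf));
        t = now();
        for (long i = 0; i < NFILES; i++) {
            path_of(p, sizeof(p), i);
            int fd = open(p, O_CREAT | O_WRONLY, 0644);
            if (fd < 0) { perror(p); return 1; }
            write(fd, buf, sizeof(buf));
            close(fd);
        }
        printf("create: %.1f s\n", now() - t);

        /* phase 2: read */
        t = now();
        for (long i = 0; i < NFILES; i++) {
            path_of(p, sizeof(p), i);
            int fd = open(p, O_RDONLY);
            read(fd, buf, sizeof(buf));
            close(fd);
        }
        printf("read:   %.1f s\n", now() - t);

        /* phase 3: list every directory */
        t = now();
        for (long d = 0; d < NDIRS; d++) {
            snprintf(p, sizeof(p), "bench/%04ld", d);
            DIR *dir = opendir(p);
            while (readdir(dir) != NULL)
                ;
            closedir(dir);
        }
        printf("list:   %.1f s\n", now() - t);

        /* phase 4: delete */
        t = now();
        for (long i = 0; i < NFILES; i++) {
            path_of(p, sizeof(p), i);
            unlink(p);
        }
        printf("delete: %.1f s\n", now() - t);
        return 0;
    }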

    Looking at the various options for ext3, nothing jumps out as the
    obvious one to use to improve performance.

    Any recommendations?

    Thanks!
    Sean





