Poor Performance When Number of Files > 1M

Sean McCauliff smccauliff at mail.arc.nasa.gov
Fri Aug 3 18:15:37 UTC 2007


It turns out my code has a bug: it creates ndirs = nfiles /
nFilesPerDir, so it was making more directories as the number of files
grew, instead of a fixed number.  I had thought that with the dir_index
option directory entry lookup would be closer to O(1), so the scale-up
should be essentially linear.
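
Roughly, the layout logic amounted to this (a simplified Python sketch,
not the actual test code; the names and numbers are illustrative):

    # Sketch only: shows why the directory count grew with the file count
    # instead of staying fixed.
    nfiles = 2000000
    nFilesPerDir = 1000

    # Buggy behaviour: ndirs scales with nfiles, so each larger run also
    # pays for creating and looking up proportionally more directories.
    ndirs_buggy = nfiles // nFilesPerDir       # 2000 dirs at 2M files

    # Intended behaviour: keep the directory count fixed and let the
    # number of files per directory grow instead.
    ndirs_fixed = 1000
    files_per_dir = nfiles // ndirs_fixed      # 2000 files per dir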

Multi-level directory schemes seem to degrade performance even further.
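
For reference, the kind of two-level layout suggested below (100
directories, each with 100 subdirectories) would map a file index to a
path roughly like this; the fan-out and naming here are only
placeholders:

    # Sketch of a two-level directory layout (100 x 100 = 10,000 leaf dirs).
    # Fan-out, directory names and file names are illustrative.
    import os

    FANOUT = 100

    def path_for(base, i):
        top = i % FANOUT                  # first level: 00-99
        sub = (i // FANOUT) % FANOUT      # second level: 00-99
        return os.path.join(base, "%02d" % top, "%02d" % sub,
                            "file-%09d" % i)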

Sean

Stephen Samuel wrote:
> Searching for directories (to ensure no duplicates, etc) is going to
> be order N^2.
> 
> Size of the directory is likely to be a limiting factor.
> 
> Try increasing to 10000 directories (in two layers of 100 each).  I'll
> bet you that the result will be a pretty good increase in speed
> (getting back to the speeds that you had with 1M files).
> 
> 
> On 8/1/07, Sean McCauliff <smccauliff at mail.arc.nasa.gov> wrote:
>> Hi all,
>>
>> I plan on having about 100M files totaling about 8.5 TiB.  To see
>> how ext3 would perform with large numbers of files, I've written a test
>> program which creates a configurable number of files into a configurable
>> number of directories, reads from those files, lists them, and then
>> deletes them.  Up to 1M files ext3 seems to perform well and scale
>> linearly; the time to execute the program on 1M files is about double
>> the time it takes on 0.5M files.  But past 1M files it seems to show
>> O(n^2) scaling.  Test details appear below.
>>
>> Looking at the various options for ext3 nothing jumps out as the obvious
>> one to use to improve performance.
>>
>> Any recommendations?
>>
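
For anyone who wants to reproduce the numbers, the test is roughly
equivalent to the following sketch (simplified; the file size, names and
counts are placeholders rather than the actual program):

    #!/usr/bin/env python
    # Simplified sketch of the create/read/list/delete test described above.
    # File size, naming and counts are placeholders, not the original program.
    import os, shutil, time

    def run(base, nfiles, ndirs, payload=b"x" * 1024):
        dirs = [os.path.join(base, "d%04d" % d) for d in range(ndirs)]
        for d in dirs:
            os.makedirs(d)

        t0 = time.time()
        for i in range(nfiles):                          # create phase
            with open(os.path.join(dirs[i % ndirs], "f%09d" % i), "wb") as f:
                f.write(payload)
        t1 = time.time()
        for i in range(nfiles):                          # read phase
            with open(os.path.join(dirs[i % ndirs], "f%09d" % i), "rb") as f:
                f.read()
        t2 = time.time()
        for d in dirs:                                   # list phase
            os.listdir(d)
        t3 = time.time()
        shutil.rmtree(base)                              # delete phase
        t4 = time.time()
        print("create %.1fs  read %.1fs  list %.1fs  delete %.1fs"
              % (t1 - t0, t2 - t1, t3 - t2, t4 - t3))

    if __name__ == "__main__":
        run("/tmp/ext3-scaling-test", nfiles=1000000, ndirs=1000)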



