The maximum number of files under a folder

John Nelson articpenguin3800 at gmail.com
Wed Mar 19 12:16:15 UTC 2008


What does the h stand for in h-tree? Like the b in btree is 
binary tree.



Stephen Samuel wrote:
> The OS will have to search the directory to see if the file already 
> exists before creating it.
>
> Well, if you hash it such that it splits up something like:
> jobid(upper part)/jobid(lower part)[/-]timestamp-process,
> you'll find that your access times will be much faster (especially if 
> you don't use H-Trees).  This also applies if you're just creating a 
> file, because you'll have to search the entire directory to see if 
> that filename exists.
>
> With regular directories, the time to search them to see if a file 
> already exists increases linearly with the number of entries.  If you 
> hash on 3 levels with 8 bits per level, you'll have to open 2 or 3 
> extra inodes, but you'll cut your directory search times down by a 
> factor of about 20000 to 1 (= 2^24/256/3).  You'll also skip having to 
> deal with any sort of directory-size limit.
>
> I did something similar on a Solaris box which had 200000 emails in 
> the /var/spool/mqueue directory. That many messages were slowing the 
> system to a crawl.  I hashed it into 100 directories with 2000 
> entries each, and it sped things up *enormously*.
>
> On Tue, Mar 18, 2008 at 3:56 PM, Andreas Dilger <adilger at sun.com> 
> wrote:
>
>     On Mar 17, 2008  09:32 -0400, Theodore Ts'o wrote:
>     > On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote:
>     > > Theodore Tso,
>     > >
>     > >     On a 64-bit system, the directory size cannot be bigger than 2GB?
>     >
>     > No, because the high 32 bits for i_size are overloaded to store the
>     > directory creation acl.
>
>     I think we should change the code (kernel and e2fsprogs) to allow
>     i_size_high for directories also.
>
>     > In practice, you really don't want to have a directory that huge
>     > anyway.  Iterating through it all with readdir() gets horribly slow,
>     > and applications that try to do anything with really huge directories
>     > would be well advised to use a database, because they will get
>     > *much* better performance that way....
>
>     Actually, for many HPC applications they never do readdir at all.
>     The job creates 1 file/process and always uses a predefined filename
>     like {job}-{timestamp}-{process} that it will directly look up.
>
>     Cheers, Andreas
>
>
>
>
> -- 
> Stephen Samuel http://www.bcgreen.com
> 778-861-7641 
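
For anyone who wants to see the hashed layout quoted above in concrete
form, here is a minimal sketch in Python. It is only an illustration under
some assumptions: the function names, the root directory, and the use of
SHA-1 to pick the subdirectories are mine (the quoted suggestion splits the
job id itself into upper and lower parts instead of hashing the name). It
takes 8 bits per level for 3 levels, and also reproduces the 2^24/256/3
arithmetic from the quoted text.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(root, job, timestamp, process, levels=3):
    """Return a nested path for a {job}-{timestamp}-{process} file name.

    Hypothetical helper: one byte (8 bits) of a SHA-1 digest of the name
    selects the subdirectory at each level, so every level has at most
    256 subdirectories.
    """
    name = f"{job}-{timestamp}-{process}"
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    parts = [f"{byte:02x}" for byte in digest[:levels]]
    return PurePosixPath(root, *parts, name)


def search_speedup(total_files, levels=3, bits_per_level=8):
    """Rough ratio of entries scanned: one flat directory vs. hashed levels.

    Flat lookup scans up to total_files entries; the hashed lookup scans
    up to 2**bits_per_level entries in each of the `levels` directories.
    """
    fanout = 2 ** bits_per_level
    return total_files // (fanout * levels)


if __name__ == "__main__":
    # The same name always maps to the same path, so a job can look its
    # file up directly without ever calling readdir().
    print(hashed_path("/scratch/jobs", "job42", 1205928975, 7))
    print(search_speedup(2 ** 24))
```

Because the path is computed from the file name alone, the direct-lookup
pattern Andreas describes still works: the application recomputes the path
and opens it, and only small directories are ever searched.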




More information about the Ext3-users mailing list