
Re: Many small files, best practise.

On 09/09/2009 09:00 AM, Pär Lanvin wrote:

RHEL 5.3
~1000.000.000 files (1-30k)
~7TB in total


I'm looking for a best practice when implementing this using EXT3 (or some other FS if it shouldn't do the job.).

On average reads dominate (99%); writes are only used for updating and aren't part of the service provided.
The data is divided into 200k directories, each holding some 5k files. This ratio (dir/files) can be altered to
optimize FS performance.

Any suggestions are greatly appreciated.



Hi Par,

This sounds a lot like the challenges I had in my recent past working on a similar storage system.

One key point is to minimize head movement while writing. The best performance comes from having a few threads (say 4-8) write to the same subdirectory for a few minutes (say 3-5) before moving on to a new directory.
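
A minimal sketch of that rotation, assuming Python writer threads that all ask one shared object where to write. The class name RotatingDir, the batch-NNNNNN directory names, and the 240 second default are hypothetical choices for illustration, not anything ext3 requires.

```python
import os
import threading
import time


class RotatingDir:
    """Hand every writer thread the same target directory and switch to a
    fresh one once `period` seconds have passed (hypothetical helper)."""

    def __init__(self, root, period=240.0, clock=time.monotonic):
        self._root = root
        self._period = period
        self._clock = clock          # injectable so the rotation can be tested
        self._lock = threading.Lock()
        self._seq = 0
        self._started = None
        self._current = None

    def current(self):
        """Return the directory all writers should use right now."""
        with self._lock:
            now = self._clock()
            if self._current is None or now - self._started >= self._period:
                self._seq += 1
                # Directory naming is an assumption made for this sketch.
                self._current = os.path.join(self._root, f"batch-{self._seq:06d}")
                os.makedirs(self._current, exist_ok=True)
                self._started = now
            return self._current
```

Each of the 4-8 writer threads would call current() before creating a file, so they all stay in one subdirectory until the period expires.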

If you are writing to a local SATA disk, ext3/4 can write a few thousand files/sec as long as you do not do any fsync() operations. With fsync(), the rate drops quite a lot.
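
To show where that cost enters the write path, here is a small sketch in Python. The helper name write_file and its durable flag are hypothetical; os.fsync() is the standard wrapper around the system call.

```python
import os


def write_file(path, data, durable=False):
    """Write `data` to `path`; only force it to disk when `durable` is set."""
    with open(path, "wb") as f:
        f.write(data)
        if durable:
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # wait until the kernel has written it out
```

With durable=False the data sits in the page cache and the filesystem writes it back later, which is the fast case described above. With durable=True every file waits for the disk.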

One directory layout that works well for this kind of thing is a time-based one (say YEAR/MONTH/DAY/HOUR/MIN, where MIN might be 0, 5, 10, ..., 55).
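
A sketch of how such a path could be computed in Python. The function name bucket_dir, the /data root in the usage note, and the zero-padded components are assumptions for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def bucket_dir(root, ts, minutes_per_bucket=5):
    """Map a timestamp to ROOT/YEAR/MONTH/DAY/HOUR/MIN, with MIN rounded
    down to the start of its bucket (0, 5, 10, ..., 55 by default)."""
    minute = ts.minute - ts.minute % minutes_per_bucket
    return (PurePosixPath(root)
            / f"{ts.year:04d}" / f"{ts.month:02d}" / f"{ts.day:02d}"
            / f"{ts.hour:02d}" / f"{minute:02d}")
```

For example, bucket_dir("/data", datetime(2009, 9, 9, 9, 3)) gives /data/2009/09/09/09/00, and everything written in the following five minutes lands in the same directory.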

When reading files in ext3 (and ext4), or doing other bulk operations like a large deletion, it is important to process the files in inode order: do the readdir, collect say all of the 5k files in your subdir, and then sort by inode before starting your bulk operation.
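
A minimal Python sketch of the readdir-then-sort step. The function names entries_by_inode and read_all are hypothetical; os.scandir() and DirEntry.inode() are standard library calls that expose the inode number reported by the directory listing.

```python
import os


def entries_by_inode(directory):
    """List the regular files in `directory`, ordered by inode number."""
    with os.scandir(directory) as it:
        entries = [(e.inode(), e.path) for e in it
                   if e.is_file(follow_symlinks=False)]
    entries.sort()  # tuples sort on the inode number first
    return [path for _, path in entries]


def read_all(directory):
    """Yield (path, contents) for every file, visiting them in inode order."""
    for path in entries_by_inode(directory):
        with open(path, "rb") as f:
            yield path, f.read()
```

The same ordered list can drive a bulk deletion by calling os.unlink() on each path instead of reading it.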

Good luck!

