Many small files, best practice.

Mon Sep 14 09:40:18 UTC 2009

>> RHEL 5.3
>> ~1000.000.000 files (1-30k)
>> ~7TB in total
>> //

>> I'm looking for a best practice when implementing this using
>> EXT3 (or some other FS if it shouldn't do the job.). 

"best practice" would be a rather radical solution.

>> On average the reads dominate (99%), writes are only used for
>> updating and isn't a part of the service provided.  The data
>> is divided into 200k directories with each some 5k files.
>> This ratio (dir/files) can be altered to optimize FS
>> performance.

> If you are writing to a local S-ATA disk, ext3/4 can write a
> few thousand files/sec without doing any fsync() operations.
> With fsync(), you will drop down quite a lot.

Unfortunately using 'fsync' is a good idea for production
systems.
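
For reference, a minimal sketch of what writing one small file
"with fsync" involves, assuming Linux/POSIX semantics. The helper
name 'write_durably' is made up for illustration; the second
'fsync' on the parent directory is what makes the new directory
entry itself durable, and it is part of why the per-file rate
drops so much.

```python
import os

def write_durably(path, data):
    """Write `data` (bytes) to `path` and force it to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's buffer to the kernel
        os.fsync(f.fileno())    # ask the kernel to flush file data and metadata

    # Also fsync the containing directory so the new name survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```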

Also note that writing 10^9 files at a rate of 10^3/s takes 10^6
seconds: about 11.6 days to populate the filesystem (or at least
that long to restore it from backups).
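
The arithmetic, spelled out as a small sketch. The rate of 1000
files/s is an assumption taken from the low end of the "few
thousand files/sec" quoted above, i.e. without 'fsync'.

```python
# Back-of-the-envelope time to populate (or restore) the filesystem.
n_files = 10**9        # ~1,000,000,000 files, from the original post
rate_per_s = 10**3     # assumed: 1000 files/s, no fsync

seconds = n_files / rate_per_s
days = seconds / 86_400   # 86,400 seconds in a day

print(f"{seconds:.0f} s is about {days:.1f} days")
```

With 'fsync' on every file the rate would be lower and the total
correspondingly longer.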

> One layout for directories that works well with this kind of
> thing is a time based one (say YEAR/MONTH/DAY/HOUR/MIN where
> MIN might be 0, 5, 10, ..., 55 for example).

As to the problem above and this kind of solution, I reckon that
it is utterly absurd (and I could have used much stronger words).

  BTW, the sort of people who consider seriously such utter
  absurdities try to do a thorough job, and I don't want to
  know how the underlying storage system is structured :-).

If anything, consider the obvious (obvious except to those who
want to use a filesystem as a small record database), which is
'fsck' time, in particular given the structure of 'ext3' (or
'ext4') metadata.

So: just don't use a filesystem as a database, spare us the
horror; use a database, even a simple one, which is not utterly
absurd.
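
To illustrate what "even a simple one" could look like, here is
a minimal sketch using SQLite through Python's standard 'sqlite3'
module. SQLite is only my choice for the example, not something
the original poster mentioned; the table name 'records' and the
key scheme are hypothetical.

```python
import sqlite3

def open_store(path):
    """Open (or create) a single-file store of small records keyed by name."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " name TEXT PRIMARY KEY,"
        " body BLOB NOT NULL)"
    )
    return db

def put(db, name, body):
    """Insert or overwrite one record; `body` is bytes."""
    # One transaction per call. For bulk loads, wrap many inserts
    # in a single transaction instead.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO records (name, body) VALUES (?, ?)",
            (name, body),
        )

def get(db, name):
    """Return the record body as bytes, or None if the key is absent."""
    row = db.execute(
        "SELECT body FROM records WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]
```

Usage would be along the lines of 'db = open_store("store.db")',
then 'put(db, "dir0001/item0042", data)' and 'get(db, ...)'. The
point is that a billion small records become rows in a few large
files rather than a billion inodes to 'fsck'.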

Compare these two:

  http://lists.gllug.org.uk/pipermail/gllug/2005-October/055445.html
  http://lists.gllug.org.uk/pipermail/gllug/2005-October/055488.html

Anyhow I do see a lot of inane questions and "solutions" like
the above in various lists (usually the XFS one, which attracts
a lot of utter absurdities).

> When reading files in ext3 (and ext4) or doing other bulk
> operations like a large deletion, it is important to sort the
> files by inode (do the readdir, get say all of the 5k files in
> your subdir and then sort by inode before doing your bulk
> operation).

Good idea, but it is best to avoid the cases where this matters.
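
For the cases where it cannot be avoided, the inode-sorting
suggestion quoted above can be sketched as follows. This assumes
a POSIX system, where 'os.scandir' reports the inode number from
the directory entry without an extra 'stat' per file; the
function names are made up for illustration.

```python
import os

def entries_by_inode(directory):
    """Return paths of the regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as it:
        pairs = [
            (entry.inode(), entry.path)
            for entry in it
            if entry.is_file(follow_symlinks=False)
        ]
    pairs.sort()  # ascending inode number
    return [path for _, path in pairs]

def read_all(directory):
    """Yield (path, contents) for every file, visiting files in inode order."""
    for path in entries_by_inode(directory):
        with open(path, "rb") as f:
            yield path, f.read()
```

The same ordering applies to bulk deletion: iterate over
'entries_by_inode(d)' and call 'os.unlink' on each path.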