Many small files, best practise.
pg_ext3 at ext3.for.sabi.co.UK
Mon Sep 14 21:08:58 UTC 2009
[ ... ]
>> Also note that in order to write 10^9 files at 10^3/s rate
>> takes 10^6 seconds; roughly 10 days to populate the
>> filesystem (or at least that to restore it from backups).
> One thing that you can do when doing bulk loads of files (say,
> during a restore or migration), is to use a two phase
> write. First, write each of a batch of files (say 1000 files
> at a time), then go back and reopen/fsync/close them.
Why not just restore a database?
>>> One layout for directories that works well with this kind of
>>> thing is a time based one (say YEAR/MONTH/DAY/HOUR/MIN where
>>> MIN might be 0, 5, 10, ..., 55 for example).
>> As to the problem above and ths kind of solution, I reckon that
>> it is utterly absurd (and I could have used much stronger words).
> When you deal with systems that store millions of files,
Millions of files may work; but 1 billion is an utter absurdity.
A filesystem that can store reasonably 1 billion small files in
7TB is an unsolved research issue...
The obvious thing to do is to use a database, and there is no
way around this point.
If one genuinely needs to store a lot of files, why not split
them into many independent filesystems? A single large one is
only need to allow for hard linking or for having a single large
space pool, and in applications where the directory structure
above makes any kind of sense that neither is usually required.
> you pretty much always are going to use some kind of made up
> directory layout.
File systems are usually used for storing somewhat unstructured
information, not records that can be looked up with a simple
"YEAR/MONTH/DAY/HOUR/MIN" key, which seems very suitable for
something like a simpel DBMS.
There is even a tendency to move filesystems into databases, as
they scale a lot better.
And for cases where a filesystem still makes sense I would
rather use, instead of the inane manylevel directory structure
above, a file system design with proper tree indexes and perhaps
even one with the ability to store small files into inodes.
[ ... ]
> You can always try to write 1 million files in a single
Again, I'd rather avoid anything like that.
> but if you are writing your own application, using this kind
> of scheme is pretty trivial.
And an utter absurdity, for 1 billion files in 200k directories.
Both on its own merits and compared to the OBVIOUS alternative.
>> If anything, consider the obvious (obvious except to those
>> who want to use a filesystem as a small record database),
>> which is 'fsck' time, in particular given the structure of
>> 'ext3' (or 'ext4') metadata.
> fsck time has improved quite a lot recently with ext4 (and
> with xfs).
How many months do you think a 7TB filesystem with 1 billion
files would take to 'fsck' even with those improvements? Even
with the nice improvements?
[ ... ]
More information about the Ext3-users