[Linux-cluster] Ext3/ext4 in a clustered environement

Alan Brown ajb2 at mssl.ucl.ac.uk
Mon Nov 7 11:43:29 UTC 2011


Nicolas Ross wrote:

> On some services, there are document directories that are huge, not that 
> much in size (about 35 gigs), but in number of files, around one 
> million. One service even has 3 data directories with that many files each.

You are utterly mad.

Apart from the human-readability problems if someone attempts a directory 
listing, you're putting a substantial load on the system every time you 
enter one of those directories, even with dentry/inode caching tuned up 
to its maximums.

Directory inode hashing helps, but not for filesystem abuse on this scale.

Be glad you're using ext3/4 and not GFS; the problems there are several 
orders of magnitude worse (it can take 10 minutes to list a directory 
with 10,000 files in it, let alone 1,000,000).

> It works pretty well for now, but when it comes to data update (via
> rsync) and backup (also via rsync), the node doing the rsync crawls to a 
> stop, all 16 logical cores are used at 100% system, and it sometimes 
> freezes the file system for other services on other nodes.

That's not particularly surprising - and a fairly solid hint you should 
be revisiting the way you lay out your files.

If you go for a hierarchical layout you'll see several orders of 
magnitude speedup in access time without any real effort at all.
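As a rough sketch of the kind of layout I mean (the two-level, 
2-hex-character bucket scheme and the function names below are purely 
illustrative, adjust to taste):

import hashlib
import os
import shutil

def sharded_path(root, filename, levels=2, width=2):
    """Map a flat filename onto root/ab/cd/filename using the leading
    hex digits of an MD5 of the name, so no single directory ends up
    holding more than a handful of entries."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *(parts + [filename]))

def migrate(flat_dir, sharded_root):
    """Move every file out of one huge flat directory into the hashed
    hierarchy; os.scandir() iterates the directory without building
    the whole listing in memory first."""
    with os.scandir(flat_dir) as entries:
        for entry in entries:
            if entry.is_file():
                dest = sharded_path(sharded_root, entry.name)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(entry.path, dest)

With two levels of 256 buckets, a million files works out to roughly 15 
entries per directory, so readdir() and rsync's per-directory scans stay 
cheap; the application just has to go through sharded_path() (or 
something like it) whenever it opens a file.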

If you absolutely must put that many files in a directory, then use a 
filesystem able to cope with such activities. Ext3/4 aren't it.
