[Linux-cluster] Ext3/ext4 in a clustered environment

Mon Nov 7 10:54:30 UTC 2011

Hi,

On Sat, 2011-11-05 at 14:17 -0400, bergman at merctech.com wrote:
> In the message dated: Fri, 04 Nov 2011 14:05:34 EDT,
> The pithy ruminations from "Nicolas Ross" on 
> <[Linux-cluster] Ext3/ext4 in a clustered environment> were:
> => Hi !
> => 
> 
> 	[SNIP!]
> 
> => 
> => On some services, there are document directories that are huge, not so
> => much in size (about 35 gigs) as in number of files, around one million.
> => One service even has 3 data directories with that many files each.
> 
> Ouch.
> 
> I've seen a significant performance drop with ext3 (and other) filesystems
> with 10s to 100s of thousands of files per directory. Make sure that the
> "directory hash" (dir_index) option is enabled for ext3. With ~1M files per
> directory, I'd do some performance tests comparing rsync under ext3, ext4,
> and gfs before changing filesystems...while ext3/4 do perform better than
> gfs, the directory size may be such an overwhelming factor that the
> filesystem choice is irrelevant.
> 
There are really two issues here: one is the performance of readdir
(listing the directory) and the other is the performance of lookups of
individual inodes.

Turning on the hashing option for ext3 will improve the lookup
performance, but will make next to no difference to the readdir
performance. GFS2 has hashed directories, inherited from GFS, so on the
lookup side of things, both should be fairly similar.
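To see which of the two costs dominates for a given directory, the
listing and the lookups can be timed separately. A minimal Python sketch,
assuming it is run on a node with the filesystem mounted; the helper name
is hypothetical, and results depend heavily on what is already in the
page/dentry cache, so a warm run is not comparable to a cold one:

```python
import os
import time


def time_readdir_and_lookups(path):
    """Time a directory listing separately from per-entry inode lookups.

    Returns (entry_count, readdir_seconds, lookup_seconds).
    """
    # Phase 1: readdir only. os.listdir() returns names without stat-ing them.
    start = time.perf_counter()
    names = os.listdir(path)
    readdir_seconds = time.perf_counter() - start

    # Phase 2: one lstat() per entry, i.e. a lookup by name. This is the
    # part a hashed directory index is meant to speed up.
    start = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))
    lookup_seconds = time.perf_counter() - start

    return len(names), readdir_seconds, lookup_seconds


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, rd, lk = time_readdir_and_lookups(target)
    print(f"{count} entries: readdir {rd:.3f}s, lookups {lk:.3f}s")
```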

One issue, though, is that GFS2 returns the directory entries from
readdir in hash order. That is due to a restriction imposed by the
unfortunate combination of the Linux VFS readdir code and the GFS2
algorithm for expanding the directory hash table when it fills up.

Ideally, one would sort the returned entries into inode number order
before beginning the lookups of the individual inodes. I don't know if
rsync does this, or whether it is an option that can be turned on, but
it should make a difference. Being able to look up multiple inodes in
parallel, if that is possible, should also dramatically improve the
speed.
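A minimal sketch of the sort-then-look-up idea, in Python rather than
anything rsync does (the function names are hypothetical). It relies on
os.scandir(), whose DirEntry.inode() returns the inode number reported by
readdir, and can optionally use a thread pool so that several lookups are
in flight at once. Whether parallel lookups actually help on a given GFS2
setup is an assumption to be measured, not a given:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def entries_in_inode_order(path):
    """Return the directory's entries sorted by inode number instead of
    the order readdir() produced them in (hash order on GFS2)."""
    with os.scandir(path) as it:
        entries = list(it)
    # On POSIX, DirEntry.inode() is the d_ino value from readdir, so the
    # sort itself needs no extra system calls.
    entries.sort(key=lambda e: e.inode())
    return entries


def stat_in_inode_order(path, workers=1):
    """lstat() every entry in inode order, optionally using a thread pool.

    Returns a list of (name, os.stat_result) in inode order.
    """
    entries = entries_in_inode_order(path)

    def lookup(entry):
        # follow_symlinks=False gives lstat() semantics.
        return entry.name, entry.stat(follow_symlinks=False)

    if workers <= 1:
        return [lookup(e) for e in entries]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input (inode) order even though the
        # lookups may run concurrently; os.lstat releases the GIL.
        return list(pool.map(lookup, entries))
```

For example, stat_in_inode_order("/srv/data", workers=8), where the path
and worker count are placeholders.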

> => 
> => It works pretty well for now, but when it comes to data updates (via rsync)
> => and backups (also via rsync), the node doing the rsync slows to a crawl, all
> => 16 logical cores are at 100% system time, and it sometimes freezes the file
> => system for other services on other nodes.
> 
> Ouch!
> 
So the question is: what is using all this CPU time? Is it being used
by rsync, by some of the gfs2/dlm system daemons, or even by some
other threads?
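One quick way to start answering that is to rank processes, including
kernel threads, by accumulated system time. A rough Python sketch that
reads /proc/<pid>/stat, where fields 14 and 15 are utime and stime in
clock ticks as documented in proc(5); the helper names are hypothetical.
The figures are cumulative since each process started, so for a current
picture take two samples a few seconds apart and compare them, or simply
watch top during an rsync run:

```python
import os


def parse_proc_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, utime, stime).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    fields = right.split()
    # fields[0] is field 3 (state) in proc(5), so utime (field 14) and
    # stime (field 15) are fields[11] and fields[12]. Units: clock ticks.
    return int(pid_str), comm, int(fields[11]), int(fields[12])


def top_system_time(n=10):
    """Return the n processes with the most accumulated system (kernel)
    time, as (pid, comm, seconds)."""
    rows = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                rows.append(parse_proc_stat(f.read()))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is unreadable) while scanning
    ticks = os.sysconf("SC_CLK_TCK")
    rows.sort(key=lambda r: r[3], reverse=True)
    return [(pid, comm, stime / ticks) for pid, comm, _, stime in rows[:n]]
```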

> => 
> => We've recently changed the way the rsync is done: we just do an rsync -nv to
> => see which files would be transferred and transfer those files manually. But
> => it's still sometimes too much for GFS.
> 
> Is this strictly a GFS issue, or an issue with rsync? Have you set up a
> similar environment under ext3/4 to test just the rsync part? Rsync is
> known for being a memory & resource hog, particularly at the initial
> stage of building the filesystem tree.
> 
> I would strongly recommend benchmarking rsync on ext3/4 before making the
> switch.
> 
> One option would be to do several 'rsync' operations (serially, not in
> parallel!), each operating on a subset of the filesystem, while continuing
> to use gfs.
> 
> 
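For the serial approach, a rough Python sketch (hypothetical helper names;
it assumes the tree splits usefully at its top level) that builds one
rsync -a per top-level entry and runs them one at a time. A source given
without a trailing slash, such as src/name, makes rsync create dest/name.
Two caveats: a single directory with a million entries is still one
subset, so this does not help with those directories themselves, and
--delete would only apply within each subset, so top-level entries removed
from the source would not be removed from the destination:

```python
import os
import subprocess


def rsync_commands(src, dest, extra_args=()):
    """Build one rsync command per top-level entry of src.

    Each command copies a single subtree, so no one rsync process has to
    build the file list for the whole tree at once.
    """
    dest = dest.rstrip("/") + "/"
    return [
        ["rsync", "-a", *extra_args, os.path.join(src, name), dest]
        for name in sorted(os.listdir(src))
    ]


def run_serially(cmds):
    """Run the commands one after another (never in parallel), raising
    CalledProcessError at the first failure. Requires rsync on PATH."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Passing extra_args=["-n", "-v"] gives a per-subset dry run, similar to the
rsync -nv step described earlier in the thread.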
I agree that we don't have enough information yet to make a judgement on
where the problem lies. It may well be something that can be resolved by
making some alterations to the way that rsync is done.

Steve.