[Linux-cluster] Ext3/ext4 in a clustered environement

Sat Nov 5 18:17:09 UTC 2011

In the message dated: Fri, 04 Nov 2011 14:05:34 EDT,
The pithy ruminations from "Nicolas Ross" on 
<[Linux-cluster] Ext3/ext4 in a clustered environement> were:
=> Hi !
=> 

	[SNIP!]

=> 
=> On some services, there are document directories that are huge, not that 
=> much in size (about 35 gigs), but in number of files, around one million. 
=> One service even has 3 data directories with that many files each.

Ouch.

I've seen a significant performance drop with ext3 (and other) filesystems
with 10s to 100s of thousands of files per directory. Make sure that the
"directory hash" option (the dir_index feature) is enabled for ext3. With ~1M
files per directory, I'd do some performance tests comparing rsync under ext3,
ext4, and gfs before changing filesystems. While ext3/4 do perform better than
gfs, the directory size may be such an overwhelming factor that the filesystem
choice is irrelevant.
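
A rough sketch of how one might check for the directory hash feature,
assuming the standard e2fsprogs tools (tune2fs, e2fsck) are installed. The
helper name has_dir_index is made up for illustration, and the device path is
simply the one from the cluster.conf snippet in this thread:

```shell
#!/bin/sh
# Assumption: DEV is the block device holding the ext3 filesystem.
DEV=/dev/VGx/documentsA

# has_dir_index (hypothetical helper): reads "tune2fs -l" output on stdin
# and succeeds only if dir_index appears in the feature list.
has_dir_index() {
    grep -i '^Filesystem features:' | grep -qw 'dir_index'
}

# Usage (as root, not run here):
#   tune2fs -l "$DEV" | has_dir_index || tune2fs -O dir_index "$DEV"
#   e2fsck -fD "$DEV"   # with the filesystem unmounted; rebuilds the
#                       # indexes of directories that already exist
```

Turning the feature on only affects directories created afterwards, which is
why the e2fsck -fD pass is listed for the existing ones.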

=> 
=> It works pretty well for now, but when it comes to data update (via rsync) 
=> and backup (also via rsync), the node doing the rsync crawls to a stop, all 
=> 16 logical cores are used at 100% system, and it sometimes freezes the file 
=> system for other services on other nodes.

Ouch!

=> 
=> We've recently changed the way the rsync is done, we just do a rsync -nv to 
=> see what files would be transferred and transfer those files manually. But 
=> it's still too much sometimes for the gfs.

Is this strictly a GFS issue, or an issue with rsync? Have you set up a
similar environment under ext3/4 to test just the rsync part? Rsync is
known for being a memory & resource hog, particularly at the initial
stage of building the filesystem tree.

I would strongly recommend benchmarking rsync on ext3/4 before making the
switch.
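
One possible way to set up such a test is sketched below. It only creates a
directory full of empty files, so it exercises the per-directory file count
rather than the data volume. The function name make_test_tree and the
/mnt/ext4test paths are hypothetical:

```shell
#!/bin/sh
# make_test_tree DIR COUNT (hypothetical helper): create COUNT empty
# files named file_0 .. file_(COUNT-1) in DIR.
make_test_tree() {
    mkdir -p "$1" || return 1
    i=0
    while [ "$i" -lt "$2" ]; do
        : > "$1/file_$i"
        i=$((i + 1))
    done
}

# Usage (paths are placeholders for a scratch ext3/ext4 mount, not run here):
#   make_test_tree /mnt/ext4test/docs 1000000
#   time rsync -a /mnt/ext4test/docs/ /mnt/ext4test/copy/
```

Repeating the same run on a scratch gfs volume would give numbers to compare.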

One option would be to do several 'rsync' operations (serially, not in
parallel!), each operating on a subset of the filesystem, while continuing
to use gfs.
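
A minimal sketch of that idea, assuming the data is split into top-level
subdirectories (if all the files sit in one flat directory, the subsets would
have to be chosen some other way, e.g. by filename pattern). The function name
sync_subsets and the RSYNC override variable are made up for illustration:

```shell
#!/bin/sh
# Assumption: RSYNC may be overridden (e.g. RSYNC=echo) to print the
# commands instead of running them.
RSYNC=${RSYNC:-rsync}

# sync_subsets SRC DEST (hypothetical helper): run one rsync per
# top-level subdirectory of SRC, strictly one after another, and stop
# at the first failure. Files directly under SRC are not handled.
sync_subsets() {
    src=$1
    dest=$2
    for sub in "$src"/*/; do
        [ -d "$sub" ] || continue
        name=$(basename "$sub")
        $RSYNC -a "$sub" "$dest/$name/" || return 1
    done
}

# Usage (paths are placeholders, not run here):
#   sync_subsets /path/to/source /path/to/destination
```

Each rsync then only has to build the file list for one subset at a time.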

	[SNIP!]

=> 
=> <fs device="/dev/VGx/documentsA" force_unmount="1" fstype="ext4" 
=> mountpoint="/GFSVolume1/Service1/documentsA" name="documentsA" 
=> options="noatime,quota=off"/>
=> 
=> So, first, is this doable ?

Yes.

We have been doing something very similar for the past ~2 years, except
not mounting the ext3/4 partition under a GFS mountpoint.

=> 
=> Second, is this risky ? In the sense that with force_unmount true, I assume 
=> that no other node would mount that filesystem before it is unmounted on the 
=> stopping service. I know that for some reason umount could hang, but it's 
=> not likely since this data is mostly read-only. In that case the service 

We've experienced numerous cases where the filesystem hangs after a
service migration due to a node (or service) failover. These hangs all
seem to be related to quota or NFS issues, so this may not be an issue
in your environment.

=> would be failed and need to be manually restarted. What would be the 
=> consequence if the filesystem happens to be mounted on 2 nodes ?

Most likely, filesystem corruption.

=> 
=> One could add self_fence="1" to the fs line, so that even if it fails, it 
=> will self-fence the node to force the umount. But I'm not there yet.

We don't do that...and haven't felt the need to.

=> 
=> Third, I've been told that it's not recommended to mount a file system like 
=> this "on top" of another clustered fs. Why is that ? I suppose I'll have to 

First of all, that's introducing another dependency. If you mount the ext3/4
partition under a local directory (e.g., /export), then you could have nodes
that provide your rsync data service without requiring GFS.

=> mount under /mnt/something and symlink to that.
=> 
=> Thanks for any insights. 
=> 

Mark