[Linux-cluster] GFS on Centos

Jeff Sturm jeff.sturm at eprize.com
Mon Dec 1 19:19:23 UTC 2008

1) What is the ratio of file reads to writes/creates for your Java app?

If this is very high (say 100:1 or more) GFS may work just fine.  In our
experience we have the most trouble with write contention, esp. on
shared directories.
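
If you don't already know the ratio, you can estimate it from whatever
access log your app keeps.  A minimal sketch in Python, assuming you can
reduce the log to a list of operation names ("read", "write", "create");
the function name and the operation labels are my own invention, not
anything GFS provides:

```python
from collections import Counter

# Operations that take a write path on the filesystem (assumed labels).
WRITE_OPS = {"write", "create"}

def read_write_ratio(ops):
    """Return reads per write/create, or None if there were no writes.

    ops: iterable of operation names, e.g. "read", "write", "create".
    """
    # Fold creates into writes so both count against the read total.
    counts = Counter("write" if op in WRITE_OPS else op for op in ops)
    if counts["write"] == 0:
        return None
    return counts["read"] / counts["write"]
```

Compare the result against the 100:1 figure above to decide whether write
contention is likely to matter.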

2) How much time elapses (statistically speaking) between consecutive
reads of the same file on the same node?

If this is low enough you may be able to tune demote_secs such that
glocks can be reused for file accesses.  If you have too many files to
cache the inodes or glocks in memory, you may be better off tuning
demote_secs and glock_purge to keep the numbers small, and accept the
overhead that each file access is going to have to obtain a lock.
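
To put a number on "low enough", you could measure how often a file is
re-read on the same node within a candidate time window.  A rough sketch
in Python, assuming you can pull (timestamp, path) pairs for reads on one
node out of your logs; the function name is hypothetical and window_secs
is only a stand-in for whatever demote_secs value you are considering:

```python
def reread_hit_fraction(reads, window_secs):
    """Fraction of re-reads that fall within window_secs of the previous
    read of the same file.

    reads: iterable of (timestamp_seconds, path) tuples for ONE node.
    window_secs: candidate window, a stand-in for demote_secs.
    """
    last_seen = {}   # path -> timestamp of the most recent read
    rereads = 0      # reads of a file that was already read before
    hits = 0         # re-reads that arrived inside the window
    for ts, path in sorted(reads):
        prev = last_seen.get(path)
        if prev is not None:
            rereads += 1
            if ts - prev <= window_secs:
                hits += 1
        last_seen[path] = ts
    return hits / rereads if rereads else 0.0
```

A fraction close to 1.0 suggests most re-reads could reuse a cached
glock; a fraction close to 0.0 suggests you are in the second case and
should keep the cached numbers small instead.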

3) What does your directory layout look like?  How many files are you
placing in the same directory?

You'll probably want to avoid very large directories.  If e.g. all files
are kept in a single directory, you'll get write contention that would
effectively limit file creates to a single node at a time.

For directories with a high percentage of file creates, we've had better
luck establishing one directory per node, such that each node can read
files created by others, but only write to their own directory.  (And
session affinity to reduce the frequency of cross-node reads.)
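
The one-directory-per-node layout can be sketched roughly as follows.
This is Python for brevity rather than our actual code; the helper names
and the choice of the hostname as the directory name are assumptions for
illustration only:

```python
import os
import socket

def node_dir(root, node=None):
    """Directory that only this node writes to (named after the hostname)."""
    return os.path.join(root, node or socket.gethostname())

def write_file(root, name, data, node=None):
    """Create a file in this node's own directory, never in a peer's."""
    d = node_dir(root, node)
    os.makedirs(d, exist_ok=True)
    path = os.path.join(d, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

def find_file(root, name, node=None):
    """Look in the local node's directory first, then in the peers'."""
    own = node_dir(root, node)
    peers = sorted(
        os.path.join(root, d) for d in os.listdir(root)
        if os.path.join(root, d) != own
    )
    for d in [own] + peers:
        path = os.path.join(d, name)
        if os.path.isfile(path):
            return path
    return None
```

Checking the local directory first pairs naturally with session
affinity: if a client keeps returning to the node that created its
files, most lookups never touch a peer's directory.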

Good luck.  The above advice is based on empirical evidence from our own
performance testing and other net wisdom, and the positive results we
obtained from strategies we employed both within our application and via
GFS tuning.  (The experts can tell you if I got any of this right or
wrong, since I lack an in-depth understanding of GFS/DLM internals.
GFS2 may behave very differently; we haven't had a chance to try it.)


-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Geoff Galitz
Sent: Monday, December 01, 2008 4:30 AM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] GFS on Centos

We are investigating deploying GFS across a small pool of servers:

Centos 5.1 x86_64
GigE Networking

The data will consist of approximately 400GB of small JPG files accessed
by an in-house Java app.  The entire cluster is 50 machines but only 7
will require access to this data repository.

GFS2 is not ready, yet... but my main question is, is it worth it to
wait for GFS2?  We are also looking at glusterfs.  

Our goal is:

- low administrative (sysadm) overhead
- good performance when accessing lots of small files (<100Mb)

Geoff Galitz
Blankenheim NRW, Deutschland
