[Linux-cluster] GFS (1 & partially 2) performance problems

Michael Lackner michael.lackner at mu-leoben.at
Mon Jun 14 12:00:35 UTC 2010


Hello!

I am currently building a cluster on CentOS 5 for GFS usage.

At the moment, the storage subsystem consists of an HP MSA2312
Fibre Channel SAN attached to an 8 Gbit FC switch. Three client
machines are connected to that switch over 8 Gbit FC. The disks
themselves are 12 x 15,000 rpm SAS drives configured as RAID-5
with two hot spares.

Now, the whole storage is to be shared as a single filesystem,
which is where GFS comes in.

The cluster is only 3 nodes large at the moment; more nodes will be
added later on. I am currently testing GFS1 and GFS2 for performance.
Lock management is done over single 1 Gbit Ethernet links (1 per
machine).

The thing is, with GFS1 I get far better performance than with the
newer GFS2 across the board; with a few tunable parameters set,
GFS1 is roughly twice as fast for writes.

But concurrent reads are totally abysmal. The total write performance
(all nodes combined) sits around 280-330 MByte/sec, whereas the
READ performance drops as low as 30-40 MByte/sec when doing
concurrent reads. Surprisingly, single-node read is somewhat OK at
180 MByte/sec, but as soon as several nodes read from GFS (version 1
at the moment) at the same time, things turn ugly.

This is strange, because for writes, global performance across the
cluster increases slightly when adding more nodes. But for reads,
the opposite seems to be true.

For the read and write tests, a separate testfile was created and
read for each node, with each testfile sitting in its own
subdirectory, so no node would access another node's file.
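
To illustrate the access pattern, something along these lines was
run on every node (dd is shown only as an example tool here, and
/mnt/gfs is a placeholder for the actual mountpoint):

    # each node writes/reads only its own file in its own directory
    NODE=$(hostname -s)
    mkdir -p /mnt/gfs/$NODE

    # sequential write test: 4 GB to this node's own file
    # (conv=fsync so the timing includes flushing to the SAN)
    dd if=/dev/zero of=/mnt/gfs/$NODE/testfile bs=1M count=4096 conv=fsync

    # drop the page cache so the read actually hits the SAN
    echo 3 > /proc/sys/vm/drop_caches

    # sequential read test
    dd if=/mnt/gfs/$NODE/testfile of=/dev/null bs=1M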

GFS1 was created with the following mkfs.gfs parameters:
"-b 4096 -J 128 -j 16 -r 2048 -p lock_dlm"
(4 kB block size, 16 x 128 MB journals, 2 GB resource groups,
Distributed Lock Manager)
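
For completeness, the full command looks roughly like this (cluster
name, filesystem name and device are placeholders, not the real ones):

    mkfs.gfs -p lock_dlm -t mycluster:gfs01 \
             -b 4096 -J 128 -j 16 -r 2048 /dev/mapper/msa_vol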

Mount Options set: "noatime,nodiratime,noquota"
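
That is, mounting goes like this (again with placeholder device and
mountpoint):

    mount -t gfs -o noatime,nodiratime,noquota /dev/mapper/msa_vol /mnt/gfs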

Tunables set: "glock_purge 50, statfs_slots 128, statfs_fast 1, 
demote_secs 20"

Also, in /etc/cluster/cluster.conf, I added this:
<dlm plock_ownership="1" plock_rate_limit="0"/>
<gfs_controld plock_rate_limit="0"/>
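
In context, those two elements sit directly under the top-level
<cluster> element, roughly like this (cluster name and config_version
are placeholders):

    <?xml version="1.0"?>
    <cluster name="mycluster" config_version="2">
      <dlm plock_ownership="1" plock_rate_limit="0"/>
      <gfs_controld plock_rate_limit="0"/>
      <!-- clusternodes, fencing etc. unchanged -->
    </cluster>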

Any ideas on how to figure out what's going wrong? And how would I
tune GFS1 for better concurrent read performance, or tune GFS2 in
general to be competitive with (or better than) GFS1?

I'm dreaming of 300 MB/sec sequential read and 300 MB/sec sequential
write, with reasonably good response times under heavy sequential
and/or random load. But for now, I just want to get sequential
reading to work acceptably fast.

Thanks a lot for your help!

-- 
Michael Lackner
Chair of Information Technology, University of Leoben
IT Administration
michael.lackner at mu-leoben.at | +43 (0)3842/402-1505



