[Linux-cluster] rhcs + gfs performance issues

Fri Oct 3 16:25:30 UTC 2008

It sounds like you have a SAN (fibre attached storage) that you are trying to turn into a NAS. That's justifiable if you have multiple mirrored SANs, but makes a mockery of HA if you only have one storage device since it leaves you with a single point of failure regardless of the number of front end nodes.

Do you have a separate gigabit interface/vlan just for cluster communication? RHCS doesn't use a lot of sustained bandwidth but performance is sensitive to latencies for DLM comms. If you only have 2 nodes, a direct crossover connection would be ideal.

How big is your data store? Are files large or small? Are they in few directories with lots of files (e.g. Maildir)?
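
If it helps to answer those questions, something like the following Python sketch would give the basic numbers: total file count, total and mean size, and the directories holding the most files. The function name `profile_tree` and the report layout are made up for illustration and are not part of any RHCS or GFS tooling. Bear in mind that walking a large tree on GFS generates a lot of lock traffic in its own right, so it is probably best run off-peak and from one node only.

```python
import os
import sys
from collections import Counter


def profile_tree(root):
    """Walk root and summarise file counts and sizes (hypothetical helper)."""
    files_per_dir = Counter()
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are measured themselves, not their targets
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            sizes.append(st.st_size)
            files_per_dir[dirpath] += 1
    total = sum(sizes)
    return {
        "files": len(sizes),
        "total_bytes": total,
        "mean_bytes": total / len(sizes) if sizes else 0,
        # the five directories with the most entries (Maildir-style hot spots)
        "busiest_dirs": files_per_dir.most_common(5),
    }


if __name__ == "__main__":
    report = profile_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("files:      ", report["files"])
    print("total bytes:", report["total_bytes"])
    print("mean bytes: ", round(report["mean_bytes"]))
    for directory, count in report["busiest_dirs"]:
        print("%8d  %s" % (count, directory))
```

A small mean size combined with a handful of directories holding tens of thousands of files would point at the Maildir-like pattern.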

Load averages will go up - that's normal, since there is added latency (round trip time) from locking across nodes. Unless your CPUs are 0% idle, the servers aren't running out of steam, so don't worry about it.
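
To put numbers on the load versus idle point, here is a rough Python sketch that samples /proc/stat twice and reports the 1-minute load average next to the idle and iowait percentages over the interval. The helper names (`parse_cpu_line`, `percent_between`, `sample`) and the 5 second interval are arbitrary choices for illustration. On Linux the load average counts processes in uninterruptible sleep as well as runnable ones, which is why it can sit at 10+ while the CPUs are mostly idle.

```python
import time


def parse_cpu_line(stat_text):
    """Extract idle, iowait and total jiffies from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal;
            # later guest fields are already included in user, so skip them
            v = [int(x) for x in fields[1:9]]
            iowait = v[4] if len(v) > 4 else 0
            return {"idle": v[3], "iowait": iowait, "total": sum(v)}
    raise ValueError("no aggregate cpu line found")


def percent_between(before, after):
    """Idle and iowait as a percentage of all CPU time between two samples."""
    total = after["total"] - before["total"]
    if total <= 0:
        return {"idle": 0.0, "iowait": 0.0}
    return {k: 100.0 * (after[k] - before[k]) / total
            for k in ("idle", "iowait")}


def sample(interval=5.0):
    """Return (1-minute load average, percentages) measured over interval."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.read())
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    return load1, percent_between(before, after)


if __name__ == "__main__":
    load1, pct = sample()
    print("load1=%.2f idle=%.1f%% iowait=%.1f%%"
          % (load1, pct["idle"], pct["iowait"]))
```

A high load figure alongside a large idle percentage suggests processes waiting on locks or I/O rather than a shortage of CPU.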

Also note that a clustered FS will _ALWAYS_ be slower than a non-clustered one, all things being equal. No exceptions. Also, if you are load sharing across the nodes, and you have Maildir-like file structures, it'll go slower than a purely fail-over setup, even on a clustered FS (like GFS), since a fail-over setup has no lock bouncing between head nodes. For extra performance, you can use a non-clustered FS as a failover resource, but be very careful with that since dual mounting a non-clustered FS will destroy the volume virtually instantly.

Provided that your data isn't fundamentally unsuitable for being handled by a clustered load sharing setup, you could try increasing lock trimming and increasing the number of resource groups. Search through the archives for details on that.

More suggestions when you provide more details on what your data is like.

Gordan

-----Original Message-----
From: "Doug Tucker" <tuckerd at engr.smu.edu>
To: linux-cluster at redhat.com
Sent: 03/10/08 16:54
Subject: [Linux-cluster] rhcs + gfs performance issues

We recently migrated from a 7 year old file server running on a single
proc dec alpha running Tru64 and utilizing Truclustering for HA, to a
Redhat cluster suite and gfs for HA on a dual duo core dell 2950 with
32gb ram, and have been having major performance issues.  Both have
fiber attached storage.  The old file server grossly outperforms the new
one!  The way we are utilizing it is for nfs file serving only to
multiple clients.  It doesn't take many users doing much on the clients,
to easily drive the load on the boxes into the 10+ range, where on the
old file server it never got above 2 or 3 to perform the same tasks.
The load and performance was much worse, but improved to where we are
now after setting all of the volumes to statfs_fast 1.  I also set nfs
threads to 256, which helped some, but I don't know what more to do, and
we are at the point of abandoning this platform if we cannot get it to
perform reasonably.  Please help!

Sincerely,

Doug Tucker
Network and Systems
Southern Methodist University
