[Linux-cluster] rhcs + gfs performance issues

Gordan Bobic gordan at bobich.net
Fri Oct 3 17:29:33 UTC 2008


Doug Tucker wrote:

>> Do you have a separate gigabit interface/vlan just for cluster
>> communication? RHCS doesn't use a lot of sustained bandwidth but
>> performance is sensitive to latencies for DLM comms. If you only have
>> 2 nodes, a direct crossover connection would be ideal.
> 
> Not sure how to accomplish that.  How do you get certain services of the
> cluster environment to talk over 1 interface, and other services (such
> as the shares) over another?  The only other interface I have configured
> is for the fence device (dell drac cards).

In your cluster.conf, make sure the

<clusternode name="node1c"....

section points at the node's private crossover IP. Say you have a 2nd 
dedicated Gb interface for clustering: assign it an address, say 
10.0.0.1, and in the hosts file have something like

10.0.0.1 node1c
10.0.0.2 node2c

That way each node in the cluster is referred to by its cluster 
interface name, and thus the cluster communication will go over that 
dedicated interface.
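
For reference, the relevant cluster.conf section would look roughly 
like this (node names, node IDs and the fence device names below are 
just placeholders for illustration, not your actual config):

<clusternodes>
  <clusternode name="node1c" nodeid="1" votes="1">
    <fence>
      <method name="1">
        <device name="node1-drac"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node2c" nodeid="2" votes="1">
    <fence>
      <method name="1">
        <device name="node2-drac"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>

As long as node1c/node2c resolve to the crossover addresses, cman and 
the DLM traffic will go over that link.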

The fail-over resources (typically client-side IPs) remain as they are 
on the client-side subnet.

>> How big is your data store? Are files large or small? Are they in
>> few directories with lots of files (e.g. Maildir)?
> 
> Very much mixed.  We have SAS and SATA  in the same SAN device, and
> carved out based on application performance need.  Some large volumes
> (7TB), some small (2GB).  Some large files (video) down to the mix of
> millions of 1k user files.

GFS copes OK with large files split across many separate directories. 
But if you are expecting to get fast random writes on files in the same 
directory, prepare to be disappointed. A write to a directory requires a 
directory lock, so concurrent writes to the same directory are going to 
have major performance issues. There isn't really any way to work around 
that, on any clustered FS.

As long as there is no directory write contention, it should be OK, though.

>> Load averages will go up - that's normal, since there is added latency
>> (round trip time) from locking across nodes. Unless your CPUs are at 0% idle,
>> the servers aren't running out of steam. So don't worry about it.
> 
> Understood.  That was just the measure I used as comparison.  There is
> definite performance lag during these higher load averages.  What I was
> trying (and doing poorly) to communicate was that all we are doing here
> is serving files over nfs..we're not running apps on the cluster
> itself...difficult for me to understand why file serving would be so
> slow or ever drive load up on a box that high.

It sounds like you are seeing write contention. Make sure you mount 
everything with noatime,nodiratime,noquota, both from the GFS and from 
the NFS clients' side. Otherwise every read will also require a write, 
and that'll kill any hope of getting decent performance out of the system.
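
For example, on the GFS nodes the fstab entries would be something 
along these lines (the device path and mount point are just examples):

/dev/cluster_vg/home_lv  /export/home  gfs  noatime,nodiratime,noquota  0 0

and on the NFS clients something like:

fileserver:/export/home  /home  nfs  noatime,nodiratime,hard,intr  0 0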

> And, the old file
> server, did not have these performance issues doing the same tasks with
> less hardware, bandwith, etc.

I'm guessing the old server was standalone, rather than clustered?

>> Also note that a clustered FS will _ALWAYS_ be slower than a
>> non-clustered one, all things being equal. No exceptions. Also, if you
>> are load sharing across the nodes, and you have Maildir-like file
>> structures, it'll go slower than a purely fail-over setup, even on a
>> clustered FS (like GFS), since a fail-over setup avoids bouncing locks
>> between head nodes. For extra performance, you can use a non-clustered
>> FS as a fail-over resource, but be very careful with that since dual
>> mounting a non-clustered FS will destroy the volume virtually instantly.
> 
> Agreed.  That's not the comparison though.  Our old file server was
> running a clustered file system from Tru64 (AdvFS).  Our expectation was
> that a newer technology, plus a major upgrade in hardware, would result
> in better performance at least than what we had, it has not, it is far
> worse.

I see, so you had two servers in a load-sharing write-write 
configuration before, too?

>> Provided that your data isn't fundamentally unsuitable for being
>> handled by a clustered load sharing setup, you could try
>> increasing lock trimming and increasing the number of resource
>> groups. Search through the archives for details on that.
> 
> Can you point me in the direction of the archives?  I can't seem to find
> them?

Try here:
http://www.mail-archive.com/search?l=linux-cluster%40redhat.com

Look for gfs lock trimming and resource group related tuning.
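
On GFS1 the tunables in question are set per mount with gfs_tool, 
something along these lines (the mount point and values here are purely 
illustrative, and the glock trimming tunables only exist on reasonably 
recent GFS updates):

gfs_tool settune /export/home glock_purge 50
gfs_tool settune /export/home demote_secs 200

These don't persist across remounts, so they'd need to go in an init 
script. The resource group size, on the other hand, is fixed when the 
filesystem is created (gfs_mkfs -r sets it in MB), so the number of 
resource groups can only be changed by rebuilding the filesystem.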

>> More suggestions when you provide more details on what your data is like.
> 
> My apologies for the lack of detail, I'm a bit lost as to what to
> provide.  It's basic files, large and small.  User volumes, webserver
> volumes, postfix mail volumes, etc.

The important thing is to:
1) reduce the number of concurrent writes to the same directory to the 
maximum extent possible.
2) reduce the number of unnecessary writes (noatime,nodiratime)

All writes require locks to be bounced between the nodes, and this can 
add a significant overhead.

If you set the nodes up in a fail-over configuration and serve all the 
traffic from the primary node, you may see the performance improve 
because locks are no longer being bounced around all the time: they'll 
get set on the master node and stay there until the master node fails 
and its floating IP gets migrated to the other node.
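
In cluster.conf terms that's an ordered failover domain with the 
client-facing IP in the service, roughly along these lines (the names 
and the address are made up for illustration):

<rm>
  <failoverdomains>
    <failoverdomain name="prefer-node1" ordered="1" restricted="1">
      <failoverdomainnode name="node1c" priority="1"/>
      <failoverdomainnode name="node2c" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <service name="nfs-exports" domain="prefer-node1" autostart="1">
    <ip address="192.168.1.50" monitor_link="1"/>
  </service>
</rm>

With the priorities set like that, rgmanager keeps the service (and 
hence all the NFS traffic and the locks) on node1c whenever it is up.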

Gordan



