[Linux-cluster] rhcs + gfs performance issues

Gordan Bobic gordan at bobich.net
Sat Oct 4 17:32:58 UTC 2008


Doug Tucker wrote:
>> In your cluster.conf, make sure the
>>
>> <clusternode name="node1c"....
>>
>> section is pointing at a private crossover IP of the node. Say you have 
>> a 2nd dedicated Gb interface for the clustering, assign it an address, 
>> say 10.0.0.1, and in the hosts file, have something like
>>
>> 10.0.0.1 node1c
>> 10.0.0.2 node2c
>>
>> That way each node in the cluster is referred to by its cluster 
>> interface name, and thus the cluster communication will go over that 
>> dedicated interface.
>>
> I'm not sure I understand this correctly, please bear with me, are you
> saying the communication runs over the fenced interface?

No, over a dedicated, separate interface.

> Or that the
> node name should reference a separate nic that is private, and the
> exported virtual ip to the clients is done over the public interface?

That's the one.

> I'm confused, I thought that definition had to be the same as the
> hostname of the box?

No. The floating IPs will get assigned to whichever interface has an IP 
on that subnet. The cluster/DLM comms interface is inferred from the 
node name.
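To make that concrete: the client-facing floating IP normally lives in 
the rgmanager (<rm>) section as a service resource, completely separate 
from the clusternode names. A minimal sketch, with made-up address, 
domain and service names rather than anything from your config:

    <rm>
            <failoverdomains>
                    <failoverdomain name="engrfs" ordered="1" restricted="1">
                            <failoverdomainnode name="node1c" priority="1"/>
                            <failoverdomainnode name="node2c" priority="2"/>
                    </failoverdomain>
            </failoverdomains>
            <service autostart="1" domain="engrfs" name="nfs-svc">
                    <ip address="192.168.1.100" monitor_link="1"/>
            </service>
    </rm>

The clusternode names decide which interface the heartbeat/DLM traffic 
resolves to; the <ip> resource only decides where the clients connect.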

> Here is what is in my conf file for reference:
> 
>  <clusternode name="engrfs1.seas.smu.edu" nodeid="1" votes="1">
>          <fence>
>                  <method name="1">
>                          <device modulename="" name="engrfs1drac"/>
>                  </method>
>          </fence>
>  </clusternode>
>  <clusternode name="engrfs2.seas.smu.edu" nodeid="2" votes="1">
>          <fence>
>                  <method name="1">
>                          <device modulename="" name="engrfs2drac"/>
>                  </method>
>          </fence>
>  </clusternode>
> 
> Where engrfs1 and 2 are the actual hostnames of the boxes.

Add another NIC in, give it a private IP/subnet, and put it in the hosts 
file on both nodes as something like engrfs1-cluster.seas.smu.edu, and 
put that in the clusternode name entry.
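For example (the 10.0.0.x addresses are only placeholders for whatever 
private subnet you choose), /etc/hosts on both nodes would gain:

    10.0.0.1 engrfs1-cluster.seas.smu.edu engrfs1-cluster
    10.0.0.2 engrfs2-cluster.seas.smu.edu engrfs2-cluster

and the clusternode entries would then become:

    <clusternode name="engrfs1-cluster.seas.smu.edu" nodeid="1" votes="1">
    <clusternode name="engrfs2-cluster.seas.smu.edu" nodeid="2" votes="1">

The fence devices and everything else stay exactly as they are; only the 
node names, and therefore the interface the cluster traffic resolves to, 
change.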

>> The fail-over resources (typically client-side IPs) remain as they are 
>> on the client-side subnet.
> 
>> It sounds like you are seeing write contention. Make sure you mount 
>> everything with noatime,nodiratime,noquota, both from the GFS and from 
>> the NFS clients' side. Otherwise every read will also require a write, 
>> and that'll kill any hope of getting decent performance out of the system.
 >
> Already mounted noatime, will add nodiratime.  Can't do noquota, we
> implement quotas for every user here (5000 or so), and did so on the old
> file server.
> 
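Fair enough on the quotas. Just to spell out what I mean on the mount 
options (the device, volume and export paths below are made up, adjust 
to your setup), the GFS fstab entry would look something like:

    /dev/engrfs_vg/home_lv  /export/home  gfs  defaults,noatime,nodiratime  0 0

and the NFS clients would mount via the VIP with something like:

    engrfs.seas.smu.edu:/export/home  /home  nfs  noatime,nodiratime,hard,intr  0 0

With quotas in use you obviously keep them, but noatime/nodiratime alone 
removes the write-for-every-read penalty, which is usually the biggest win.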
>> I'm guessing the old server was standalone, rather than clustered?
 >
> No, clustered, as I assume you realized below, just making sure it's
> clear.

OK, noted.

>> I see, so you had two servers in a load-sharing write-write 
>> configuration before, too?
 >
> Certainly were capable of such.  However here, as we did there, we set
> it up in more of a failover mode.  We export a virtual ip attached to
> the nfs export, and all clients mount the vip, so whichever machine has
> the vip at a given time is "master" and gets all the traffic.  The only
> exception to this is the backups that run at night, we do on the
> "secondary" machine directly, rather than using the vip.  And the
> secondary is only there in the event of a failure to node1, when node1
> comes back online, it is set up to fail back to node1.

OK, that should be fine, although you may find there's less of a 
performance hit if you do the backup from the master node, too, as 
that'll already have the locks on all the files.
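If the backup is script-driven, the simplest change is to point it at 
the VIP (i.e. whichever node is master at the time) rather than 
hard-coding the secondary. Purely as an illustration of the idea, not 
your actual backup job, something like:

    # pull the backup via the VIP, so the node already holding the glocks serves the reads
    rsync -aH --delete engrfs.seas.smu.edu:/export/home/ /backup/home/

That way the lock traffic stays on the master instead of the secondary 
having to drag every lock across just to read the files once a night.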

>> If you set the nodes up in a fail-over configuration, and serve all the 
>> traffic from the primary node, you may see the performance improve due 
>> to locks not being bounced around all the time, they'll get set on the 
>> master node and stay there until the master node fails and its floating 
>> IP gets migrated to the other node.
 >
> As explained above, exactly how it is set up.  Old file server the same
> way.  We're basically scratching our heads in disbelief here.  No
> ifs/ands/buts about it: hardware-wise, we have 500% more box than we
> used to have.  Configuration architecture is
> virtually identical.  Which leaves us with the software, which leaves us
> with only 2 conclusions we can come up with:
> 
> 1)  Tru64 and TruCluster with Advfs from 7 years ago is simply that much
> more robust and mature than RHEL4 and CS/GFS and therefore tremendously
> outperforms it...or

RHEL4 is quite old. It's been a while since I used it for clustering. 
RHEL5 has yielded considerably better performance in my experience.

> 2)  We have this badly configured.

There isn't all that much to tune on RHEL4 cluster-wise; most of the 
tweakability was added after I last used it. I'd say RHEL5 is certainly 
worth trying; the problem you are having may just go away.

Gordan



