[Linux-cluster] GFS2 - monitoring the rate of Posix lock operations

Fri Mar 26 13:25:47 UTC 2010

Hi,

On Fri, 2010-03-26 at 02:31 +0000, Jankowski, Chris wrote:
> Hi,
> 
> I understand that GFS2 by default limits the rate of POSIX lock operations to 100 per second.
> This limit can be removed by the following entry in /etc/cluster/cluster.conf:
> 
> <gfs_controld plock_rate_limit="0" plock_ownership="1"/>
> 
> 
> Question 1:
> ------------
> How can I monitor the rate of POSIX lock operations?
> 
> The reason I am asking this question is that I am trying to maximise application throughput in a cluster. This is a database-type application running on one node, with the other node being an idle standby. Under a large generated workload, the system still has unused CPU, memory, I/O rate and bandwidth, and network bandwidth, yet it will not go any faster. I suspect that GFS POSIX lock processing is the bottleneck, but at the moment I have no data to prove it, and no information on how to tune it to remove this bottleneck.
> 
There is a ping_pong lock test program which can be used to get an idea
of the locking rate, but so far as I know there is no way to monitor
this in a running system.
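
As a rough illustration of what such a test measures, here is a minimal
sketch in Python. It is not the ping_pong program itself; it only times
how many uncontended POSIX lock/unlock cycles a single process can
complete on one file. The function name and the default duration are
made up for this example.

```python
import fcntl
import os
import time


def measure_plock_rate(path, duration=1.0):
    """Return POSIX lock/unlock cycles per second achieved on `path`.

    Hypothetical helper for illustration only: it exercises fcntl-style
    (POSIX) byte-range locks from a single process, so it shows the
    uncontended locking rate, not cross-node behaviour.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    cycles = 0
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            # Take an exclusive lock on the first byte, then release it.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            cycles += 1
    finally:
        os.close(fd)
    return cycles / duration
```

Calling measure_plock_rate() on a scratch file inside the GFS2 mount
and again on a local filesystem gives two numbers to compare. If the
GFS2 figure sits close to the rate limit mentioned above, the limit is
probably still in effect.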

> 
> Question 2:
> -----------
> What may be limiting the throughput of GFS2 with plock_rate_limit set to 0 and in the absence of any global physical shortage of resources?  Could this be the gfs_controld process saturating one CPU core?  I indeed see gfs_lockd using 90%+ of one CPU core in top(1).
> 
Are you sure that the workload isn't causing too many cache
invalidations due to sharing files/directories between nodes? This is
the most usual cause of poor performance.

> 
> Question 3:
> -----------
> What else can I tune to get higher maximum throughput from GFS2 used in this asymmetrical configuration?  Potentially, I need much more throughput, as my real production cluster is to support 1,000+ transactional users.
> 
Have you used the noatime mount option? If you can use it, it's highly
recommended. Also turn off SELinux if that is running on the GFS2
filesystem.
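
A quick way to check whether noatime is actually in effect is to look
at the mount table. The sketch below assumes the usual /proc/mounts
layout (device, mount point, filesystem type, options, ...); the
function name is made up for this example.

```python
def gfs2_mounts_without_noatime(mounts_text):
    """Return mount points of gfs2 filesystems mounted without noatime.

    `mounts_text` is the text of a mount table in /proc/mounts format.
    Hypothetical helper for illustration only.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if fs_type == "gfs2" and "noatime" not in options:
            missing.append(mount_point)
    return missing
```

Feeding it the contents of /proc/mounts on the active node lists any
GFS2 mount that is still updating access times.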

> 
> Question 4:
> -----------
> Is there a leaner, more efficient way of using GFS2 for such asymmetrical operation, when all accesses are from only one node and the other acts as a hot standby, with no fsck needed on failover of the service to the other node?  I am in a position to move the mounting of the GFS2 filesystem into the service script if that would be of help.
> 
Potentially there might be. I don't know enough about the application to
say, but it depends on how the workload can be arranged.

Steve.

> Your comments and ideas will be much appreciated.
> 
> Regards,
> 
> Chris