[Linux-cluster] GFS 6.0u5 not freeing locks?

Tue Dec 13 21:38:46 UTC 2005

Kovacs, Corey J. wrote:

>It's been a while since I've worked on the following problem but here I am
>at it again.
>
>I have a three-node system running RHEL3 update 5 (kernel 2.4.21-32) with
>GFS-6.0.2.20-1. All three nodes are running as both lock managers and
>filesystem clients. When sending thousands of files to the cluster
>(on the order of 1/2 terabyte of 50k files) target nodes will run
>out of memory and refuse to fork. Interestingly enough, this condition
>does not cause the cluster to fence the node; rather, it thinks everything
>is "OK". The effect of course is that the fs is not accessible because the
>cluster is waiting to hear back from the node in question.
>
>I set the high water mark to 10000 (I know that's low, but I wanted to see
>the effect) and the system seemed to be trying to free locks every ten
>seconds as it should, but it simply could not keep up with the incoming
>file transfer.
>
>By the time a node finally locks up there are over 300K locks in use.
>There is only a small percentage difference between the locks reported and
>the inodes in the filesystem. If I interpret this correctly, it simply means
>that for almost all the files I was able to transfer, there is an existing
>lock being held. Also, memory usage for lock_gulmd was at 85M+. When we
>started logging things it was at 30M+, rising about 300-400k per minute.
>
From your description, this looks like an issue we may have addressed in
RHEL 3 Update 7, which is currently in QA.

Note that GFS caches file locks as an optimization. Each lock
corresponds one-to-one to a VFS inode. BTW, how much memory do your
nodes have? Linux normally doesn't release inodes unless it is under
memory pressure, so the GFS locks linger. In your case, by the time
memory pressure starts to show, the base kernel's inode release may
come too late. You have also combined the lock servers with the GFS
nodes, which definitely makes the situation worse. The current solution
piggybacks the inode purge code on one of our kernel daemons, which
wakes up at a tunable interval to purge a tunable percentage of inodes.
A set of test RPMs is available if you're willing to try them out. Let
us know.
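
To illustrate why a periodic percentage purge keeps the lock count
bounded, here is a rough toy model in Python. It is only a sketch and
not the actual GFS code: the names LockCache, purge_percent and
simulate are made up for illustration, and it assumes one cached lock
per cached inode, with the oldest entries purged first and every
cached inode eligible for purging.

```python
from dataclasses import dataclass, field


@dataclass
class LockCache:
    """Toy model: one cached lock per cached inode (oldest first)."""
    purge_percent: int = 50          # hypothetical tunable, 0-100
    inodes: list = field(default_factory=list)

    def touch(self, inode_id):
        # A newly written file caches an inode and therefore a lock.
        self.inodes.append(inode_id)

    def purge(self):
        # Drop purge_percent of the cached inodes, oldest first,
        # and return the ones whose locks were released.
        n = len(self.inodes) * self.purge_percent // 100
        released = self.inodes[:n]
        del self.inodes[:n]
        return released


def simulate(files_per_interval, intervals, purge_percent):
    """Return the cached lock count after each daemon wakeup."""
    cache = LockCache(purge_percent=purge_percent)
    next_id = 0
    history = []
    for _ in range(intervals):
        for _ in range(files_per_interval):
            cache.touch(next_id)
            next_id += 1
        cache.purge()                # the daemon wakes up once per interval
        history.append(len(cache.inodes))
    return history


if __name__ == "__main__":
    print("no purge: ", simulate(1000, 5, 0))
    print("50% purge:", simulate(1000, 5, 50))
```

With no purge the count grows by the full transfer rate every
interval, which is the behaviour described in the original report.
With a 50% purge it levels off below the number of files written per
interval, so the interval and percentage together set the ceiling.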

-- Wendy