[Linux-cluster] gfs_scand

Sun Jul 15 14:44:07 UTC 2007

Marc Grimme wrote:

>Hello,
>does anybody know what exactly the task of gfs_scand is? We often see it using 
>a lot of CPU time (e.g. the system has been up for 40h and gfs_scand has 4h of 
>CPU time).
>And can you track down which scand is responsible for which filesystem?
>BTW: I'm talking about RHEL4U4.

This is a complicated subject. So please bear with me and see whether 
the following description helps:

gfs_scand scans the hash table of GFS locks (glocks) to find:
1. glocks that can be downgraded to a less restrictive state (say, from 
shared to unlocked). Dirty data flushing is embedded in the glock 
state transition code.
2. glocks that have been idle in the unlocked state for too long; these 
are reclaimed.

Whenever GFS needs a lock, it creates a glock and then asks the lock 
manager for a corresponding lock. In the DLM case, there is a one-to-one 
correspondence between glocks and DLM locks.

Now if gfs_scand has used too much CPU time, it may mean the system has 
accumulated too many locks as described in:
http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
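
To put a number on "too much CPU time", one could compare each gfs_scand 
thread's accumulated CPU time with its elapsed time. The following is only 
a rough sketch: it assumes a procps-style ps (the -C and -o options) and 
that the kernel thread is listed under the name gfs_scand. The helper 
names to_secs and scand_cpu_report are made up for illustration.

```shell
#!/bin/sh
# Convert a ps time field ([[DD-]HH:]MM:SS) to seconds.
to_secs() {
    echo "$1" | awk -F'[-:]' '{
        # Multipliers for fields read right to left:
        # seconds, minutes, hours, days.
        mult[0] = 1; mult[1] = 60; mult[2] = 3600; mult[3] = 86400
        s = 0
        for (i = NF; i >= 1; i--) s += $i * mult[NF - i]
        print s
    }'
}

# Print the CPU share of every gfs_scand thread since it was started.
# Assumption: "ps -C gfs_scand" matches the GFS scan daemon threads.
scand_cpu_report() {
    ps -C gfs_scand -o pid=,etime=,time= |
    while read -r pid etime cputime; do
        e=$(to_secs "$etime")
        c=$(to_secs "$cputime")
        if [ "$e" -gt 0 ]; then
            echo "pid $pid: $((100 * c / e))% CPU over ${e}s"
        fi
    done
}
```

Calling scand_cpu_report on a GFS node would then list one line per 
thread. With the figures from the question (4h of CPU time in 40h of 
uptime) the share works out to 10%.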

Unfortunately the lock trimming patch added in RHEL 4.5 is too "mild" 
(i.e. not aggressive enough; see Red Hat bugzilla 245776). We'll try to 
correct the issue as soon as the next errata is available. In short, if the 
daemon hogs CPU time every time it wakes up, without any sign of slowing 
down, you can try to make it run less often with:
shell> gfs_tool settune <mount_point> scand_secs <x> 
          // the default x is 5 seconds
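
Before changing the value it may help to check what is currently set. 
The sketch below assumes that "gfs_tool gettune <mount_point>" prints one 
"name = value" line per tunable; the helper names current_scand_secs and 
set_scand_secs, the mount point and the value 30 are all placeholders.

```shell
#!/bin/sh
# Read "gfs_tool gettune" output on stdin and print the scand_secs value.
# Assumption: tunables are printed as "name = value", one per line.
current_scand_secs() {
    awk '$1 == "scand_secs" && $2 == "=" { print $3 }'
}

# Set scand_secs on a mount point after a basic sanity check of the value.
set_scand_secs() {
    mnt=$1
    secs=$2
    case $secs in
        ''|*[!0-9]*)
            echo "scand_secs must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    gfs_tool settune "$mnt" scand_secs "$secs"
}

# Usage sketch (run by hand on a GFS node):
#   gfs_tool gettune /mnt/gfs | current_scand_secs
#   set_scand_secs /mnt/gfs 30
```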

The side effect of a longer scand_secs is that if you have a large amount 
of file write and/or delete activity, dirty data will stay in the buffer 
cache longer and the lock count will go up considerably.
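
One way to watch for that side effect is to sample the lock count before 
and after changing scand_secs. This is a sketch only: it assumes that 
"gfs_tool counters <mount_point>" prints a line consisting of the word 
"locks" followed by the count (next to other lines such as "locks held"). 
The helper name lock_count and the mount point are placeholders.

```shell
#!/bin/sh
# Read "gfs_tool counters" output on stdin and print the total lock count.
# Assumption: the total appears on a line with exactly two fields,
# "locks <number>"; lines such as "locks held <number>" are skipped.
lock_count() {
    awk 'NF == 2 && $1 == "locks" { print $2 }'
}

# Usage sketch (run by hand on a GFS node), one sample per minute:
#   while :; do
#       echo "$(date +%T) $(gfs_tool counters /mnt/gfs | lock_count)"
#       sleep 60
#   done
```

If the count keeps climbing after scand_secs has been raised, a smaller 
value is probably the safer choice.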

-- Wendy