[Linux-cluster] GFS tuning for combined batch / interactive use

Kevin Maguire kmaguire at eso.org
Fri Dec 17 19:06:58 UTC 2010


Hi

> You can get a glock dump via debugfs, which may show up contention. Look
> for type 2 glocks which have lots of lock requests queued but not
> granted. The lock requests (holders) are tagged with the relevant
> process.

Note that I am currently using GFS, not GFS2. Before going further, I ran
the ping_pong test on my cluster and see only about 100 locks/second, even
on a single node. So maybe I should look at the plock_rate_limit parameter,
though I'm not sure that is our core problem.

Anyway, as I write this my test cluster is being heavily used by batch
jobs, so I have a window of opportunity to study it under load (but not to
change it). I have debugfs mounted. There are 10 nodes in this test
cluster. My filesystem is called mygfs and was created with

mkfs.gfs -O -t dfoxen-cluster:mygfs -p lock_dlm -j 10 -r 2048 /dev/mapper/vggfs-lvgfs

This is what I have in debugfs:

# find /sys/kernel/debug/ -type f -exec wc -l {} \;
2309 /sys/kernel/debug/dlm/mygfs_locks
0 /sys/kernel/debug/dlm/mygfs_waiters
16258 /sys/kernel/debug/dlm/mygfs
2 /sys/kernel/debug/dlm/clvmd_locks
0 /sys/kernel/debug/dlm/clvmd_waiters
7 /sys/kernel/debug/dlm/clvmd

The lock dump file has content like:

# cat /sys/kernel/debug/dlm/mygfs_locks
id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
14f19eb 0 0 1038 0 0 0 2 3 -1 0 0 24 "       5         cec3e6d"
3da1a67 0 0 31861 0 0 0 2 3 -1 0 0 24 "       5         a0fafc2"
1120003 1 16f0019 3552 0 408 0 2 0 -1 0 1 24 "       3        2d8b9091"
af0002 1 10024 3552 0 408 0 2 0 -1 0 1 24 "       3        2053fbf8"
...

But I don't really see how to work out which type of lock is which from
this file, sorry. Given that $2 is the nodeid, I can work out who has
locks, and that leads to a minor strangeness:

node1 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
    2142 0
    1619 2
    2001 3
    1586 4
    1566 5
    1624 6
    1610 7
    1733 8
    1592 9
    1612 10

These numbers are much bigger than the counts on the 9 other nodes, e.g.

node2 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
     441 0
    1630 1
      75 3
       2 4
      10 5
      25 7
      15 8
      38 10

Is that normal?
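One possible reading of the lopsided node1 counts, offered as an assumption rather than a diagnosis: a node's mygfs_locks dump also lists the master copies of other nodes' locks on every resource that node masters, so a node that masters most lock names would show large counts under every nodeid. The sketch below separates the two cases. It assumes that column 12 (r_nodeid) is the resource's master node, with 0 meaning the local node, and that for a locally mastered resource column 2 (nodeid) is the node holding the lock; dlm_locks_by_master is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper, a sketch only. Assumed column meanings (not from the
# original post) for the DLM debugfs "<lockspace>_locks" file:
#   column 2  (nodeid)   - for a lock on a locally mastered resource, the
#                          node holding it (0 = a process on this node)
#   column 12 (r_nodeid) - the node mastering the resource (0 = this node)
dlm_locks_by_master() {
    awk '
    $1 == "id"          { next }                  # skip the column header line
    $12 == 0 && $2 == 0 { local_here++; next }    # our lock, our resource
    $12 == 0            { held_by[$2]++; next }   # master copy of a remote lock
                        { mastered_by[$12]++ }    # our lock, remote master
    END {
        printf "local locks, mastered here: %d\n", local_here + 0
        for (n in held_by)
            printf "held by node %s, mastered here: %d\n", n, held_by[n]
        for (n in mastered_by)
            printf "local locks, mastered by node %s: %d\n", n, mastered_by[n]
    }' "$@" | LC_ALL=C sort
}
```

Run it as, e.g., dlm_locks_by_master /sys/kernel/debug/dlm/mygfs_locks on node1 and on one of the quieter nodes. If node1's totals are dominated by "held by node N, mastered here" lines, the big numbers would reflect which node masters the resources rather than node1's own jobs holding more locks.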

Using gfs_tool lockdump, I see:

node1 # gfs_tool lockdump /newcache | grep '^Glock' | sed 's?(\([0-9]*\).*)?\1?g' | sort | uniq -c
       3 Glock 1
     308 Glock 2
    1538 Glock 3
       2 Glock 4
     233 Glock 5
       2 Glock 8

Only the type 2 and type 5 counts seem to change. Across the cluster, one
node has far more (about 10x) type 2 and type 5 glocks than the others.
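Because gfs_tool lockdump already splits glocks by type, the same split can be taken from the DLM dump, which additionally records whether each lock is granted, in line with the advice quoted at the top to look for requests queued but not granted. A minimal sketch, assuming r_name holds the GFS lock name printed as "%8x%16llx" (glock type, then lock number), that type numbers follow the kernel's LM_TYPE_* constants (2 = inode, 3 = resource group, 5 = iopen, and so on), and that column 8 (sts) is 2 for a granted lock; dlm_locks_by_type is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper, a sketch only. Assumptions (not from the original post):
#   - r_name (the quoted last field) is the GFS lock name printed as
#     "%8x%16llx": glock type, then lock number
#   - glock types follow LM_TYPE_*: 1 nondisk, 2 inode, 3 rgrp, 4 meta,
#     5 iopen, 6 flock, 7 plock, 8 quota, 9 journal
#   - column 8 (sts) is 2 for granted locks (1 waiting, 3 converting)
dlm_locks_by_type() {
    awk '
    $1 == "id" { next }                    # skip the column header line
    match($0, /"[^"]*"/) {                 # locate the quoted r_name
        name = substr($0, RSTART + 1, RLENGTH - 2)
        type = substr(name, 1, 8)          # first 8 chars: space-padded type
        gsub(/ /, "", type)
        seen[type] = 1
        if ($8 == 2) granted[type]++
        else         pending[type]++       # waiting or converting
    }
    END {
        split("nondisk inode rgrp meta iopen flock plock quota journal", tname, " ")
        for (t in seen) {
            label = (t in tname) ? tname[t] : "type-" t
            printf "%-8s granted=%d pending=%d\n", label, granted[t] + 0, pending[t] + 0
        }
    }' "$@" | LC_ALL=C sort
}
```

For example, dlm_locks_by_type /sys/kernel/debug/dlm/mygfs_locks. A type whose pending count stays high on the busy node across several runs would be the natural place to look for contention.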

# gfs_tool counters /newcache

                                   locks 2313
                              locks held 781
                            freeze count 0
                           incore inodes 230
                        metadata buffers 1061
                         unlinked inodes 28
                               quota IDs 2
                      incore log buffers 28
                          log space used 1.46%
               meta header cache entries 1304
                      glock dependencies 185
                  glocks on reclaim list 0
                               log wraps 91
                    outstanding LM calls 0
                   outstanding BIO calls 0
                        fh2dentry misses 0
                        glocks reclaimed 2125924
                          glock nq calls 801437507
                          glock dq calls 796261692
                    glock prefetch calls 319835
                           lm_lock calls 6396763
                         lm_unlock calls 1031709
                            lm callbacks 7669741
                      address operations 1267096416
                       dentry operations 35815146
                       export operations 0
                         file operations 233333825
                        inode operations 61818196
                        super operations 148712313
                           vm operations 87114
                         block I/O reads 0
                        block I/O writes 0

I'm not sure whether anyone can make anything of all these numbers...
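One way to condense the counters above is to compare lm_lock calls (requests handed to the lock module, here the DLM) with glock nq calls (glock requests made inside GFS). Treating their ratio as the fraction of glock requests that needed a DLM round trip is an interpretation, not something gfs_tool itself documents. A minimal sketch; gfs_counter_ratios is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper, a sketch only. Reads "gfs_tool counters" output, where
# each line is a label followed by a single value, and reports how many glock
# enqueue requests led to a lock-module (DLM) request. Reading that ratio as
# "fraction of glock requests needing a DLM round trip" is an assumption.
gfs_counter_ratios() {
    awk '
    NF >= 2 {
        val = $NF
        $NF = ""                  # drop the value; $0 is rebuilt with single spaces
        sub(/ +$/, "")            # trim the trailing separator left behind
        c[$0] = val + 0           # e.g. c["lm_lock calls"] = 6396763
    }
    END {
        printf "glock nq calls : %d\n", c["glock nq calls"]
        printf "lm_lock calls  : %d\n", c["lm_lock calls"]
        if (c["glock nq calls"] > 0)
            printf "lm_lock / nq   : %.2f%%\n", 100 * c["lm_lock calls"] / c["glock nq calls"]
    }' "$@"
}
```

Run it as gfs_tool counters /newcache | gfs_counter_ratios. For the counters above it reports 0.80% (6396763 / 801437507), so under this reading over 99% of glock requests were satisfied by glocks already cached on the node.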

Thanks,
Kevin



