[Linux-cluster] GFS 2 node hang in rm test

Ken Preslan kpreslan at redhat.com
Mon Dec 6 19:45:21 UTC 2004


On Fri, Dec 03, 2004 at 03:08:00PM -0800, Daniel McNeil wrote:
> Is there any documentation on what these fields are?

Ben Cahill is doing a good job adding comments, so you might want
to look at cluster/gfs-kernel/src/gfs/incore.h and
cluster/gfs-kernel/src/gfs/glock.h.

> What is the difference between Waiter2 and Waiter3?

The waiter2 list is for requests to drop the glock.  The waiter3 list
is for processes waiting to acquire the glock.
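
Conceptually it looks something like this (a simplified sketch, not the
real GFS definitions; see incore.h for the actual structure):

#include <linux/list.h>

/* Simplified sketch only; the real glock structure has many more
 * fields and lives in incore.h. */
struct glock_sketch {
	struct list_head gl_holders;	/* holders currently granted the glock */
	struct list_head gl_waiters2;	/* demote requests queued by callbacks
					   from other nodes */
	struct list_head gl_waiters3;	/* local processes waiting to acquire
					   the glock */
};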

> Shouldn't the cl030 know (bast) that the updatedb on cl031
> is trying to go shared->exclusive?

It does; that's why there is an entry on the waiter2 list.
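
Roughly, when the lock module delivers the blocking callback, GFS queues
a demote request on that list.  A minimal sketch with made-up names (the
real callback handling is in glock.c):

#include <linux/list.h>
#include <linux/slab.h>

/* Made-up names, for illustration only. */
struct demote_sketch {
	struct list_head gh_list;
	unsigned int gh_state;	/* state the remote node needs us to drop to */
};

static void blocking_callback_sketch(struct list_head *gl_waiters2,
				     unsigned int wanted_state)
{
	struct demote_sketch *gh = kmalloc(sizeof(*gh), GFP_NOFS);

	if (!gh)
		return;
	gh->gh_state = wanted_state;

	/* Queue the demote; it can only be serviced once the local
	 * holders have dropped the glock. */
	list_add_tail(&gh->gh_list, gl_waiters2);
}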

> What does the gfs_tool/parse_lockdump script do?

parse_lockdump takes the output of decipher_lockstate_dump from multiple
machines and looks for groups of locks and processes that might be
deadlocking together.  The interesting chain it found for your situation is:

2 chain:
30.ld Glock (inode[2], 39860) 
  gl_flags =  
  gl_count = 6 
  gl_state = shared[3] 
  lvb_count = 0 
  object = yes 
  aspace = 2 
  reclaim = no 
  Holder 
    owner = 28723 
    gh_state = shared[3] 
    gh_flags = atime[9]  
    error = 0 
    gh_iflags = promote[1] holder[6] first[7]  
  Waiter2 
    owner = none[-1] 
    gh_state = unlocked[0] 
    gh_flags = try[0]  
    error = 0 
    gh_iflags = demote[2] alloced[4] dealloc[5]  
  Waiter3 
    owner = 29693 
    gh_state = shared[3] 
    gh_flags = any[3]  
    error = 0 
    gh_iflags = promote[1]  
  Inode: busy 
31.ld Glock (inode[2], 39860) 
  gl_flags = lock[1]  
  gl_count = 5 
  gl_state = shared[3] 
  lvb_count = 0 
  object = yes 
  aspace = 1 
  reclaim = no 
  Request 
    owner = 29537 
    gh_state = exclusive[1] 
    gh_flags = local_excl[5] atime[9]  
    error = 0 
    gh_iflags = promote[1]  
  Waiter3 
    owner = 29537 
    gh_state = exclusive[1] 
    gh_flags = local_excl[5] atime[9]  
    error = 0 
    gh_iflags = promote[1]  
  Inode: busy 
1 requests

It shows that machine 31 is asking for the lock in the exclusive state
to perform an atime update.  Machine 30 has received the callback asking
for the lock to be released, but it can't release it because process
28723 is actively using the lock.
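
In other words, the queued demote on machine 30 cannot run until the
holders list empties.  Conceptually (again with made-up names):

#include <linux/list.h>

/* Conceptual sketch, not the real code: a demote queued on waiter2 is
 * only serviced once no granted holder remains on the glock. */
static int demote_can_run_sketch(const struct list_head *gl_holders)
{
	/* Process 28723 is still on the holders list (its gh_iflags has
	 * "holder" set in the dump above), so this stays false and
	 * machine 31 keeps waiting. */
	return list_empty(gl_holders);
}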

Looking at the backtrace for process 28723, we see:

rm            D 00000008     0 28723  28711                     (NOTLB)
caa0ab60 00000082 caa0ab50 00000008 00000001 00000000 00000008 00000000 
       c1addb80 00000246 00002f57 caa0ab6c c0105d5e ddb18f90 00000000 caa0ab48 
       c170df60 00000000 00003c53 44ddcfe7 00002f57 ea82cd70 ea82ced0 f8b80260 
Call Trace:
 [<c03d409c>] rwsem_down_write_failed+0x9c/0x18e
 [<f8b7a259>] .text.lock.locking+0x72/0x1c9 [dlm]
 [<f8b78879>] dlm_lock+0x319/0x3b0 [dlm]
 [<f8af3eda>] do_dlm_lock+0x11a/0x200 [lock_dlm]
 [<f8af407c>] lm_dlm_lock+0x6c/0x80 [lock_dlm]
 [<f8b3b7b0>] gfs_glock_xmote_th+0xc0/0x200 [gfs]
 [<f8b3c749>] gfs_glock_prefetch+0xf9/0x1e0 [gfs]
 [<f8b3cfba>] gfs_glock_prefetch_num+0x6a/0xc0 [gfs]
 [<f8b5268b>] filldir_reg_func+0xbb/0x180 [gfs]
 [<f8b323a4>] do_filldir_main+0x1b4/0x240 [gfs]
 [<f8b324f8>] do_filldir_single+0xc8/0x110 [gfs]
 [<f8b3359f>] dir_e_read+0x31f/0x360 [gfs]
 [<f8b33f5c>] gfs_dir_read+0x3c/0x90 [gfs]
 [<f8b527f5>] readdir_reg+0xa5/0xc0 [gfs]
 [<f8b52b7b>] gfs_readdir+0x6b/0x90 [gfs]
 [<c01723ab>] vfs_readdir+0x9b/0xc0
 [<c0172840>] sys_getdents64+0x70/0xaf
 [<c010537d>] sysenter_past_esp+0x52/0x71

So, inode 39860 is a directory.  That process acquired the lock on the
directory to perform a readdir.  While doing the readdir, it sends out
prefetch requests for the locks of the entries in the directory.  Those
requests are asynchronous tries, so they should never block for long.
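
The intent of a prefetch is roughly the following (illustrative names
and flags, not the exact GFS or lock_dlm constants):

#include <linux/types.h>

#define SKETCH_TRY	0x01	/* fail right away if the lock isn't free */
#define SKETCH_ASYNC	0x02	/* completion arrives via callback */

/* Stand-in for the call into the lock module (lock_dlm -> dlm_lock);
 * the name is made up. */
static void submit_lock_request_sketch(u32 lock_number, unsigned int state,
				       u32 flags)
{
	/* ... */
}

static void prefetch_sketch(u32 lock_number, unsigned int state)
{
	/* A TRY|ASYNC request should never leave readdir stuck inside
	 * dlm_lock() the way the backtrace above shows. */
	submit_lock_request_sketch(lock_number, state,
				   SKETCH_TRY | SKETCH_ASYNC);
}
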
The prefetch is for this glock:

1 chain:
30.ld Glock (inode[2], 39883) 
  gl_flags = lock[1] prefetch[3] 
  gl_count = 2 
  gl_state = unlocked[0] 
  lvb_count = 0 
  object = no 
  aspace = 0 
  reclaim = no 
0 requests

So, the question is:  Why is the DLM stuck on that request?

-- 
Ken Preslan <kpreslan at redhat.com>