[Linux-cluster] gfs deadlock situation

Mark Hlawatschek hlawatschek at atix.de
Tue Feb 13 12:37:18 UTC 2007


Hi,

we have the following deadlock situation:

A 2-node cluster consisting of node1 and node2.
/usr/local is placed on a GFS filesystem mounted on both nodes.
The lock manager is DLM.
We are using RHEL4u4.

An strace of ls -l /usr/local/swadmin/mnx/xml ends up hanging in
lstat("/usr/local/swadmin/mnx/xml",

This happens on both cluster nodes.
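A minimal sketch of such a check (the path is the one above; the 5 second
timeout is an arbitrary choice). Since a blocked lstat() puts the caller into
D state, where signals are not delivered, the sketch forks a child and lets
the parent time it out:

#!/usr/bin/env python
# Minimal sketch: confirm that lstat() on the directory hangs.
# A blocked lstat() leaves the child in D state (unkillable),
# so the parent just polls for it with a timeout.
import os
import time

PATH = "/usr/local/swadmin/mnx/xml"   # directory from the report

pid = os.fork()
if pid == 0:
    os.lstat(PATH)                    # blocks in D state if the glock is stuck
    os._exit(0)

time.sleep(5)                         # arbitrary 5 second grace period
done, _status = os.waitpid(pid, os.WNOHANG)
if done == 0:
    print("lstat(%r) still blocked after 5s -- looks like the stuck glock" % PATH)
else:
    print("lstat(%r) returned normally" % PATH)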

All processes trying to access the directory /usr/local/swadmin/mnx/xml end
up in uninterruptible sleep ("D" state, waiting for I/O), which is why the
system load is at about 400 ;-)

Any ideas?

A lockdump analysis with decipher_lockstate_dump and parse_lockdump shows
the following output (the complete file is too large for the mailing list;
a small parsing sketch follows the excerpt):

Entries:  101939
Glocks:  60112
PIDs:  751

4 chain:
lockdump.node1.dec Glock (inode[2], 1114343)
  gl_flags = lock[1]
  gl_count = 5
  gl_state = shared[3]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 1
  ail_bufs = no
  Request
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
lockdump.node2.dec Glock (inode[2], 1114343)
  gl_flags =
  gl_count = 2
  gl_state = unlocked[0]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Inode:
    num = 1114343/1114343
    type = regular[1]
    i_count = 1
    i_flags =
    vnode = yes
lockdump.node1.dec Glock (inode[2], 627732)
  gl_flags = dirty[5]
  gl_count = 379
  gl_state = exclusive[1]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 58
  ail_bufs = no
  Holder
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1] holder[6] first[7]
  Waiter2
    owner = none[-1]
    gh_state = shared[3]
    gh_flags = try[0]
    error = 0
    gh_iflags = demote[2] alloced[4] dealloc[5]
  Waiter3
    owner = 32753
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  [...loads of Waiter3 entries...]
  Waiter3
    owner = 4566
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
lockdump.node2.dec Glock (inode[2], 627732)
  gl_flags = lock[1]
  gl_count = 375
  gl_state = unlocked[0]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Request
    owner = 20187
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 20187
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  [...loads of Waiter3 entries...]
  Waiter3
    owner = 10460
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
2 requests
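For anyone who wants to scan their own decoded dumps: a minimal parsing
sketch, assuming the per-glock format shown above (a "Glock (...)" header
line followed by indented Request/Holder/Waiter entries; this is our own
aid, not one of the shipped tools). It reports every glock that has both a
pending Request and queued Waiters, which is the pattern on inode 627732:

#!/usr/bin/env python
# Minimal sketch: find glocks with a pending Request plus Waiters
# in a decoded lockstate dump, e.g. lockdump.node1.dec.
import re
import sys

stats = {}                            # "type number" -> [requests, waiters]
glock = None

for line in open(sys.argv[1]):
    m = re.search(r"Glock \((\w+)\[\d+\], (\d+)\)", line)
    if m:
        glock = "%s %s" % (m.group(1), m.group(2))
        stats.setdefault(glock, [0, 0])
    elif glock and line.strip().startswith("Request"):
        stats[glock][0] += 1
    elif glock and line.strip().startswith("Waiter"):
        stats[glock][1] += 1

for g in sorted(stats):
    reqs, waiters = stats[g]
    if reqs and waiters:
        print("%s: %d request(s), %d waiter(s)" % (g, reqs, waiters))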

-- 
Gruss / Regards,

Mark Hlawatschek
http://www.atix.de/               http://www.open-sharedroot.org/

** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2  (15.-21. of March) **

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany



