[Linux-cluster] gfs deadlock situation

Mark Hlawatschek hlawatschek at atix.de
Wed Feb 14 08:17:30 UTC 2007


On Tuesday 13 February 2007 20:56, Wendy Cheng wrote:
> Wendy Cheng wrote:
> >
> > So it is removing a file. It has obtained the directory lock and is
> > waiting for the file lock. It looks to me like the DLM (LM_CB_ASYNC)
> > callback never occurs. Do you have any abnormal messages in your
> > /var/log/messages file? Dave, how do we dump the locks from the DLM side
> > to see what DLM is thinking?
>
> shell> cman_tool services    /* find your lock space name */
> shell> echo "lock-space-name-found-above" > /proc/cluster/dlm_locks
> shell> cat /proc/cluster/dlm_locks
>
> Then try to find the lock (2, hex of (1114343)) - cut and paste the
> contents of that file here.

The syslog seems to be OK.
Note that process 5856 is running on node1.
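
For reference, the resource names in the dump are just the lock type plus the
inode number in hex (1114343 = 0x1100e7), so after selecting the lock space as
described above, the matching entries can be pulled out with something like:

shell> printf '%x\n' 1114343                      /* -> 1100e7 */
shell> grep -A 8 "1100e7" /proc/cluster/dlm_locks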

Here's the dlm output:

node1:
Resource 0000010001218088 (parent 0000000000000000). Name (len=24) "       2          1100e7"
Local Copy, Master is node 2
Granted Queue
Conversion Queue
Waiting Queue
5eb00178 PR (EX) Master:     3eeb0117  LQ: 0,0x5
[...]
Resource 00000100f56f0618 (parent 0000000000000000). Name (len=24) "       5          1100e7"
Local Copy, Master is node 2
Granted Queue
5bc20257 PR Master:     3d9703e0
Conversion Queue
Waiting Queue

node2:
Resource 00000107e462c8c8 (parent 0000000000000000). Name (len=24) "       2          1100e7"
Master Copy
Granted Queue
3eeb0117 PR Remote:   1 5eb00178
Conversion Queue
Waiting Queue
[...]
Resource 000001079f7e81d8 (parent 0000000000000000). Name (len=24) "       5          1100e7"
Master Copy
Granted Queue
3d9703e0 PR Remote:   1 5bc20257
3e500091 PR
Conversion Queue
Waiting Queue

Thanks for your help,

Mark

> >>>> We have the following deadlock situation:
> >>>>
> >>>> A 2-node cluster consisting of node1 and node2.
> >>>> /usr/local is placed on a GFS filesystem mounted on both nodes.
> >>>> The lock manager is DLM.
> >>>> We are using RHEL4u4.
> >>>>
> >>>> An strace of "ls -l /usr/local/swadmin/mnx/xml" ends up hanging in
> >>>> lstat("/usr/local/swadmin/mnx/xml",
> >>>>
> >>>> This happens on both cluster nodes.
> >>>>
> >>>> All processes trying to access the directory
> >>>> /usr/local/swadmin/mnx/xml
> >>>> are stuck in the "waiting for I/O" (D) state, i.e. the system load is
> >>>> at about 400 ;-)
> >>>>
> >>>> Any ideas ?
> >>>
> >>> Quickly browsing this, it looks to me like the process with pid=5856 got
> >>> stuck. That process held the exclusive lock on the file or directory
> >>> (inode number 627732 - probably /usr/local/swadmin/mnx/xml), so everyone
> >>> was waiting for it. The faulty process was apparently in the middle of
> >>> obtaining another exclusive lock (and almost got it). We need to know
> >>> where pid=5856 was stuck at that time. If this occurs again, could you
> >>> use "crash" to backtrace that process and show us the output?
> >>
> >> Here's the crash output:
> >>
> >> crash> bt 5856
> >> PID: 5856   TASK: 10bd26427f0       CPU: 0   COMMAND: "java"
> >>  #0 [10bd20cfbc8] schedule at ffffffff8030a1d1
> >>  #1 [10bd20cfca0] wait_for_completion at ffffffff8030a415
> >>  #2 [10bd20cfd20] glock_wait_internal at ffffffffa018574e
> >>  #3 [10bd20cfd60] gfs_glock_nq_m at ffffffffa01860ce
> >>  #4 [10bd20cfda0] gfs_unlink at ffffffffa019ce41
> >>  #5 [10bd20cfea0] vfs_unlink at ffffffff801889fa
> >>  #6 [10bd20cfed0] sys_unlink at ffffffff80188b19
> >>  #7 [10bd20cff30] filp_close at ffffffff80178e48
> >>  #8 [10bd20cff50] error_exit at ffffffff80110d91
> >>     RIP: 0000002a9593f649  RSP: 0000007fbfffbca0  RFLAGS: 00010206
> >>     RAX: 0000000000000057  RBX: ffffffff8011026a  RCX: 0000002a9cc9c870
> >>     RDX: 0000002ae5989000  RSI: 0000002a962fa3a8  RDI: 0000002ae5989000
> >>     RBP: 0000000000000000   R8: 0000002a9630abb0   R9: 0000000000000ffc
> >>     R10: 0000002a9630abc0  R11: 0000000000000206  R12: 0000000040115700
> >>     R13: 0000002ae23294b0  R14: 0000007fbfffc300  R15: 0000002ae5989000
> >>     ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b
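
(For anyone following along: on a live node with the matching kernel-debuginfo
package installed, a backtrace like the one above can be taken roughly as

shell> crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux
crash> bt 5856

where the vmlinux path is simply where the RHEL debuginfo package puts it.)
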
> >>
> >>>> A lockdump analysis with the decipher_lockstate_dump and parse_lockdump
> >>>> scripts shows the following output (the whole file is too large for the
> >>>> mailing list):
> >>>>
> >>>> Entries:  101939
> >>>> Glocks:  60112
> >>>> PIDs:  751
> >>>>
> >>>> 4 chain:
> >>>> lockdump.node1.dec Glock (inode[2], 1114343)
> >>>>   gl_flags = lock[1]
> >>>>   gl_count = 5
> >>>>   gl_state = shared[3]
> >>>>   req_gh = yes
> >>>>   req_bh = yes
> >>>>   lvb_count = 0
> >>>>   object = yes
> >>>>   new_le = no
> >>>>   incore_le = no
> >>>>   reclaim = no
> >>>>   aspace = 1
> >>>>   ail_bufs = no
> >>>>   Request
> >>>>     owner = 5856
> >>>>     gh_state = exclusive[1]
> >>>>     gh_flags = try[0] local_excl[5] async[6]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   Waiter3
> >>>>     owner = 5856
> >>>>     gh_state = exclusive[1]
> >>>>     gh_flags = try[0] local_excl[5] async[6]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   Inode: busy
> >>>> lockdump.node2.dec Glock (inode[2], 1114343)
> >>>>   gl_flags =
> >>>>   gl_count = 2
> >>>>   gl_state = unlocked[0]
> >>>>   req_gh = no
> >>>>   req_bh = no
> >>>>   lvb_count = 0
> >>>>   object = yes
> >>>>   new_le = no
> >>>>   incore_le = no
> >>>>   reclaim = no
> >>>>   aspace = 0
> >>>>   ail_bufs = no
> >>>>   Inode:
> >>>>     num = 1114343/1114343
> >>>>     type = regular[1]
> >>>>     i_count = 1
> >>>>     i_flags =
> >>>>     vnode = yes
> >>>> lockdump.node1.dec Glock (inode[2], 627732)
> >>>>   gl_flags = dirty[5]
> >>>>   gl_count = 379
> >>>>   gl_state = exclusive[1]
> >>>>   req_gh = no
> >>>>   req_bh = no
> >>>>   lvb_count = 0
> >>>>   object = yes
> >>>>   new_le = no
> >>>>   incore_le = no
> >>>>   reclaim = no
> >>>>   aspace = 58
> >>>>   ail_bufs = no
> >>>>   Holder
> >>>>     owner = 5856
> >>>>     gh_state = exclusive[1]
> >>>>     gh_flags = try[0] local_excl[5] async[6]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1] holder[6] first[7]
> >>>>   Waiter2
> >>>>     owner = none[-1]
> >>>>     gh_state = shared[3]
> >>>>     gh_flags = try[0]
> >>>>     error = 0
> >>>>     gh_iflags = demote[2] alloced[4] dealloc[5]
> >>>>   Waiter3
> >>>>     owner = 32753
> >>>>     gh_state = shared[3]
> >>>>     gh_flags = any[3]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   [...loads of Waiter3 entries...]
> >>>>   Waiter3
> >>>>     owner = 4566
> >>>>     gh_state = shared[3]
> >>>>     gh_flags = any[3]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   Inode: busy
> >>>> lockdump.node2.dec Glock (inode[2], 627732)
> >>>>   gl_flags = lock[1]
> >>>>   gl_count = 375
> >>>>   gl_state = unlocked[0]
> >>>>   req_gh = yes
> >>>>   req_bh = yes
> >>>>   lvb_count = 0
> >>>>   object = yes
> >>>>   new_le = no
> >>>>   incore_le = no
> >>>>   reclaim = no
> >>>>   aspace = 0
> >>>>   ail_bufs = no
> >>>>   Request
> >>>>     owner = 20187
> >>>>     gh_state = shared[3]
> >>>>     gh_flags = any[3]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   Waiter3
> >>>>     owner = 20187
> >>>>     gh_state = shared[3]
> >>>>     gh_flags = any[3]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   [...loads of Waiter3 entries...]
> >>>>   Waiter3
> >>>>     owner = 10460
> >>>>     gh_state = shared[3]
> >>>>     gh_flags = any[3]
> >>>>     error = 0
> >>>>     gh_iflags = promote[1]
> >>>>   Inode: busy
> >>>> 2 requests
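
(For reference, the raw per-node glock dumps fed into decipher_lockstate_dump
and parse_lockdump were captured with something along the lines of

shell> gfs_tool lockdump /usr/local > lockdump.node1    /* analogous on node2 */

before being deciphered and chained by the scripts.)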
-- 
Gruss / Regards,

Mark Hlawatschek
http://www.atix.de/               http://www.open-sharedroot.org/

** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2  (15.-21. of March) **

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany



