[Linux-cluster] 2 node rm hang more info - dlm hang?

David Teigland teigland at redhat.com
Mon Dec 13 04:56:42 UTC 2004


> cl032.ld.decipher Glock (rgrp[3], 17)

> Resource d6e2a5cc (parent 00000000). Name (len=24) "       3              11"
> Local Copy, Master is node 3
> Granted Queue
> 0022031d NL Master:     001b004c
> Conversion Queue
> Waiting Queue
> 0036020c -- (EX) Master:     00330164  LQ: 0,0x8
  
> Is there an easy way to know which resource name matches with
> glock?

These are the same: gfs prints the lock number in decimal (3,17), while dlm
prints it in hex (3,11); 0x11 == 17.
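
A minimal sketch of that mapping, for matching glocks against dlm resource
names by script. The helper names are hypothetical, and the layout (an
8-character hex type followed by a 16-character hex number, both
space-padded) is only inferred from the len=24 name in the dump above, not
taken from the lock_dlm source:

```python
def glock_to_dlm_name(lock_type: int, number: int) -> str:
    """Build the 24-character dlm resource name for a glock (type, number).

    Assumed layout, inferred from the dump: type as 8-wide hex,
    number as 16-wide hex, both right-aligned and space-padded.
    """
    return f"{lock_type:8x}{number:16x}"


def dlm_name_to_glock(name: str) -> tuple:
    """Split a 24-character dlm resource name back into (type, number)."""
    return int(name[:8].strip(), 16), int(name[8:].strip(), 16)
```

For example, glock_to_dlm_name(3, 17) gives the name shown for rgrp glock
(3, 17), and dlm_name_to_glock() on that name gives back (3, 17).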


> AFAIKT, the glock is waiting for the unlock to happen.
> The DLM (if this is the matching dlm lock) is NL waiting
> to grant to EX, but it not doing it.
> 
> Thoughts?  Is my analysis correct?

cl030 already has an NL lock granted and is requesting a second lock (EX).
To get the full dlm picture you need to look at resource 3,11 on both nodes:

cl030 (nodeid 1)
----------------

Resource d6e2a5cc (parent 00000000). Name (len=24) "       3              11"
Local Copy, Master is node 3
Granted Queue
0022031d NL Master:     001b004c
Conversion Queue
Waiting Queue
0036020c -- (EX) Master:     00330164  LQ: 0,0x8


cl032 (nodeid 3)
----------------

Resource ddac08e4 (parent 00000000). Name (len=24) "       3              11"  
Master Copy
LVB: 01 16 19 70 00 00 10 28 00 00 42 1f 00 00 00 00 
     00 00 00 a1 00 00 10 8d 00 00 00 00 00 00 00 00 
Granted Queue
002b014b EX
001b004c NL Remote:   1 0022031d
001d027b NL
Conversion Queue
Waiting Queue
003300be -- (EX) Remote:   1 002d009b  LQ: 0,0x8


I don't see why the remote lock ids aren't correct for cl030's EX lock:
cl030_lkid 0036020c != cl032_remote_lkid 002d009b
cl032_lkid 003300be != cl030_remote_lkid 00330164

Compare with cl030's NL lock, for which the lkids do match on both sides
(cl030 0022031d <-> cl032 001b004c).  For something that basic there must
be something really obvious I'm missing.
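
The cross-check above can be sketched as a small script. This is only an
illustration: the regexes assume the dump line format shown in this mail
("<lkid> ... Master: <lkid>" on the local copy, "<lkid> ... Remote: <nodeid>
<lkid>" on the master copy), and the function names are made up for the
example.

```python
import re

# Local-copy line:  "<lkid> <mode> Master:  <lkid on master>"
LOCAL_RE = re.compile(r"^([0-9a-f]{8})\s+.*Master:\s+([0-9a-f]{8})")
# Master-copy line: "<lkid> <mode> Remote:  <nodeid> <lkid on that node>"
REMOTE_RE = re.compile(r"^([0-9a-f]{8})\s+.*Remote:\s+(\d+)\s+([0-9a-f]{8})")

CL030_DUMP = """\
Granted Queue
0022031d NL Master:     001b004c
Conversion Queue
Waiting Queue
0036020c -- (EX) Master:     00330164  LQ: 0,0x8
"""

CL032_DUMP = """\
Granted Queue
002b014b EX
001b004c NL Remote:   1 0022031d
001d027b NL
Conversion Queue
Waiting Queue
003300be -- (EX) Remote:   1 002d009b  LQ: 0,0x8
"""


def local_locks(dump):
    """Map local lkid -> master lkid, from a local-copy dump."""
    locks = {}
    for line in dump.splitlines():
        m = LOCAL_RE.match(line.strip())
        if m:
            locks[m.group(1)] = m.group(2)
    return locks


def remote_locks(dump, nodeid):
    """Map master lkid -> remote lkid for locks held by nodeid."""
    locks = {}
    for line in dump.splitlines():
        m = REMOTE_RE.match(line.strip())
        if m and int(m.group(2)) == nodeid:
            locks[m.group(1)] = m.group(3)
    return locks


def mismatches(local_dump, master_dump, nodeid):
    """Return (local lkid, master lkid) pairs the master does not mirror."""
    remote = remote_locks(master_dump, nodeid)
    return [(lkid, master_lkid)
            for lkid, master_lkid in local_locks(local_dump).items()
            if remote.get(master_lkid) != lkid]
```

Running mismatches(CL030_DUMP, CL032_DUMP, 1) on the two dumps above flags
only the EX request (0036020c, 00330164); the NL lock pairs up cleanly.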

-- 
Dave Teigland  <teigland at redhat.com>



