[Linux-cluster] gfs deadlock situation
David Teigland
teigland at redhat.com
Wed Feb 14 15:59:57 UTC 2007
> node1:
> Resource 0000010001218088 (parent 0000000000000000). Name (len=24) " 2
> 1100e7"
> Local Copy, Master is node 2
> Granted Queue
> Conversion Queue
> Waiting Queue
> 5eb00178 PR (EX) Master: 3eeb0117 LQ: 0,0x5
> node2:
> Resource 00000107e462c8c8 (parent 0000000000000000). Name (len=24) " 2
> 1100e7"
> Master Copy
> Granted Queue
> 3eeb0117 PR Remote: 1 5eb00178
> Conversion Queue
> Waiting Queue
The state of the lock on node1 looks bad. I'm studying the code and
struggling to understand how it could possibly arrive in that state.
Some things to notice:
- the lock is converting (it holds PR and is requesting EX), so it should
  be on the Conversion Queue, not the Waiting Queue
- lockqueue_state is 0 (the 0 in "LQ: 0,0x5"), so either node1 has not sent
  a remote request to node2 at all, or node1 did send something and has
  already received some kind of reply, so it is no longer waiting for one
- the state of the lock on node2 looks normal
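
For anyone who wants to scan a larger lock dump for the same symptom, here
is a rough sketch of the first check above. It is not part of any dlm tool.
The function names are made up for illustration, and it assumes each lock
line has the shape "<lkid> <granted mode> (<requested mode>) ...", as in the
dump quoted above. It flags any lock that sits on the Waiting Queue while
already holding a granted mode.

```python
import re

QUEUES = ("Granted Queue", "Conversion Queue", "Waiting Queue")

# Assumed lock line shape: "<lkid> <granted mode> [(<requested mode>)] ..."
LOCK_RE = re.compile(r"^([0-9a-f]{8})\s+([A-Z-]{2})(?:\s+\(([A-Z]{2})\))?")

NODE1 = """\
> Local Copy, Master is node 2
> Granted Queue
> Conversion Queue
> Waiting Queue
> 5eb00178 PR (EX) Master: 3eeb0117 LQ: 0,0x5
"""

NODE2 = """\
> Master Copy
> Granted Queue
> 3eeb0117 PR Remote: 1 5eb00178
> Conversion Queue
> Waiting Queue
"""


def parse_queues(dump):
    """Group the lock lines of one resource dump by the queue they are on."""
    queues = {name: [] for name in QUEUES}
    current = None
    for raw in dump.splitlines():
        line = raw.lstrip("> ").strip()  # drop email quote markers
        if line in QUEUES:
            current = line
            continue
        match = LOCK_RE.match(line)
        if match and current:
            lkid, granted, requested = match.groups()
            queues[current].append(
                {"lkid": lkid, "granted": granted, "requested": requested}
            )
    return queues


def misplaced_conversions(queues):
    """Return Waiting Queue locks that already hold a granted mode.

    Assumption: a granted mode printed as "--" means nothing is granted yet,
    so such a lock is a new request and belongs on the Waiting Queue.
    """
    return [
        lock
        for lock in queues["Waiting Queue"]
        if lock["requested"] and lock["granted"] != "--"
    ]


if __name__ == "__main__":
    for name, dump in (("node1", NODE1), ("node2", NODE2)):
        for lock in misplaced_conversions(parse_queues(dump)):
            print(name, lock["lkid"], lock["granted"], "->", lock["requested"])
```

Run against the two dumps above, this reports only the node1 lock 5eb00178.
It says nothing about how the lock got there; it only helps find other
resources in the same state.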
Did you check for suspicious syslog messages on both nodes? Did any nodes
on this fs mount, unmount or fail around the time this happened? Has this
happened before? If you'd like to try to reproduce this with some dlm
debugging, I could send you a patch (although this is such an odd state that
I'm not sure yet where I'd begin to add debugging).
Dave