[Linux-cluster] gfs deadlock situation
Mark Hlawatschek
hlawatschek at atix.de
Tue Feb 13 12:37:18 UTC 2007
Hi,
We have the following deadlock situation:
A 2-node cluster consisting of node1 and node2.
/usr/local is placed on a GFS filesystem mounted on both nodes.
The lock manager is DLM.
We are using RHEL4u4.
An strace of ls -l /usr/local/swadmin/mnx/xml stops at the following call, which never returns:
lstat("/usr/local/swadmin/mnx/xml",
This happens on both cluster nodes.
All processes trying to access the directory /usr/local/swadmin/mnx/xml are
in the uninterruptible "waiting for I/O" (D) state, so the system load is at about 400 ;-)
Any ideas?
A lockdump analysis with decipher_lockstate_dump and parse_lockdump shows
the following output (the whole file is too large for the mailing list):
Entries: 101939
Glocks: 60112
PIDs: 751
4 chain:
lockdump.node1.dec Glock (inode[2], 1114343)
gl_flags = lock[1]
gl_count = 5
gl_state = shared[3]
req_gh = yes
req_bh = yes
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 1
ail_bufs = no
Request
owner = 5856
gh_state = exclusive[1]
gh_flags = try[0] local_excl[5] async[6]
error = 0
gh_iflags = promote[1]
Waiter3
owner = 5856
gh_state = exclusive[1]
gh_flags = try[0] local_excl[5] async[6]
error = 0
gh_iflags = promote[1]
Inode: busy
lockdump.node2.dec Glock (inode[2], 1114343)
gl_flags =
gl_count = 2
gl_state = unlocked[0]
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 1114343/1114343
type = regular[1]
i_count = 1
i_flags =
vnode = yes
lockdump.node1.dec Glock (inode[2], 627732)
gl_flags = dirty[5]
gl_count = 379
gl_state = exclusive[1]
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 58
ail_bufs = no
Holder
owner = 5856
gh_state = exclusive[1]
gh_flags = try[0] local_excl[5] async[6]
error = 0
gh_iflags = promote[1] holder[6] first[7]
Waiter2
owner = none[-1]
gh_state = shared[3]
gh_flags = try[0]
error = 0
gh_iflags = demote[2] alloced[4] dealloc[5]
Waiter3
owner = 32753
gh_state = shared[3]
gh_flags = any[3]
error = 0
gh_iflags = promote[1]
[...loads of Waiter3 entries...]
Waiter3
owner = 4566
gh_state = shared[3]
gh_flags = any[3]
error = 0
gh_iflags = promote[1]
Inode: busy
lockdump.node2.dec Glock (inode[2], 627732)
gl_flags = lock[1]
gl_count = 375
gl_state = unlocked[0]
req_gh = yes
req_bh = yes
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Request
owner = 20187
gh_state = shared[3]
gh_flags = any[3]
error = 0
gh_iflags = promote[1]
Waiter3
owner = 20187
gh_state = shared[3]
gh_flags = any[3]
error = 0
gh_iflags = promote[1]
[...loads of Waiter3 entries...]
Waiter3
owner = 10460
gh_state = shared[3]
gh_flags = any[3]
error = 0
gh_iflags = promote[1]
Inode: busy
2 requests
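
For anyone who wants to search the full deciphered dumps for the same pattern, here is a minimal Python sketch. It is not part of decipher_lockstate_dump or parse_lockdump. It assumes the line shapes seen in the excerpt above ("<file> Glock (<type>[n], <number>)", a section line such as Holder, Request or WaiterN, then "owner = <pid>"). The names parse_dump, blocked_holders and SAMPLE are made up for illustration. It lists every PID that holds one glock while waiting on another glock in the same dump file.

```python
import re

# Assumed line shapes, taken from the deciphered excerpt in this mail.
GLOCK_RE = re.compile(r"^(\S+) Glock \((\w+)\[\d+\], (\d+)\)$")
SECTION_RE = re.compile(r"^(Holder|Request|Waiter\d+)$")
OWNER_RE = re.compile(r"^owner = (\d+)$")  # "none[-1]" does not match and is skipped


def parse_dump(text):
    """Return one dict per glock with the sets of holder and waiter PIDs."""
    glocks = []
    current = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        m = GLOCK_RE.match(line)
        if m:
            current = {"dump": m.group(1), "type": m.group(2),
                       "number": int(m.group(3)),
                       "holders": set(), "waiters": set()}
            glocks.append(current)
            section = None
            continue
        if current is None:
            continue
        if line.startswith("Inode:"):
            section = None  # the inode block ends the holder/waiter sections
            continue
        m = SECTION_RE.match(line)
        if m:
            section = m.group(1)
            continue
        m = OWNER_RE.match(line)
        if m and section is not None:
            # Request and WaiterN entries are both treated as "waiting".
            key = "holders" if section == "Holder" else "waiters"
            current[key].add(int(m.group(1)))
    return glocks


def blocked_holders(glocks):
    """List (dump, pid, held_glock, wanted_glock) for PIDs that hold one
    glock and wait on another. PIDs are only compared within one dump file."""
    waiting = {}
    for g in glocks:
        for pid in g["waiters"]:
            waiting.setdefault((g["dump"], pid), set()).add(g["number"])
    found = []
    for g in glocks:
        for pid in g["holders"]:
            wanted_set = waiting.get((g["dump"], pid), set()) - {g["number"]}
            for wanted in wanted_set:
                found.append((g["dump"], pid, g["number"], wanted))
    return sorted(found)


SAMPLE = """\
lockdump.node1.dec Glock (inode[2], 1114343)
gl_state = shared[3]
Request
owner = 5856
Waiter3
owner = 5856
Inode: busy
lockdump.node1.dec Glock (inode[2], 627732)
gl_state = exclusive[1]
Holder
owner = 5856
Waiter2
owner = none[-1]
Waiter3
owner = 32753
Inode: busy
"""

if __name__ == "__main__":
    for row in blocked_holders(parse_dump(SAMPLE)):
        print(row)
```

On the shortened sample, this reports PID 5856 in lockdump.node1.dec as holding glock 627732 while waiting on glock 1114343, which matches the chain shown above. The real dump may be indented or contain further section names, so the regular expressions would need adjusting.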
--
Gruss / Regards,
Mark Hlawatschek
http://www.atix.de/ http://www.open-sharedroot.org/
** Visit us at CeBIT 2007 in Hannover/Germany **
** in Hall 5, Booth G48/2 (15.-21. of March) **
**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany