[Linux-cluster] Directory lockups?

Lazar Obradovic laza at yu.net
Wed Sep 15 20:55:02 UTC 2004


It happened again today: about 80 processes queued up, all waiting to
write to the same file. Every one of them showed state "D" in 'ps', and
together they blocked the whole directory containing the file (an ls in
that directory would block too).
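
For anyone who wants to watch for the same symptom, here is a minimal
sketch that lists processes in state "D" by reading /proc/<pid>/stat
directly instead of going through 'ps'. The function names are made up
for illustration; the only thing it relies on is the standard layout of
the stat file (pid, command name in parentheses, then the state letter).

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def blocked_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state D."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Run from cron or a loop, this would show when the queue starts to build,
which might help to match the lockup against the kernel log.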

The node has now recovered by itself, but that directory was unavailable
for almost an hour and a half!

Do deadlocktimeout and lock_timeout (in /proc/cluster/config/dlm) have
anything to do with this and are they configurable? 

Can someone shed some light on the /proc interface, just so we know
what's where? This could also go into usage.txt or even a separate file...
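
Until that is documented, the current values can at least be dumped for
comparison between nodes. A small sketch, assuming each entry under
/proc/cluster/config/dlm is a plain file holding one short text value
(that is an assumption, and read_tunables is just an illustrative name).
It only reads the entries and says nothing about whether they can be
changed:

```python
import os

DLM_CONFIG_DIR = "/proc/cluster/config/dlm"  # path as given in this message


def read_tunables(config_dir=DLM_CONFIG_DIR):
    """Return {name: value} for every regular file found in config_dir."""
    values = {}
    for name in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and anything that is not a file
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError as err:
            # Keep going; record why this entry could not be read.
            values[name] = "<unreadable: %s>" % err.strerror
    return values


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print("%s = %s" % (name, value))
```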

> Hello, 
> 
> I have been receiving these messages lately: 
> 
> Sep 15 08:16:02 test01 kernel: dlm: locks: dir entry exists 6bd5037b fr 5 r 0        7         60300b6
> Sep 15 08:53:29 test01 kernel: dlm: locks: dir entry exists abef0134 fr 3 r 0        7         7382d53
> Sep 15 10:42:18 test01 kernel: dlm: locks: dir entry exists d00d7 fr 4 r 0        7         52565b8
> Sep 15 10:42:18 test01 kernel: dlm: locks: dir entry exists f0012 fr 4 r 0        7         52565b8
> Sep 15 10:46:00 test01 kernel: dlm: locks: dir entry exists f0302 fr 6 r 0        7         5356fea
> Sep 15 11:10:32 test01 kernel: dlm: locks: dir entry exists 420026 fr 4 r 0        7         2ef1081
> Sep 15 11:48:01 test01 kernel: dlm: locks: dir entry exists 30282 fr 5 r 0        7         56a3afd
> 
> It seems to be tied to one particular directory. Every read or
> write attempt inside that directory blocks and, as the processes
> queue up, the load rises until it kills the node. 
> 
> Now, I've been searching through the logs and haven't found anything
> useful, except those few lines above.
> 
> I've run gfs_fsck and got: 
> 
> [... useless things omitted ... ]
> 
> Dinodes with more than one dirent:
>        inode = 30926818, dirents = 2
> Dinodes with link count > 1:
>        inode = 30926818, nlink = 2
> Pass 6:  done  (0:00:00)
> 
> [... useless things omitted ... ]
> 
> What should I do to debug this further? Can anyone explain why this is
> happening? 
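
One way to look for a pattern in the quoted "dir entry exists" messages
is to tally them by their last column. This is only a sketch: I do not
know what the individual fields mean, so the group names in the regular
expression ("space", "ident", "fr", "r") are labels of convenience taken
from the text of the message, not documented DLM terms.

```python
import re
from collections import Counter

# Matches the syslog lines quoted above; field meanings are NOT known,
# the group names simply mirror the tokens that appear in the message.
LINE_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"dlm:\s+(?P<space>\S+):\s+dir entry exists\s+(?P<ident>[0-9a-f]+)\s+"
    r"fr\s+(?P<fr>\d+)\s+r\s+(?P<r>\d+)\s+(?P<rest>.+)$"
)


def tally_last_column(lines):
    """Count 'dir entry exists' messages by the final field on the line."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if match is None:
            continue  # some other log line
        counts[match.group("rest").split()[-1]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: feed it the kernel log on standard input.
    for value, count in tally_last_column(sys.stdin).most_common():
        print(count, value)
```

In the seven lines quoted above, the two messages logged at 10:42:18
carry the same final value (52565b8); the other five are all different.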
-- 
Lazar Obradovic, System Engineer
----- 
laza at YU.net
YUnet International http://www.EUnet.yu
Dubrovacka 35/III, 11000 Belgrade
Tel: +381 11 3119901; Fax: +381 11 3119901
-----



