[Linux-cluster] Directory lockups?
David Teigland
teigland at redhat.com
Thu Sep 16 05:46:39 UTC 2004
On Wed, Sep 15, 2004 at 10:55:02PM +0200, Lazar Obradovic wrote:
> It happened again today, and I got around 80 queued processes waiting to
> write to the same file. All the processes were in the "D" state according
> to 'ps', and they blocked the whole directory containing the file (an ls
> in that directory would block too).
Is there a test or application you're running that we could try ourselves?
> Now, the node just recovered by itself, but that directory was unavailable
> for almost an hour and a half!
In addition to Ken's suggestion ("ps aux" and "gfs_tool lockdump
/mountpoint" from each node), you could also provide the output of "cat
/proc/cluster/lock_dlm_debug" from each node.
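The three diagnostics above could be wrapped in a small gather script, run on each node when the hang happens. A minimal sketch, assuming a GFS mountpoint passed as the first argument (the /mnt/gfs default and the guards around gfs_tool are illustrative; the tool and the debug file only exist on a GFS/DLM node):

```shell
#!/bin/sh
# Hypothetical gather script -- run on each node and compare the output.
MNT=${1:-/mnt/gfs}    # assumed mountpoint; pass your real one as $1

# Header line plus any processes stuck in uninterruptible sleep (STAT
# column begins with "D").
ps aux | awk 'NR == 1 || $8 ~ /^D/'

# gfs_tool and the lock_dlm debug file only exist on a GFS/DLM node,
# so guard each command rather than failing outright.
command -v gfs_tool >/dev/null 2>&1 && gfs_tool lockdump "$MNT" || true
[ -r /proc/cluster/lock_dlm_debug ] && cat /proc/cluster/lock_dlm_debug || true
```

Collecting the same three snapshots from every node at roughly the same time is what makes the dumps comparable.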
> Do deadlocktimeout and lock_timeout (in /proc/cluster/config/dlm) have
> anything to do with this and are they configurable?
They are unrelated to gfs.
> Can someone shed some light on the /proc interface, just so we know what's
> where?
> This could also go into usage.txt or even separate file...
I don't think any of them would be useful. It's simply our habit to
define any "constant" this way.
buffer_size     - network message size used by the dlm
dirtbl_size, lkbtbl_size, rsbtbl_size - hash table sizes
lock_timeout    - max time we'll wait for a reply to a remote request
                  (not used for gfs locks)
deadlocktime    - max time a request will wait to be granted
                  (not used for gfs locks)
recover_timer   - while waiting for certain conditions during recovery,
                  this is the interval between checks
tcp_port        - port used for dlm communication
max_connections - max number of network connections the dlm will make
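For reference, the current values can be read straight out of /proc. A minimal sketch, assuming the /proc/cluster/config/dlm paths implied by the list above (they only exist on a node with the dlm module loaded, so the loop prints nothing elsewhere):

```shell
#!/bin/sh
# Sketch: print the current value of each dlm "constant" listed above.
for name in buffer_size dirtbl_size lkbtbl_size rsbtbl_size \
            lock_timeout deadlocktime recover_timer tcp_port max_connections; do
    f=/proc/cluster/config/dlm/$name
    # Skip silently if the file is absent (dlm module not loaded).
    [ -r "$f" ] && printf '%-15s %s\n' "$name" "$(cat "$f")" || true
done
```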
--
Dave Teigland <teigland at redhat.com>