[Linux-cluster] lm_dlm_cancel during quota operations
Robert Clark
cluster at defuturo.co.uk
Fri Apr 20 15:31:17 UTC 2007
I have a script which runs gfs_quota to set quotas for all the users
on my GFS filesystem. When it's run simultaneously on two nodes, errors
like the following begin to appear:
lock_dlm: lm_dlm_cancel 2,34 flags 80
lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
lock_dlm: complete dlm cancel 2,34 flags 40000
...
lock_dlm: lm_dlm_cancel 2,34 flags 80
lock_dlm: complete dlm cancel 2,34 flags 40000
lock_dlm: lm_dlm_cancel rv 0 2,34 flags 80
...
lock_dlm: lm_dlm_cancel 2,34 flags 84
lock_dlm: lm_dlm_cancel skip 2,34 flags 84
...
lock_dlm: lm_dlm_cancel 2,34 flags 80
dlm: cancel granted 1350055
lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40000
lock_dlm: extra completion 2,34 5,5 id 1350055 flags 40000
and, more rarely:
lock_dlm: lm_dlm_cancel 2,34 flags 80
lock_dlm: lm_dlm_cancel rv 0 2,34 flags 40080
dlm: desktop-home-1: cancel reply ret -22
lock_dlm: ast sb_status -22 2,34 flags 40000
...
lock_dlm: lm_dlm_cancel 2,34 flags 80
lock_dlm: lm_dlm_cancel rv -16 2,34 flags 40080
At the same time, I/O to the GFS partition hangs. Rebooting one of the
two nodes allows the cluster to recover.
On my smaller test cluster, I've been able to reproduce some of the
errors:
lock_dlm: lm_dlm_cancel 2,18 flags 84
lock_dlm: lm_dlm_cancel rv 0 2,18 flags 40080
lock_dlm: complete dlm cancel 2,18 flags 40000
...
lock_dlm: lm_dlm_cancel 2,18 flags 80
lock_dlm: lm_dlm_cancel skip 2,18 flags 0
though not the I/O hangs.
My shared storage is over AoE and I'm using the following packages:
GFS-6.1.6-1
dlm-1.0.1-1
cman-1.0.11-0
GFS-kernel-hugemem-2.6.9-60.9
dlm-kernel-hugemem-2.6.9-44.9
cman-kernel-hugemem-2.6.9-45.15
kernel-hugemem-2.6.9-42.0.10.EL
I must admit, I've not been able to find out much about what dlm
cancels are or what triggers them. Can anyone shed some light on this?
Robert