[Linux-cluster] GFS2 processes getting stuck in WCHAN=dlm_posix_lock

Allen Belletti allen at isye.gatech.edu
Fri Oct 30 23:27:23 UTC 2009


Hi All,

As I've mentioned before, I'm running a two-node clustered mail server 
on GFS2 (with RHEL 5.4).  Nearly all of the time, everything works 
great.  However, going all the way back to GFS1 on RHEL 5.1 (I think it 
was), I've had occasional locking problems that force a reboot of one or 
both cluster nodes.  Lately I've paid closer attention since it's been 
happening more often.

I'll notice the problem when the load average starts rising.  It's 
always tied to "stuck" processes, and I believe always tied to IMAP 
clients (I'm running Dovecot.)  It seems like a file belonging to user 
"x" (in this case, "jforrest") will become locked in some way, such that 
every IMAP process tied to that user will get stuck on the same thing.  
Over time, as the user keeps trying to read that file, more & more 
processes accumulate.  They're always in state "D" (uninterruptible 
sleep), and always on "dlm_posix_lock" according to WCHAN.  The only way 
I'm able to get out of this state is to reboot.  If I let it persist for 
too long, I/O generally stops entirely.
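
For reference, the check described above can be sketched roughly as 
follows.  This is only a sketch, assuming a Python interpreter and a 
procps-style "ps" on the node; the function names are made up for 
illustration.  It filters "ps" output for processes in state "D" whose 
WCHAN is "dlm_posix_lock".

```python
import subprocess


def stuck_on(ps_output, wchan="dlm_posix_lock"):
    """Return (pid, user, command) for processes in state D waiting in `wchan`.

    `ps_output` is text in the column order pid, user, stat, wchan, args.
    """
    stuck = []
    for line in ps_output.splitlines():
        # Split into at most 5 fields so the command line keeps its spaces.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        pid, user, stat, chan, cmd = fields
        # The header line and non-D processes fall through this test.
        if stat.startswith("D") and chan == wchan:
            stuck.append((int(pid), user, cmd))
    return stuck


def list_stuck(wchan="dlm_posix_lock"):
    """Run ps on the local node and return the stuck processes (hypothetical helper)."""
    out = subprocess.check_output(
        ["ps", "-eo", "pid,user,stat,wchan:32,args"],
        universal_newlines=True,
    )
    return stuck_on(out, wchan)
```

Calling list_stuck() periodically and counting the results per user 
would show the pile-up for one account before the load average climbs.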

This certainly seems like it ought to have a definite solution, but I've 
no idea what it is.  I've tried a variety of things using "find" to 
pinpoint a particular file, but everything belonging to the affected 
user seems just fine.  At least, I can read and copy all of the files, 
and do a stat via ls -l.
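
One more way to narrow down the file, sketched here under assumptions: 
the kernel lists file locks in /proc/locks, one per line, with the 
holder's PID and a major:minor:inode triple, and marks blocked waiters 
with "->".  Whether GFS2 cluster-wide POSIX locks all show up there is 
an assumption I have not verified, and the function name is made up.  
The inode numbers it yields could then be fed to "find -inum".

```python
def parse_proc_locks(text):
    """Parse /proc/locks-style text into a list of dicts.

    Expected line shape (blocked waiters carry an extra "->" field):
      1: POSIX  ADVISORY  WRITE 4242 fd:02:131 0 EOF
      1: -> POSIX  ADVISORY  WRITE 4243 fd:02:131 0 EOF
    """
    locks = []
    for line in text.splitlines():
        fields = line.split()
        blocked = len(fields) > 1 and fields[1] == "->"
        if blocked:
            # Drop the "->" marker so both line shapes have the same columns.
            fields = fields[:1] + fields[2:]
        if len(fields) < 8:
            continue
        kind, mode, access, pid, dev_inode = fields[1:6]
        major, minor, inode = dev_inode.split(":")
        locks.append({
            "type": kind,            # POSIX or FLOCK
            "access": access,        # READ or WRITE
            "pid": int(pid),
            "major": int(major, 16),  # device numbers are printed in hex
            "minor": int(minor, 16),
            "inode": int(inode),
            "blocked": blocked,
        })
    return locks


def contended_inodes(text):
    """Return the sorted inode numbers that have at least one blocked POSIX waiter."""
    return sorted({l["inode"] for l in parse_proc_locks(text)
                   if l["blocked"] and l["type"] == "POSIX"})
```

For example, contended_inodes(open("/proc/locks").read()) gives the 
candidate inodes on the local node.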

Is it possible that this is a bug, not within GFS at all, but within 
Dovecot IMAP?

Any thoughts would be appreciated.  It's been getting worse lately and 
thus no fun at all.

Cheers,
Allen



