[Linux-cluster] GFS2 processes getting stuck in WCHAN=dlm_posix_lock

Mon Nov 2 21:44:15 UTC 2009

Hi Again,

On 11/02/2009 06:42 AM, Steven Whitehouse wrote:
> Hi,
>
> On Fri, 2009-10-30 at 19:27 -0400, Allen Belletti wrote:
>    
>> Hi All,
>>
>> As I've mentioned before, I'm running a two-node clustered mail server
>> on GFS2 (with RHEL 5.4).  Nearly all of the time, everything works
>> great.  However, going all the way back to GFS1 on RHEL 5.1 (I think it
>> was), I've had occasional locking problems that force a reboot of one or
>> both cluster nodes.  Lately I've paid closer attention since it's been
>> happening more often.
>>
>> I'll notice the problem when the load average starts rising.  It's
>> always tied to "stuck" processes, and I believe always tied to IMAP
>> clients (I'm running Dovecot.)  It seems like a file belonging to user
>> "x" (in this case, "jforrest") will become locked in some way, such that
>> every IMAP process tied to that user will get stuck on the same thing.
>> Over time, as the user keeps trying to read that file, more and more
>> processes accumulate.  They're always in state "D" (uninterruptible
>> sleep), and always on "dlm_posix_lock" according to WCHAN.  The only way
>> I'm able to get out of this state is to reboot.  If I let it persist for
>> too long, I/O generally stops entirely.
>>
>> This certainly seems like it ought to have a definite solution, but I've
>> no idea what it is.  I've tried a variety of things using "find" to
>> pinpoint a particular file, but everything belonging to the affected
>> user seems just fine.  At least, I can read and copy all of the files,
>> and do a stat via ls -l.
>>
>> Is it possible that this is a bug, not within GFS at all, but within
>> Dovecot IMAP?
>>
>> Any thoughts would be appreciated.  It's been getting worse lately and
>> thus no fun at all.
>>
>> Cheers,
>> Allen
>>
>>      
> Do you know if dovecot IMAP uses signals at all? That would be the first
> thing that I'd look at. The other thing to check is whether it makes use
> of F_GETLK and in particular the l_pid field? strace should be able to
> answer both of those questions (except the l_pid field of course, but
> the chances are that if it calls F_GETLK and then sends a signal, it's
> also using the l_pid field).
>
> Steve.
>    
I've been looking into how Dovecot IMAP works and I see now that no 
"locking" in the OS sense of the word is involved for Maildir access.  
Instead, one particular index per mail folder is "locked" by creating a 
<filename>.lock entry, performing the necessary operations, and then 
deleting the file.  In the case of certain users with hundreds of 
folders and mail clients which scan all of them, this potentially 
results in hundreds of rapid create/delete operations.  The relevant 
text from the Dovecot documentation is as follows:

> Although maildir was designed to be lockless, Dovecot locks the maildir
> while doing modifications to it or while looking for new messages in it.
> This is required because otherwise Dovecot might temporarily see mails
> incorrectly deleted, which would cause trouble. Basically the problem is
> that if one process modifies the maildir (eg. a rename() to change a
> message's flag), another process in the middle of listing files at the
> same time could skip a file. The skipping happens because readdir()
> system call doesn't guarantee that all the files are returned if the
> directory is modified between the calls to it. This problem exists with
> all the commonly used filesystems.
>
> Because Dovecot uses its own non-standard locking
> ('dovecot-uidlist.lock' dotlock file), other MUAs accessing the maildir
> don't support it. This means that if another MUA is updating messages'
> flags or expunging messages, Dovecot might temporarily lose some message.
> After the next sync when it finds it again, an error message may be
> written to log and the message will receive a new UID.

Does GFS2 have the limitation that's being described for readdir()?  I 
would expect so, but perhaps the work necessary to ensure a consistent 
view between cluster nodes has the side effect of correcting this issue 
as well.  In any case, my users might encounter the issue being 
protected against so rarely that I can safely disable the locking 
mechanism regardless.
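
For concreteness, here is a rough Python sketch of the create / operate / 
delete pattern as I understand it.  This is not Dovecot's actual code: 
the dotlock() helper, its timeout and its polling interval are all made 
up for illustration, and it ignores stale-lock handling entirely.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def dotlock(path, timeout=5.0, poll=0.05):
    """Hold <path>.lock for the duration of the with-block (illustrative only)."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes the create fail if the lock file already
            # exists, so only one process at a time can succeed here.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll)
    try:
        try:
            # Record the holder's PID in the lock file (an assumption of this
            # sketch, handy for debugging; nothing reads it back).
            os.write(fd, str(os.getpid()).encode())
        finally:
            os.close(fd)
        yield lock_path
    finally:
        # Releasing the lock is just deleting the file.
        os.unlink(lock_path)
```

Wrapping each folder scan in something like "with dotlock(uidlist_path):" 
means one create and one unlink per folder, so a client walking hundreds 
of folders generates hundreds of these pairs in quick succession.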

Any thoughts on this would be appreciated.  Would this sequence of 
operations cause the WCHAN=dlm_posix_lock condition for brief periods of 
time in normal operation?  Wish I could dig through the kernel & gfs2 
code to figure this out for myself but it would crush my productivity at 
work :-)
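
In the meantime, a small script along these lines should at least make 
it easy to spot when processes start piling up.  It's only a sketch: it 
assumes a Linux /proc where /proc/<pid>/wchan is readable, and the 
function names (parse_stat, stuck_processes) are my own.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def stuck_processes(wchan_name="dlm_posix_lock", proc_root="/proc"):
    """Yield (pid, comm) for processes in state D sleeping in wchan_name."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a per-process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
            with open(os.path.join(base, "wchan"), errors="replace") as f:
                wchan = f.read().strip()
        except OSError:
            continue  # process exited meanwhile, or we lack permission
        if state == "D" and wchan == wchan_name:
            yield pid, comm
```

Running "for pid, comm in stuck_processes(): print(pid, comm)" from cron 
every minute or so would show how quickly the stuck processes accumulate.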

Cheers,
Allen

-- 
Allen Belletti
allen at isye.gatech.edu                             404-894-6221 Phone
Industrial and Systems Engineering                404-385-2988 Fax
Georgia Institute of Technology