[Linux-cluster] Deadlock detection in libdlm

Tue Jan 25 22:19:48 UTC 2011

On Tue, Jan 25, 2011 at 08:01:00PM +0000, Steve Little wrote:
> I've been trying to make use of deadlock detection in libdlm, but
> without any luck so far. I'm hoping someone can tell me what I'm doing
> wrong, or how to debug this further.

The dlm detects *conversion* deadlocks on a single resource and returns
EDEADLK for them.
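
As a rough illustration of what a conversion deadlock on a single
resource looks like, here is a minimal, self-contained model in C. It is
not the dlm's implementation and does not call libdlm. The struct and
function names (lock, blocks_each_other, conversion_deadlock) are
hypothetical, and the check is assumed to be a simple pairwise test
against the usual six-mode compatibility matrix.

```c
#include <stdbool.h>
#include <stddef.h>

/* The six DLM lock modes, weakest to strongest. */
enum mode { NL, CR, CW, PR, PW, EX };

/* Mode compatibility matrix: compat[a][b] is true when a lock in mode a
 * can be granted alongside a lock in mode b on the same resource. */
static const bool compat[6][6] = {
    /*        NL CR CW PR PW EX */
    /* NL */ { 1, 1, 1, 1, 1, 1 },
    /* CR */ { 1, 1, 1, 1, 1, 0 },
    /* CW */ { 1, 1, 1, 0, 0, 0 },
    /* PR */ { 1, 1, 0, 1, 0, 0 },
    /* PW */ { 1, 1, 0, 0, 0, 0 },
    /* EX */ { 1, 0, 0, 0, 0, 0 },
};

/* One lock on a single resource: the mode it holds now and the mode it
 * is waiting to convert to (equal to granted if nothing is pending). */
struct lock {
    enum mode granted;
    enum mode requested;
};

/* Hypothetical helper: true if a and b block each other, i.e. each
 * one's requested mode conflicts with the other's granted mode. */
static bool blocks_each_other(const struct lock *a, const struct lock *b)
{
    return !compat[a->requested][b->granted] &&
           !compat[b->requested][a->granted];
}

/* Scan the locks on one resource for a mutually blocking pair. */
static bool conversion_deadlock(const struct lock *locks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (blocks_each_other(&locks[i], &locks[j]))
                return true;
    return false;
}
```

Two holders of PR that both ask to convert to EX make
conversion_deadlock() return true, since neither conversion can be
granted while the other PR is held. If only one of them converts, it
returns false: that request simply waits for the other PR to go away.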

> This should cause a classic deadlock: process 1 is waiting on resource
> A, which is locked by process 2. Process 2 is waiting on resource B,
> which is locked by process 1.

A "classic" multi-resource deadlock is not detected.

Google came up with this nice description of the difference:
http://books.google.com/books?id=ydKIsgCiFVsC&pg=PA143&lpg=PA143&dq=conversion+deadlock&source=bl&ots=LSJEUQU3HI&sig=eo4UhF9sR474OvQ1Nbeid2iHTOI&hl=en&ei=6kk_TZOeNImycdLyidEB&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDkQ6AEwBTgK#v=onepage&q=conversion%20deadlock&f=false

The dlm also supports lock timeouts, which could be used to approximate
deadlock detection/resolution.

I once wrote a "toy" proof of concept for full deadlock detection.  The
code still exists in dlm_controld, but I'm not sure whether the API still
has sufficient flags to enable and play with it (that's about all it's
good for.)
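
For readers wondering what full (multi-resource) deadlock detection
involves in general, one textbook approach is to look for a cycle in a
wait-for graph. The sketch below is only that textbook technique in
self-contained C. It is not the dlm_controld code, and every name in it
(wait_graph, dfs_finds_cycle, has_deadlock, MAX_PROCS) is hypothetical.

```c
#include <stdbool.h>

#define MAX_PROCS 8

/* waits_for[i][j] is true when process i is blocked on a lock held by
 * process j. A real detector would have to gather these edges from
 * every resource; here they are filled in by hand. */
struct wait_graph {
    int nprocs;
    bool waits_for[MAX_PROCS][MAX_PROCS];
};

enum color { WHITE, GREY, BLACK };  /* unvisited, on the DFS path, done */

/* Depth-first search from process p. Reaching a GREY process again means
 * the current path loops back on itself, which is a deadlock cycle. */
static bool dfs_finds_cycle(const struct wait_graph *g, int p, enum color *c)
{
    c[p] = GREY;
    for (int q = 0; q < g->nprocs; q++) {
        if (!g->waits_for[p][q])
            continue;
        if (c[q] == GREY)
            return true;
        if (c[q] == WHITE && dfs_finds_cycle(g, q, c))
            return true;
    }
    c[p] = BLACK;
    return false;
}

/* True if some set of processes is waiting on each other in a cycle. */
static bool has_deadlock(const struct wait_graph *g)
{
    enum color c[MAX_PROCS] = { WHITE };  /* all entries start WHITE */

    for (int p = 0; p < g->nprocs; p++)
        if (c[p] == WHITE && dfs_finds_cycle(g, p, c))
            return true;
    return false;
}
```

In the two-process case from the original question, setting
waits_for[0][1] and waits_for[1][0] makes has_deadlock() return true,
while either edge alone is just an ordinary wait.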

>        EDEADLOCK       The lock operation is causing a deadlock and has been
>                        cancelled. If this was a conversion then the lock is
>                        reverted to its previously granted state. If it was a
>                        new lock then it has not been granted. (NB Only
>                        conversion deadlocks are currently detected)"

It does note the limitation.

Dave