[Linux-cluster] DLM user API for blocking AST

Mon Dec 19 17:59:42 UTC 2016

On Sun, Dec 18, 2016 at 08:42:49PM +0100, Jean-Marc Saffroy wrote:
> Hi,
> 
> Continuing with my experiments with the DLM user API, I am trying to use 
> blocking AST callbacks, and find that the rules for the lifetime and 
> ownership of the dlm_lksb struct are a bit surprising. This led me to some 
> investigations, and the question at the end of this email.

Hi, you're discovering just how old and crusty this userland interface is.
Sorry about that :)

This userland interface is left over from the earliest dlm implementation,
which could generously be called experimental.  It has long needed a
thorough redesign, but because the dlm is not heavily used from userland,
and because user/kernel interfaces are hard, it's never been done.

> It looks like the kernel remembers the pointer to the lksb struct used to 
> issue the dlm_lock call, and libdlm happily overwrites this piece of 
> memory whenever the kernel issues an event related to that lock, including 
> just before firing a BAST callback. It is a bit frustrating because I got 
> caught by surprise wondering why something was smashing my stack, i.e., the
> place where I had once laid out my dlm_lksb, thinking that it was okay to 
> release its memory after the completion AST callback has completed.

The way the kernel saves and restores the pointers is very unpleasant, and
handling the lifetimes of lock structs and memory is a big pain.

In a quick look at this code, I'm not seeing any simple or obvious ways to
avoid the lksb behavior you're describing, but it's been a while since I
was very familiar with this area.  With more study, it's possible that a
fix could be found, but it seems a bit unlikely.

As a workaround to avoid an unwanted bast callback after a completion, I
wonder if you could make a no-op call with NULL astaddr/astarg to prevent
any further callback using those?

> For now I have (apparently) working test code that deals with this in the 
> following way: for a given lock (identified by its lockid), I keep two 
> dlm_lksb structs and a bit indicating which of the two is free to use for 
> conversions. I update the bit every time the CAST (not BAST) callback 
> completes, thus doing some kind of double buffering.

OK, I don't know enough about the details to say whether there are any
subtle issues with this or not.
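The double-buffering scheme Jean-Marc describes can be sketched as follows. This is a minimal, self-contained model under stated assumptions, not libdlm code. `fake_lksb` is a hypothetical stand-in for `struct dlm_lksb` with only two fields. `lock_state`, `lksb_for_request`, `on_completion_ast`, and the `mock_submit`/`mock_bast` functions are all hypothetical. The mocks imitate the behaviour described in this thread: the kernel remembers the lksb passed with the latest request, and a BAST overwrites that struct. The model also assumes at most one request per lock is in flight at a time, so the free index only flips in the completion AST.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for libdlm's struct dlm_lksb (subset of fields). */
struct fake_lksb {
    int      sb_status;
    uint32_t sb_lkid;
};

/* Hypothetical per-lock state: two lksb buffers plus the free index. */
struct lock_state {
    struct fake_lksb lksb[2];
    int free_idx;                    /* buffer the caller may hand out next */
    struct fake_lksb *kernel_owned;  /* mock: lksb the "kernel" remembers */
};

/* Prepare the free buffer for the next lock or convert request.
 * The lock id is carried over from the other (kernel-owned) buffer,
 * since a conversion must refer to the existing lock. */
static struct fake_lksb *lksb_for_request(struct lock_state *ls)
{
    struct fake_lksb *sb = &ls->lksb[ls->free_idx];
    uint32_t lkid = ls->lksb[ls->free_idx ^ 1].sb_lkid;

    memset(sb, 0, sizeof *sb);
    sb->sb_lkid = lkid;
    return sb;
}

/* Mock of submitting a request: the kernel now remembers this lksb
 * and forgets the previous one. */
static void mock_submit(struct lock_state *ls, struct fake_lksb *sb)
{
    ls->kernel_owned = sb;
}

/* Run from the completion AST only (never from the BAST): the buffer
 * just used stays kernel-owned, so the other one becomes free. */
static void on_completion_ast(struct lock_state *ls)
{
    ls->free_idx ^= 1;
}

/* Mock BAST: libdlm writes into whichever lksb the kernel remembers. */
static void mock_bast(struct lock_state *ls, int status)
{
    if (ls->kernel_owned != NULL)
        ls->kernel_owned->sb_status = status;
}
```

In this model, a BAST that fires after the completion AST but before the next conversion only touches the buffer from the last request. The caller can therefore fill in the other buffer for the conversion without racing against the kernel. Whether real libdlm guarantees this ordering is exactly the open question in this thread.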

> So I assume that:
> 
> - each lock acquisition or conversion call gives ownership of the lksb to 
> the kernel and libdlm (because a BAST callback can fire at any time and 
> will overwrite the struct), causing the kernel/libdlm to forget about the 
> previously owned lksb (meaning the caller can/should then dispose of it)

That sounds about right.

> - AST and BAST callbacks run in order, such that after the CAST completes, 
> and until a conversion occurs, a BAST firing will only overwrite the lksb 
> given on the last lock or conversion
> 
> Are my assumptions correct?

That also sounds like it should be true.

To say so with more certainty would require closer study of the code, because
whatever rules exist are a function of the current implementation, and not
derived from higher-level design rules per se.

Dave