[Linux-cluster] DLM user API for blocking AST
teigland at redhat.com
Mon Dec 19 17:59:42 UTC 2016
On Sun, Dec 18, 2016 at 08:42:49PM +0100, Jean-Marc Saffroy wrote:
> Continuing with my experiments with the DLM user API, I am trying to use
> blocking AST callbacks, and find that the rules for the lifetime and
> ownership of the dlm_lksb struct are a bit surprising. This led me to some
> investigations, and the question at the end of this email.
Hi, you're discovering just how old and crusty this userland interface is.
Sorry about that :)
This userland interface is left over from the earliest dlm implementation,
which could generously be called experimental. It has long needed a
thorough redesign, but because the dlm is not heavily used from userland,
and because user/kernel interfaces are hard, it's never been done.
> It looks like the kernel remembers the pointer to the lksb struct used to
> issue the dlm_lock call, and libdlm happily overwrites this piece of
> memory whenever the kernel issues an event related to that lock, including
> just before firing a BAST callback. It is a bit frustrating because I got
> caught by surprise wondering why something was smashing my stack, i.e. the
> place where I had once laid out my dlm_lksb, thinking that it was okay to
> release its memory after the completion AST callback has completed.
The way the kernel saves and restores the pointers is very unpleasant, and
handling the lifetimes of lock structs and their memory is a big pain.
In a quick look at this code, I'm not seeing any simple or obvious ways to
avoid the lksb behavior you're describing, but it's been a while since I
was very familiar with this area. With more study, it's possible that a
fix could be found, but it seems a bit unlikely.
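To make the lifetime problem concrete, here is a minimal sketch of keeping
the lksb in heap storage that lives as long as the lock does, rather than on
a stack frame that may have returned by the time a BAST fires. It does not
use libdlm: struct lksb_standin, struct lock_ctx and the lock_ctx_* helpers
are hypothetical names invented for illustration, and the rule "free only
after the unlock completion has run" is an assumption, not something
confirmed from the code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for libdlm's struct dlm_lksb so the sketch builds without
 * libdlm. This is NOT the real definition. */
struct lksb_standin {
    int      sb_status;
    uint32_t sb_lkid;
    char     sb_flags;
    char    *sb_lvbptr;
};

/* Hypothetical per-lock context. The lksb lives inside it, so it stays
 * valid for as long as the context does. */
struct lock_ctx {
    struct lksb_standin lksb;
    int unlocked;   /* set once the unlock completion has run */
};

/* Allocate a zeroed context on the heap; returns NULL on failure. */
struct lock_ctx *lock_ctx_new(void)
{
    return calloc(1, sizeof(struct lock_ctx));
}

/* Shaped like an AST callback (void (*)(void *)): meant to be used as the
 * completion callback of the unlock request, with the context as astarg. */
void lock_ctx_unlock_done(void *astarg)
{
    struct lock_ctx *ctx = astarg;
    assert(ctx != NULL);
    ctx->unlocked = 1;
}

/* Free the context only once the unlock has completed (assumed rule).
 * Returns 0 if freed, -1 if the lksb may still be written to. */
int lock_ctx_free(struct lock_ctx *ctx)
{
    if (!ctx->unlocked)
        return -1;
    free(ctx);
    return 0;
}
```

In real code, &ctx->lksb would be the lksb given to the lock request and ctx
the astarg, so every callback can find its way back to the same storage.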
As a workaround to avoid an unwanted BAST callback after a completion, I
wonder if you could make a no-op call with NULL astaddr/astarg to prevent
any further callback using those?
> For now I have (apparently) working test code that deals with this in the
> following way: for a given lock (identified by its lockid), I keep two
> dlm_lksb structs and a bit indicating which of the two is free to use for
> conversions. I update the bit every time the CAST (not BAST) callback
> completes, thus doing some kind of double buffering.
OK, I don't know enough about the details to say whether there are any
subtle issues with this or not.
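For readers following along, the double buffering described in the quoted
text could be sketched roughly as below. This is only a model of the
bookkeeping, not the poster's actual code and not libdlm: struct
lksb_standin, struct lksb_pair and the lksb_pair_* helpers are hypothetical
names. The pending flag is an extra guard added here, and copying sb_lkid
into the free slot assumes that a conversion identifies its lock through
that field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for libdlm's struct dlm_lksb so the sketch builds without
 * libdlm. This is NOT the real definition. */
struct lksb_standin {
    int      sb_status;
    uint32_t sb_lkid;
    char     sb_flags;
    char    *sb_lvbptr;
};

/* Hypothetical per-lock record: two lksb slots plus the "free slot" bit. */
struct lksb_pair {
    struct lksb_standin slot[2];
    unsigned free_idx;  /* slot that may be used for the next request */
    int pending;        /* request issued, its CAST has not completed yet */
};

void lksb_pair_init(struct lksb_pair *p)
{
    memset(p, 0, sizeof(*p));
}

/* The slot most recently confirmed by a CAST; under the thread's
 * assumptions this is the one a BAST may overwrite. */
struct lksb_standin *lksb_pair_live(struct lksb_pair *p)
{
    return &p->slot[p->free_idx ^ 1u];
}

/* Pick the slot to hand to the next lock or convert request. */
struct lksb_standin *lksb_pair_next(struct lksb_pair *p)
{
    assert(!p->pending);  /* one request in flight at a time (added guard) */
    struct lksb_standin *next = &p->slot[p->free_idx];
    /* Assumption: carry the lock id over so a conversion can find the lock. */
    next->sb_lkid = lksb_pair_live(p)->sb_lkid;
    p->pending = 1;
    return next;
}

/* Call at the end of the CAST (not BAST) callback: the slot just used
 * becomes the live one and the other becomes free. */
void lksb_pair_cast_done(struct lksb_pair *p)
{
    assert(p->pending);
    p->free_idx ^= 1u;
    p->pending = 0;
}
```

The intent is that a BAST arriving between two conversions only ever writes
to the live slot, while the next request is prepared in the free one.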
> So I assume that:
> - each lock acquisition or conversion call gives ownership of the lksb to
> the kernel and libdlm (because a BAST callback can fire at any time and
> will overwrite the struct), causing the kernel/libdlm to forget about the
> previously owned lksb (meaning the caller can/should then dispose of it)
That sounds about right.
> - AST and BAST callbacks run in order, such that after the CAST completes,
> and until a conversion occurs, a BAST firing will only overwrite the lksb
> given on the last lock or conversion
> Are my assumptions correct?
That also sounds like it should be true.
To say with more certainty would require closer study of the code, because
whatever rules exist are a function of the current implementation, and not
derived from higher design rules per se.