[dm-devel] [git pull] device mapper changes for 4.18

Linus Torvalds torvalds at linux-foundation.org
Mon Jun 4 19:29:21 UTC 2018


On Mon, Jun 4, 2018 at 12:09 PM Mike Snitzer <snitzer at redhat.com> wrote:
>
> Mikulas elected to use swait because of the very low latency nature of
> layering on top of persistent memory.  Use of "simple waitqueues"
> _seemed_ logical to me.

I know. It's actually the main reason I have an almost irrational
hatred of those interfaces. They _look_ so simple and obvious, and
they are very tempting to use. And then they have that very subtle
issue that the default wakeup is exclusive.
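Just to make the trap visually obvious, here's a toy side-by-side (the
function and queue names are made up, the wakeup calls are the real
4.18 API):

    #include <linux/wait.h>
    #include <linux/swait.h>

    static DECLARE_WAIT_QUEUE_HEAD(wq);     /* regular waitqueue */
    static DECLARE_SWAIT_QUEUE_HEAD(swq);   /* "simple" waitqueue */

    static void notify_waiters(void)
    {
            /*
             * Regular waitqueue: the default wakeup is non-exclusive,
             * so every plain waiter on wq gets to run.
             */
            wake_up(&wq);

            /*
             * swait: the visually equivalent call is exclusive and
             * wakes exactly ONE waiter.  "Wake everybody" has to be
             * spelled swake_up_all(&swq).
             */
            swake_up(&swq);
    }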

I've actually wanted to remove them entirely, but there are two
existing users (kvm and rcu), and the RCU one actually is a good user.
The kvm one is completely pointless, but I haven't had the energy to
just change it to use a direct task pointer, and I was hoping the kvm
people would do that themselves (because it should be both faster and
simpler than swait).
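The direct-task-pointer pattern is roughly this (a sketch, all the
vcpu names are made up, and real code would need RCU or
get_task_struct() to keep the sleeper's task alive across the wakeup):

    #include <linux/sched.h>

    struct vcpu {
            struct task_struct *waiter;   /* NULL when nobody is blocked */
    };

    static void vcpu_block(struct vcpu *vcpu)
    {
            WRITE_ONCE(vcpu->waiter, current);
            set_current_state(TASK_INTERRUPTIBLE);
            /* re-check the actual wakeup condition here before sleeping */
            schedule();
            WRITE_ONCE(vcpu->waiter, NULL);
    }

    static void vcpu_kick(struct vcpu *vcpu)
    {
            struct task_struct *t = READ_ONCE(vcpu->waiter);

            if (t)
                    wake_up_process(t);   /* no queue walk, no wait_lock */
    }

That's the whole thing: one pointer and one wake_up_process(), with no
exclusive-vs-nonexclusive semantics to get wrong.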

One option might be to rename them to be less tempting. Instead of
"swait" where the "s" stands for "simple" (which it isn't, because the
complexity is in the subtle semantics), we could perhaps write it out
as "specialized_wait". Make people actually write that "specialized"
word out, and maybe they'd have to be aware of just how subtle the
differences are to normal wait-queues.

Because those functions *are* smaller and can definitely be faster and
have lower latencies. So in *theory* they are perfectly fine, it's
just that they need a *lot* of careful thought before you use
them.

So the rules with swait queues are that you either have to

 (a) use "swake_up_all()" to wake up everybody

 (b) be *very* careful and guarantee that every single place that
sleeps on an swait queue will actually consume the resource that it
was waiting on - or wake up the next sleeper.

and usually people absolutely don't want to do (a), and then they get (b) wrong.
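To make (b) concrete, here is roughly what a correct freelist user has
to look like (all names made up; swake_up() is the 4.18 single-waiter
wakeup, and note how the signal path has to hand that single exclusive
wakeup on to somebody else):

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/swait.h>

    struct entry {
            struct list_head list;
    };

    static LIST_HEAD(free_list);
    static DEFINE_SPINLOCK(free_lock);
    static DECLARE_SWAIT_QUEUE_HEAD(free_wait);

    /* Returns NULL if interrupted by a signal. */
    static struct entry *alloc_entry(void)
    {
            struct entry *e;

            for (;;) {
                    if (swait_event_interruptible(free_wait,
                                                  !list_empty(&free_list))) {
                            /*
                             * We may have consumed the one exclusive
                             * wakeup without taking an entry.  Rule (b):
                             * pass it on, or the next sleeper stays
                             * blocked with a free entry sitting right
                             * there on the list.
                             */
                            swake_up(&free_wait);
                            return NULL;
                    }

                    spin_lock(&free_lock);
                    e = list_first_entry_or_null(&free_list,
                                                 struct entry, list);
                    if (e)
                            list_del(&e->list);
                    spin_unlock(&free_lock);

                    if (e)
                            return e;
                    /* Lost the race; the winner wakes us when it frees. */
            }
    }

    static void free_entry(struct entry *e)
    {
            spin_lock(&free_lock);
            list_add(&e->list, &free_list);
            spin_unlock(&free_lock);
            swake_up(&free_wait);   /* exclusive: wakes exactly ONE waiter */
    }

Drop that swake_up() in the signal path and you have exactly the
"incorrectly blocked" scenario below.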

And when you get (b) wrong, you can end up with processes stuck
waiting on things even after they got released. But in *practice* it
almost never actually happens, particularly if you have some array of
resources - like that freelist - where once somebody gets a resource,
they'll do another wakeup when they release it. So if you have lots of
threads fighting for the resource, you'll also end up with lots of
wakeups: even if some thread ends up blocked while there are free
resources, _another_ thread will come in, pick up one of those free
resources, and then wake up the incorrectly blocked one when it is
done.

So it's actually really hard to see the bug in practice. You have to
have really bad luck to first hit that "don't wake up the next waiter,
because the waiter that you _did_ wake didn't need the resource after
all" case, and then you also have to stop allocating (and freeing)
other copies of that resource.

So the common case is that you never really see the problem as a
deadlock, but you _can_ see it as an odd blip that basically is "stop
handling requests for one thread, until another thread comes in and
starts doing requests, which then restarts the first thread".

And don't get me wrong - you can get the exact same problem with
regular wait-queues too, but then you have to explicitly say "I'm an
exclusive waiter" and violate the rules for exclusivity.  We've had
that, but then I blame the user, not the wait-queue interface itself.
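For comparison, the regular wait-queue version of the same
single-wakeup game at least makes you spell it out at the sleep site
(sketch, resource_available() is made up):

    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(res_wait);

    static bool resource_available(void);   /* made up for the sketch */

    static void wait_for_resource(void)
    {
            DEFINE_WAIT(wait);

            for (;;) {
                    /* exclusivity is opt-in, and it says so right here */
                    prepare_to_wait_exclusive(&res_wait, &wait,
                                              TASK_UNINTERRUPTIBLE);
                    if (resource_available())
                            break;
                    schedule();
            }
            finish_wait(&res_wait, &wait);
    }

    static void release_resource(void)
    {
            /* wakes ONE exclusive waiter (plus any non-exclusive ones) */
            wake_up(&res_wait);
    }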

             Linus



