[dm-devel] [git pull] device mapper changes for 4.18

Mon Jun 4 22:16:31 UTC 2018

On Mon, Jun 4, 2018 at 2:53 PM Mikulas Patocka <mpatocka at redhat.com> wrote:
>
> I'd be interested - does the kernel deal properly with spurious wake-up? -
> i.e. suppose that the kernel thread that I created is doing something else
> in a completely different subsystem - can I call wake_up_process on it?
> Could it confuse some unrelated code?

We've always had that issue, and yes, we should handle it fine. Code
that doesn't handle it fine is broken, but I don't think we've ever
had that situation.

For example, a common case of "spurious" wakeups is when somebody adds
itself to a wait list, but then ends up doing other things (including
taking page faults because of user access etc). The wait-list is still
active, and events on the wait list will still wake people up, even if
they are sleeping on some *other* list too.

In fact, an example of spurious wakeups comes from just using regular
futexes. We send those locklessly, and you actually can get a futex
wakeup *after* you thought you removed yourself from the futex queue.

But that's actually only an example of the much more generic issue -
we've always supported having multiple sources of wakeups, so
"spurious" wakeups have always been a thing.

People are probably not so aware of it, because they've never been an
actual _problem_.

Why? Our sleep/wake model has never been that "I woke up, so what I
waited on must be done". Our sleep/wake model has always been one
where being woken up just means that you go back and repeat the
checks.

The whole "wait_event()" loop is the most core example of that
model, but it's actually not the *traditional* model. Our really
traditional model of waiting for something actually predates
wait_event(), and is an explicit loop like

    add_to_wait_queue(..);
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);
        .. see if we need to sleep, exit if ok ..
        schedule();
    }
    remove_from_wait_queue(..);

so even pretty much from day #1, the whole notion of "spurious wake
events" is a non-issue.
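
As a rough userspace analogue of that loop (not kernel code), the
sketch below uses POSIX condition variables. The names demo_waiter,
demo_sleeper and demo_run are made up for illustration. The sleeper
only trusts the "done" flag, never the wakeup itself, so the extra
signals sent before the condition is true are harmless.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical demo state; none of these names come from the kernel. */
struct demo_waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool done;              /* the condition the sleeper really waits for */
    bool exited_with_done;  /* what the sleeper saw when it left the loop */
    int  wakeups;           /* how often it woke, for whatever reason */
};

static void *demo_sleeper(void *arg)
{
    struct demo_waiter *w = arg;

    pthread_mutex_lock(&w->lock);
    /* Being woken only means "go back and repeat the check". */
    while (!w->done) {
        pthread_cond_wait(&w->cond, &w->lock);
        w->wakeups++;
    }
    assert(w->done);  /* leaving the loop implies the condition holds */
    w->exited_with_done = w->done;
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Sends `spurious` wakeups with the condition still false, then one
 * real wakeup. Returns 1 if the sleeper exited with the condition
 * true, 0 if not, -1 if the thread could not be created. */
static int demo_run(int spurious)
{
    struct demo_waiter w = { .done = false, .exited_with_done = false,
                             .wakeups = 0 };
    pthread_t thread;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    if (pthread_create(&thread, NULL, demo_sleeper, &w) != 0) {
        pthread_cond_destroy(&w.cond);
        pthread_mutex_destroy(&w.lock);
        return -1;
    }

    for (int i = 0; i < spurious; i++) {
        pthread_mutex_lock(&w.lock);
        pthread_cond_signal(&w.cond);   /* wakeup without the condition */
        pthread_mutex_unlock(&w.lock);
    }

    pthread_mutex_lock(&w.lock);
    w.done = true;                      /* the real event */
    pthread_cond_signal(&w.cond);
    pthread_mutex_unlock(&w.lock);

    pthread_join(thread, NULL);
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return w.exited_with_done ? 1 : 0;
}
```

How many times the sleeper actually wakes depends on scheduling, so
the sketch deliberately makes no promise about the wakeup count; only
the rechecked condition decides when the loop ends.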

(We did have a legacy "sleep_on()" interface back in the dark ages,
but even that was supposed to be used in a loop).

> The commonly used synchronization primitives recheck the condition after
> wake-up, but it's hard to verify that the whole kernel does it.

See above. We have those spurious wakeups already.

> It looked to me like the standard wait-queues suffer from feature creep
> (three flags, high number of functions and macros, it even uses an
> indirect call to wake something up) - that's why I used swait.

I agree that the standard wait-queues have gotten much more complex
over the years. But apart from the wait entries being a bit big, they
actually should not perform badly.

The real problem with wait-queues is that because of their semantics,
you *can* end up walking the whole queue, waking up hundreds (or
thousands) of processes. That can be a latency issue for RT.

But the answer to that tends to be "don't do that then". If you have
wait-queues that can have thousands of entries, there's likely
something seriously wrong somewhere. We've had it, but it's very very
rare.
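
To make the queue-walk cost mentioned above concrete, here is a toy
model in plain C. It is not the kernel implementation: toy_waiter,
toy_queue, toy_queue_add and toy_wake_up are invented names, and the
model only assumes the usual rule that a wakeup wakes every
non-exclusive waiter it passes and stops once it has woken the
requested number of exclusive ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 4096

/* Hypothetical toy types; a real wait queue is a linked list with
 * per-entry wake functions, which this model leaves out. */
struct toy_waiter {
    bool exclusive;
    bool woken;
};

struct toy_queue {
    struct toy_waiter entries[TOY_MAX_WAITERS];
    size_t len;
};

static void toy_queue_add(struct toy_queue *q, bool exclusive)
{
    assert(q->len < TOY_MAX_WAITERS);
    q->entries[q->len].exclusive = exclusive;
    q->entries[q->len].woken = false;
    q->len++;
}

/* Walks the queue from the head and wakes each entry it visits.
 * The walk stops only after nr_exclusive exclusive waiters have been
 * woken; passing 0 means "wake everybody". Returns the number of
 * entries visited, which is the cost of this one wakeup. */
static size_t toy_wake_up(struct toy_queue *q, int nr_exclusive)
{
    size_t visited = 0;

    for (size_t i = 0; i < q->len; i++) {
        struct toy_waiter *w = &q->entries[i];

        visited++;
        w->woken = true;
        if (w->exclusive && --nr_exclusive == 0)
            break;
    }
    return visited;
}
```

With a thousand non-exclusive waiters queued, a single toy_wake_up()
call visits all thousand entries, which is the latency concern; with
an exclusive waiter at the head, the same call stops after one entry.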

                        Linus