[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Poor condvar performance

This is a bit of a stream of consciousness.... don't assume coherence ;)

Ulrich Drepper wrote:
> Jamie Lokier wrote:
> > cond_signal() would run through the list, calling
> > futex_requeue2(entry_addr, 1, infinity) for each entry.
> Why such a list?  Lists at userlevel are bad.  One of the aspects of the
> current kernel interface which makes the userlevel code so robust is
> that we don't have any lists except the list of all threads (or stacks).
>  No lists for waiters, this was the big nightmare in LinuxThreads.

I agree, a list is best avoided, particularly as this one would have
to be maintained in shared memory, ew.

But without a list, how do you know what value of nr_wake to pass to
FUTEX_REQUEUE2 for broadcast?

If all the waiters are on the same mutex, you want to wake one and
requeue the rest.  If they are different mutexes, you want to take
as many as there are separate mutexes.

Another problem is that FUTEX_REQUEUE2 is harder to implement if there
are different waiting mutex addresses.  More hash lists than 2 are
involved, which could make the kernel implementation quite hairy
(because of the hashed spinlocks), unless we say that the requeue
operation is no longer atomic and can be done as dequeue followed by

On the other hand, getting rid of the list makes it much easier to
ensure waiters are woken in FIFO order.  To do FIFO with the list
would require two intertwined lists in userspace, and more
FUTEX_REQUEUE2 calls in some cases.

> If the FUTEX_WAIT2 call would take the extra parameter, the value can be
> stored somewhere on the thread's stack in the kernel (local variable).
> Then if and only if the FUTEX_REQUEUE2 call is used, it is used in the
> requeue operation.  The caller of FUTEX_REQUEUE2 need not know the
> address.  It can be relied on that the waiter knows it.


Another problem is the "waiter bit" which Alexander Terekhov mentions.
If we requeue any, we need the mutex state to reflect that it has
waiters, and if we didn't, to reflect that, otherwise when the holder
of the mutex eventually releases it it will have to call futex_wake

At the moment, futex_requeue returns the number of requeued waiters,
so the caller of futex_requeue can be responsible for setting the
mutex state so as to indicate waiters or not.  But that isn't possible
for pshared, because the caller of futex_requeue cannot access the
waiter mutex word.

These two problems suggest that FUTEX_REQUEUE2 should not be passed
separate nr_wake and nr_requeue values, but instead should modify each
waiter's mutex word, and decide whether to wake or requeue based on
the value.  But that causes all sorts of problems.  (a) It has to walk
the page table, to access another task's memory for private mutexes,
and (b) we lose the genericity that we have at the moment, where
futexes don't know anything about the bit representation.

So, forget that.

Another way to solve the problem of how many to wake and how many to
requeue is for FUTEX_REQUEUE2 to keep track of _distinct_ requeue
targets itself, and wake nr_wake for each distinct requeue target.
This seems like it could work and doesn't require NPTL to know
anything special about addresses.

Another way to solve the "waiter bit" problem is for FUTEX_WAIT2 to
return 0 or 1, depending on whether anyone else was requeued when it
was woken.  Then the waiter can set the mutex state according to
whether there any waiters.

> > This isn't a great interface because you can't requeue2 to move a
> > waiter a second time.  But it is fine for pshared condvars, I think.
> It's actually usable for all condvars, not only shared.  This is
> important, I really don't want to have too much difference in the code
> paths for condvars.

Understood, but there is a problem: 

> I've pretty sure there will be more interesting primitives we can think
> of over time.

One thing that would be very easy: currently futex_wait returns 0 if
woken.  We could change that so a parameter passed to futex_wake or
futex_requeue is the value to return from futex_wait.  That opens all
sorts of token-passing possibilities.  Have a think about whether you
would find that useful.

> If you can come up with a patch we can easily check the performance.

Have a read over my ramblings in here... I am half asleep right now
;)... think them over and decide which strategy will actually work.
Then we can look at doing a patch.

-- Jamie

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]