problems with pthread_cond_broadcast
Scott Lamb
slamb at slamb.org
Thu Apr 15 07:43:01 UTC 2004
On Apr 15, 2004, at 12:50 AM, Thorsten Kukuk wrote:
>
> Hi,
>
> I have a problem with pthread_cond_wait/pthread_cond_broadcast
> waiting sometimes forever on a fast SMP machine. Attached is a
> simple test case.
>
> If I use the order
> pthread_mutex_unlock (&lock);
> pthread_cond_broadcast (&pcond);
>
> with NPTL, the program will hang after a short time running with
> current glibc + NPTL + kernel 2.6.x on all architectures I tested.
>
> If I revert the order to
> pthread_cond_broadcast (&pcond);
> pthread_mutex_unlock (&lock);
>
> it works fine.
>
> Is this a problem of the test case (since pthread_cond_broadcast and
> pthread_cond_wait will access pcond at the same time in different
> threads) or is this a glibc/NPTL/kernel problem?
I think there is a problem in your test case. And I _sort_of_ see it.
Imagine this sequence of events:
Thread 1                Thread 2
----------------------------------------
rw_lock_write:          rw_lock_write:
  mutex_lock
  n_readers = -1
  mutex_unlock
                          mutex_lock
                          n_readers != 0
rw_unlock_write:
  mutex_lock
  n_readers = 0
  mutex_unlock
  cond_broadcast
                          cond_wait
Thread 2 is now waiting in cond_wait, after the broadcast has already
gone out, when it should be able to grab the rw_lock. That is wrong and
would not happen with BROKEN=0.
But it seems like thread 1 should be able to continue running. (And
eventually, one of its broadcasts would catch another thread at the
right time.) I don't know how this defect in rw_unlock_write could
cause the program you've shown to stop entirely.
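
For reference, here is a minimal sketch of the ordering that reportedly works (broadcast issued while the mutex is still held, which is presumably what BROKEN=0 selects). It is not the attached test case: the function bodies, the worker/run_workers helpers, and the 16-thread cap are my own assumptions, and only the names lock, pcond, n_readers, rw_lock_write and rw_unlock_write come from this thread. The waiter checks n_readers and calls pthread_cond_wait without releasing the mutex in between, and re-checks the condition in a loop after every wakeup.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical writer-only lock: n_readers == -1 means a writer holds it,
   0 means it is free. Names follow the thread; the bodies are assumed. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pcond = PTHREAD_COND_INITIALIZER;
static int n_readers = 0;

static void rw_lock_write(void)
{
    pthread_mutex_lock(&lock);
    /* Check the condition with the mutex held and re-check it after
       every wakeup; pthread_cond_wait releases the mutex only while
       the thread is blocked. */
    while (n_readers != 0)
        pthread_cond_wait(&pcond, &lock);
    n_readers = -1;
    pthread_mutex_unlock(&lock);
}

static void rw_unlock_write(void)
{
    pthread_mutex_lock(&lock);
    assert(n_readers == -1);        /* caller must hold the write lock */
    n_readers = 0;
    pthread_cond_broadcast(&pcond); /* broadcast before dropping the mutex */
    pthread_mutex_unlock(&lock);
}

/* Shared counter, only touched while the write lock is held. */
static long counter = 0;

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        rw_lock_write();
        counter++;
        rw_unlock_write();
    }
    return NULL;
}

/* Hypothetical driver: starts nthreads workers (at most 16) and returns
   the final counter value once all of them have finished. */
static long run_workers(int nthreads, long iterations)
{
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Calling run_workers(4, 1000) from a main() should return 4000 if no wakeup is lost and the lock really is exclusive. Build with -pthread.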
Maybe a legitimate expert will show the full picture... or maybe it is
indeed a bug in NPTL.
Scott