problems with pthread_cond_broadcast

Thu Apr 15 07:43:01 UTC 2004

On Apr 15, 2004, at 12:50 AM, Thorsten Kukuk wrote:

>
> Hi,
>
> I have a problem with pthread_cond_wait/pthread_cond_broadcast
> waiting sometimes forever on a fast SMP machine. Attached is a
> simple test case.
>
> If I use the order
>   pthread_mutex_unlock (&lock);
>   pthread_cond_broadcast (&pcond);
>
> with NPTL, the program will hang after a short time running with
> current glibc + NPTL + kernel 2.6.x on all architectures I tested.
>
> If I reverse the order to
>   pthread_cond_broadcast (&pcond);
>   pthread_mutex_unlock (&lock);
>
> it works fine.
>
> Is this a problem of the test case (since pthread_cond_broadcast and
> pthread_cond_wait will access pcond at the same time in different
> threads) or is this a glibc/NPTL/kernel problem?

I think there is a problem in your test case. And I _sort_of_ see it. 
Imagine this sequence of events:

Thread 1            Thread 2
----------------------------------------
rw_lock_write:      rw_lock_write:
   mutex_lock
   n_readers = -1
   mutex_unlock
                       mutex_lock
                       n_readers != 0
rw_unlock_write:
   mutex_lock
   n_readers = 0
   mutex_unlock
   cond_broadcast
                       cond_wait

Thread 2 is now waiting even though the rw_lock is free and it should
be able to grab it. That is wrong and would not happen with BROKEN=0.

But it seems like Thread 1 should be able to continue running. (And
eventually, one of its broadcasts would catch another thread at the 
right time.) I don't know how this defect in rw_unlock_write could 
cause the program you've shown to stop entirely.
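
For comparison, here is a minimal sketch of how I would expect the write
side of such a lock to look. It is not the attached test case: the names
rw_lock_write, rw_unlock_write, n_readers, lock and pcond are taken from
this thread, while the while-loop around the wait, the counter, and the
writer/run_writers helpers are my own assumptions for illustration. The
idea is that the predicate is checked and the wait is entered under the
same mutex, and the predicate is re-checked after every wakeup. As far as
I understand POSIX, with that structure a broadcast issued after the
unlock should not be lost.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the test case's lock; not the original code. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pcond = PTHREAD_COND_INITIALIZER;
static int n_readers = 0;   /* -1 means a writer holds the rw_lock */
static long counter = 0;    /* only touched while holding the rw_lock */

static void rw_lock_write(void)
{
    pthread_mutex_lock(&lock);
    /* Re-check the predicate after every wakeup. The mutex is held from
       the check until pthread_cond_wait atomically releases it. */
    while (n_readers != 0)
        pthread_cond_wait(&pcond, &lock);
    n_readers = -1;
    pthread_mutex_unlock(&lock);
}

static void rw_unlock_write(void)
{
    pthread_mutex_lock(&lock);
    assert(n_readers == -1);   /* caller must hold the write lock */
    n_readers = 0;
    pthread_mutex_unlock(&lock);
    /* Broadcast after the unlock: the order reported to hang. */
    pthread_cond_broadcast(&pcond);
}

static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        rw_lock_write();
        counter++;
        rw_unlock_write();
    }
    return NULL;
}

/* Runs two competing writers and returns the final counter value. */
long run_writers(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, writer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

If a program shaped like this still hangs, that would point away from the
test case and toward the library or kernel. If the real test case checks
n_readers with a plain "if" or outside the mutex, that difference would be
the first thing I would look at.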

Maybe a legitimate expert will show the full picture... or maybe it is 
indeed a bug in NPTL.

Scott