
Re: problems with pthread_cond_broadcast



On Apr 15, 2004, at 12:50 AM, Thorsten Kukuk wrote:


> Hi,
>
> I have a problem with pthread_cond_wait/pthread_cond_broadcast
> waiting sometimes forever on a fast SMP machine. Attached is a
> simple test case.
>
> If I use the order
>   pthread_mutex_unlock (&lock);
>   pthread_cond_broadcast (&pcond);
>
> with NPTL, the program will hang after a short time running with
> current glibc + NPTL + kernel 2.6.x on all architectures I tested.
>
> If I revert the order to
>   pthread_cond_broadcast (&pcond);
>   pthread_mutex_unlock (&lock);
>
> it works fine.
>
> Is this a problem of the test case (since pthread_cond_broadcast and
> pthread_cond_wait will access pcond at the same time in different
> threads) or is this a glibc/NPTL/kernel problem?

I think there is a problem in your test case. And I _sort_of_ see it. Imagine this sequence of events:


Thread 1            Thread 2
----------------------------------------
rw_lock_write:      rw_lock_write:
  mutex_lock
  n_readers = -1
  mutex_unlock
                      mutex_lock
                      n_readers != 0
rw_unlock_write:
  mutex_lock
  n_readers = 0
  mutex_unlock
  cond_broadcast
                      cond_wait

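Thread 2 is now blocked in cond_wait even though n_readers is already 0, so it should have been able to grab the rw_lock; the broadcast that would have woken it has already been sent. That is wrong and would not happen with BROKEN=0.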

But it seems like thread 1 should be able to continue running. (And eventually, one of its broadcasts would catch another thread at the right time.) I don't know how this defect in rw_unlock_write could cause the program you've shown to stop entirely.
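
For comparison, here is a minimal sketch of the pattern being discussed. It is not the attached test case, which is not reproduced in this message. The names rw_lock_write, rw_unlock_write, n_readers, lock, pcond and BROKEN follow the thread; the meaning of BROKEN (1 = unlock before broadcast), the counter variable, and the run_writers driver are assumptions added for illustration. The sketch rechecks n_readers in a while loop while holding the mutex, which is the usual way to wait on a condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: BROKEN=1 selects the unlock-then-broadcast order. */
#ifndef BROKEN
#define BROKEN 1
#endif

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pcond = PTHREAD_COND_INITIALIZER;
static int n_readers = 0;  /* 0 = free, -1 = held by a writer */
static long counter = 0;   /* hypothetical data protected by the write lock */

static void rw_lock_write(void)
{
    pthread_mutex_lock(&lock);
    /* Recheck the predicate in a loop with the mutex held.
       pthread_cond_wait releases the mutex atomically while it
       blocks and reacquires it before returning. */
    while (n_readers != 0)
        pthread_cond_wait(&pcond, &lock);
    n_readers = -1;
    pthread_mutex_unlock(&lock);
}

static void rw_unlock_write(void)
{
    pthread_mutex_lock(&lock);
    assert(n_readers == -1);  /* only the current writer may unlock */
    n_readers = 0;
#if BROKEN
    pthread_mutex_unlock(&lock);
    pthread_cond_broadcast(&pcond);
#else
    pthread_cond_broadcast(&pcond);
    pthread_mutex_unlock(&lock);
#endif
}

static void *writer(void *arg)
{
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        rw_lock_write();
        counter++;            /* critical section */
        rw_unlock_write();
    }
    return NULL;
}

/* Hypothetical driver: runs up to 16 writer threads and returns
   the final value of counter. */
long run_writers(int nthreads, int iters)
{
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, writer, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Building with -DBROKEN=0 switches to the broadcast-then-unlock order. As far as I know, POSIX allows pthread_cond_broadcast to be called with or without the mutex held, so with the predicate checked under the mutex like this I would expect a waiter not to miss a wakeup in either order.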

Maybe a legitimate expert will show the full picture... or maybe it is indeed a bug in NPTL.

Scott


