
[PROBLEM] pthread_cond_signal sometimes doesn't work



Excuse the intrusion, but I'm not sure where else to turn.

I work on a heavily multi-threaded project and we are seeing
a problem that only seems to occur on RH9 (with NPTL).

We have some worker threads that do
a: pthread_mutex_lock(&mutex);
   pthread_cond_wait(&cond, &mutex);
   [dequeue work element]
   pthread_mutex_unlock(&mutex);
   [process work request]
   goto a;
and some scheduler threads that do
   pthread_mutex_lock(&mutex);
   [queue work element]
   pthread_cond_signal(&cond);
   pthread_mutex_unlock(&mutex);

Nothing too unusual here; we have a number of mutex/cond pairs
that do the same kind of thing.  However, for one particular
pair, we can see one or more worker threads entering pthread_cond_wait
and then a scheduler thread calling pthread_cond_signal, yet no
worker wakes up.  When it happens at all, it is usually after several
hundred thousand or even a million iterations.
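
One thing that may be worth ruling out while debugging: POSIX says
pthread_cond_signal has no effect if no thread is currently blocked on
the condition variable, and it also permits spurious wakeups.  A worker
that waits unconditionally can therefore miss a signal that arrives
while it is still processing a request.  The sketch below is not the
project's code; it only illustrates the usual predicate-loop form of
the same worker/scheduler pattern.  Every name in it (work_queue,
pending, shutdown, processed, worker, schedule, run_demo) is
hypothetical, and the queue is reduced to a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real queue: a counter of pending items. */
struct work_queue {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int pending;    /* items queued but not yet dequeued */
    int shutdown;   /* set once no more work will be queued */
    int processed;  /* total items dequeued by all workers */
};

static void *worker(void *arg)
{
    struct work_queue *q = arg;

    pthread_mutex_lock(&q->mutex);
    for (;;) {
        /* Wait only while there is nothing to do.  Re-checking the
         * predicate covers both spurious wakeups and signals that
         * were sent while this thread was not waiting. */
        while (q->pending == 0 && !q->shutdown)
            pthread_cond_wait(&q->cond, &q->mutex);
        if (q->pending == 0)
            break;                      /* shutdown and queue drained */
        assert(q->pending > 0);
        q->pending--;                   /* [dequeue work element] */
        q->processed++;
        pthread_mutex_unlock(&q->mutex);
        /* [process work request] would run here, outside the lock */
        pthread_mutex_lock(&q->mutex);
    }
    pthread_mutex_unlock(&q->mutex);
    return NULL;
}

static void schedule(struct work_queue *q)
{
    pthread_mutex_lock(&q->mutex);
    q->pending++;                       /* [queue work element] */
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Queue n_items for n_workers threads; return how many were dequeued,
 * or -1 if the thread-id array could not be allocated. */
int run_demo(int n_items, int n_workers)
{
    struct work_queue q;
    pthread_t *tids = malloc(sizeof *tids * (n_workers > 0 ? n_workers : 1));
    int i, started = 0;

    if (tids == NULL)
        return -1;
    pthread_mutex_init(&q.mutex, NULL);
    pthread_cond_init(&q.cond, NULL);
    q.pending = q.shutdown = q.processed = 0;

    for (i = 0; i < n_workers; i++) {
        if (pthread_create(&tids[started], NULL, worker, &q) != 0)
            break;
        started++;
    }
    for (i = 0; i < n_items; i++)
        schedule(&q);

    /* Tell the workers to exit once the queue is empty. */
    pthread_mutex_lock(&q.mutex);
    q.shutdown = 1;
    pthread_cond_broadcast(&q.cond);
    pthread_mutex_unlock(&q.mutex);

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&q.cond);
    pthread_mutex_destroy(&q.mutex);
    free(tids);
    return q.processed;
}
```

Calling run_demo(1000, 4) queues 1000 items for 4 workers.  If a hang
still reproduces with the predicate loop in place, a lost signal in the
application code becomes a less likely explanation.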

The relevant code is a couple of years old and runs on a variety
of hardware and software platforms; as I mentioned, as far as I know
it only fails on RH9 (glibc 2.3.2-27.9; nptl is from early April).  I've
been programming too long to not doubt our code.  Also, we've
been burned by recent gcc (3.2.2-5) -O3 optimizations.

My question is: how best to continue debugging the problem?

Thanks,
Greg Smith