
Barrier reinit? (prev: Thread starvation with mutex)



I have investigated a little more and found the cause of my problem.
Whenever at least two threads are waiting on a barrier, if the first
one to be released calls pthread_barrier_destroy and then
pthread_barrier_init, the second never returns from
pthread_barrier_wait (because ibarrier->curr_event is back to 0).

I am not sure this behaviour is what POSIX intends, because I don't know
of any way to change the number of threads a barrier waits for other
than destroy/init.

Please let me know if I am totally wrong, or if this would be better
posted to libc-alpha.

I don't know exactly how, but maybe there could be an additional piece
of information saying "this is the first init ever" or "this was
previously destroyed", so that init does not reset curr_event but just
increments it?

Regards,

Sebastien Decugis.


On Tue 10/02/2004 at 18:00, Sebastien Decugis wrote:
> The topic might be wrong as I am dealing with a barrier, but I think it
> is quite related...
> 
> Please find attached the file 'maxcrea.c' 
> 
> In this sample, I am trying to create as many threads as possible and
> then remove them "cleanly" as soon as create fails. I am using two
> barriers to synchronize all the threads.
> 
> When I run it with the symbol WITH_MAIN_SLEEPS defined, everything
> goes well, but as soon as I remove it, it hangs on my box (p3 700MHz;
> redhat 9 + kernel 2.6.2; I get the same result with the standard redhat
> glibc and with CVS glibc (+ nptl)). In fact it looks like when the main
> thread passes the barrier, then destroys it and re-initializes it with
> a different value before the other threads are woken, those threads are
> never woken. As I don't really understand the NPTL source code yet, I
> don't see where the problem is, but I still think this is a problem. Am
> I wrong?
> 
> Thank you!
> 
> Sebastien Decugis.
> 
> 
> On Tue 10/02/2004 at 11:50, Jamie Lokier wrote:
> > Perez-Gonzalez, Inaky wrote:
> > > > > 1. Thread A calls FUTEX_WAKE
> > > > > 2. Thread A receives 0 from FUTEX_WAKE
> > > > > 3. Thread A atomically unlocks the user space word
> > > > >
> > > > > Now, if some Thread B comes in between 2 and 3 and tries to
> > > > > lock, it will see the user space word locked and go down to
> > > > > wait in the kernel. It will sit there for ever because
> > > > > in (3) the word is locked and nobody knows B is there sleeping.
> > > > 
> > > > After step 2, Thread B sees the user space word is locked and does an
> > > > atomic decrement (or whatever) to indicate that there is a waiter, as
> > > > usual.
> > > > 
> > > > In step 3, Thread A tries to unlock by doing an atomic compare and
> > > > exchange, and then it sees that the word indicates there is a waiter,
> > > > so loops back to 1.
> > > 
> > > Ouch, I had forgotten we had the counter, not just a cmp/exchange.
> > > Thanks for the correction.
> > > 
> > > But I still think the deadlock applies (in slightly more twisted fashion):
> > > B, in between 2 and 3, has decremented and gone down to the kernel.
> > > Before it is able to grab the futex spinlock, A is running, sees there is
> > > a waiter according to the userspace word, so goes back to 1. FUTEX_WAKE
> > > returns 0 (because B still hasn't had the chance to queue up), so it
> > > unlocks. Now B reaches the futex and sleeps for ever.
> > > 
> > > Did I miss anything in this scenario?
> > 
> > Thread A cannot unlock as long as the userspace word indicates "there
> > are waiters".  Therefore it will keep looping, calling FUTEX_WAKE.  It's
> > a near-livelock, not a deadlock, and will resolve eventually but may
> > take a long time.
> > 
> > The correct version (having just looked at Rusty's code :) is:
> > 
> >    1. Thread A tries cmp/exchange to unlock the word;
> >       it fails because there are waiters
> >    2. Thread A calls FUTEX_WAKE to pass ownership
> >    3. Thread A receives 0 from FUTEX_WAKE
> >    4. Thread A atomically increments the word; still finds waiters
> >    5. Thread A calls FUTEX_WAKE to signal change
> >          -> Thread B will either be woken or FUTEX_WAIT returns -EAGAIN.
> > 
> > Steps 1-3 are the ownership passing "fair" wakeup.  Steps 4-5 are the
> > fallback "unfair" wakeup, that only occurs during the race window.
> > 
> > This provides a high likelihood of ownership passing, but does not
> > guarantee ownership will be passed.
> >      
> > -- Jamie
-- 
Sébastien DECUGIS
Bull S.A.
Tel: 04 76 29 74 93



