[Libguestfs] [libnbd PATCH v2] lib/errors.c: Fix assert fail in exit path in multi-threaded code

Wed Mar 8 23:12:42 UTC 2023

On Wed, Mar 08, 2023 at 10:55:01PM +0000, Richard W.M. Jones wrote:
> > @@ -87,16 +92,23 @@ free_errors_key (void *vp)
> >  static struct last_error *
> >  allocate_last_error_on_demand (void)
> >  {
> > -  struct last_error *last_error = pthread_getspecific (errors_key);
> > +  struct last_error *last_error = NULL;
> > 
> > -  if (!last_error) {
> > -    last_error = calloc (1, sizeof *last_error);
> > -    if (last_error) {
> > -      int err = pthread_setspecific (errors_key, last_error);
> > -      if (err != 0) {
> > -        /* This is not supposed to happen (XXX). */
> > -        fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
> > -                 strerror (err));
> > +  if (key_use_count) {
> > +    last_error = pthread_getspecific (errors_key);
> > +    if (!last_error) {
> > +      last_error = calloc (1, sizeof *last_error);
> > +      if (last_error) {
> > +        int err = pthread_setspecific (errors_key, last_error);
> > +        key_use_count++;
> > +        if (err != 0) {
> > +          /* This is not supposed to happen (XXX). */
> > +          fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
> > +                   strerror (err));
> > +          free (last_error);
> > +          last_error = NULL;
> > +          key_use_count--;
> > +        }
> >        }
> >      }
> >    }
> 
> I suspect this may not be completely race condition free unless
> there's a lock.  Otherwise key_use_count could be > 0 when we enter
> this code and then another thread could run the destructor at the same
> time.

Hmm.  I agree that the common path (pthread_get/setspecific) must NOT
lock.  But the !last_error path is not the common case (only once per
thread), likewise the data destructor is only once per thread.

There's also the question of how much work it is to make this
bulletproof, and yet how difficult it is to test that it is actually
correct.  Going with -Z nodelete is certainly less maintenance.

> 
> However I don't want to add locking around these functions since
> (especially the one setting context) is highly contended and this
> could kill performance.
> 
> I think the worst that might happen is we would get EINVAL from
> pthread_setspecific, print a message, but at least we won't crash
> because the assert has been removed.
> 
> So soft ACK, but I'm still wondering if there's a better way to do this.

I'll be thinking about locking only the slow paths, while leaving the
fast paths unlocked.

> 
> If simply never calling pthread_key_destroy a good idea?  The worst is
> it leaks slots in glibc's thread-local storage array.

That thinking is precisely why libvirt uses -Z nodelete - the moment a
thread can have thread-local data with a destructor pointing to libnbd
code, but where libnbd can be removed from memory without calling
pthread_key_delete(), is the moment you get a SEGV trying to call the
thread destructor.  A memory leak is better than a SEGV, so we either
have to enable '-Z nodelete' or unconditionally use
pthread_key_delete() when libnbd is unloaded.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org