[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code

Richard W.M. Jones rjones at redhat.com
Thu Mar 9 09:50:51 UTC 2023


On Thu, Mar 09, 2023 at 09:13:24AM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 09, 2023 at 08:44:51AM +0000, Richard W.M. Jones wrote:
> > When a highly multi-threaded program such as nbdcopy encounters an
> > error, there is a race condition in the library which can cause an
> > assertion failure and thus a core dump:
> > 
> > (1) An error occurs on one of the threads.  nbdcopy calls exit(3).
> > 
> > (2) In lib/errors.c, the destructor calls pthread_key_delete.
> > 
> > (3) Another thread which is still running also encounters an error,
> > and inside libnbd the library calls set_error().
> > 
> > (4) The call to set_error() calls pthread_getspecific which returns
> > NULL (since the key has already been destroyed in step (2)), and this
> > causes us to call pthread_setspecific which returns EINVAL because
> > glibc is able to detect invalid use of a pthread_key_t after it has
> > been destroyed.  In any case, the error message is lost, and any
> > subsequent call to nbd_get_error() will return NULL.
> > 
> > (5) We enter the %DEAD state, where there is an assertion that
> > nbd_get_error() is not NULL.  This assertion is supposed to be for
> > checking that the code called set_error() before entering the %DEAD
> > state.  Although we did call set_error(), the error message was lost
> > at step (4) and nbd_get_error() will always return NULL.  This
> > assertion failure causes a crash.
> > 
> > There aren't any good ways to fix this.  I chose to leak the
> > pthread_key_t on the exit path.
> > ---
> >  lib/errors.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/lib/errors.c b/lib/errors.c
> > index 8b77650ef3..6fbfaacd34 100644
> > --- a/lib/errors.c
> > +++ b/lib/errors.c
> > @@ -69,7 +69,11 @@ errors_key_destroy (void)
> >      free (last_error->error);
> >      free (last_error);
> >    }
> > -  pthread_key_delete (errors_key);
> > +
> > +  /* We could do this, but that causes a race condition described here:
> > +   * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html
> > +   */
> > +  //pthread_key_delete (errors_key);
> >  }
> 
> While I think this fixes the problem with pthread_setspecific crashing,
> I believe this then opens apps up to the problem hit by libvirt with
> destructors being run which don't exist in memory.
> 
> Consider a thread is running that has done something with libnbd which
> caused allocate_last_error_on_demand() to run, which populates the
> 'errors_key' data for that thread. This thread is no longer doing anything
> with libnbd, but the 'errors_key' data remains populated none the less
> because that's just how thread locals work.
> 
> Now nothing at all is using libnbd.so, so some thread calls dlclose(),
> causing libnbd.so to be unmapped from memory. Since we've not called
> pthread_key_delete, the thread local still exists and, crucially, so
> does its registered destructor function.
> 
> The thread mentioned in the previous paragraph now terminates and this
> causes the destructor function 'free_errors_key' to be run.... except
> this function isn't mapped in memory any more. This will cause the
> program to crash.

Indeed it does exactly this.  v4 posted.

Rich.

> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v


More information about the Libguestfs mailing list