[Libguestfs] [PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
Richard W.M. Jones
rjones at redhat.com
Thu Mar 9 09:50:51 UTC 2023
On Thu, Mar 09, 2023 at 09:13:24AM +0000, Daniel P. Berrangé wrote:
> On Thu, Mar 09, 2023 at 08:44:51AM +0000, Richard W.M. Jones wrote:
> > When a highly multi-threaded program such as nbdcopy encounters an
> > error, there is a race condition in the library which can cause an
> > assertion failure and thus a core dump:
> >
> > (1) An error occurs on one of the threads. nbdcopy calls exit(3).
> >
> > (2) In lib/errors.c, the destructor calls pthread_key_delete.
> >
> > (3) Another thread which is still running also encounters an error,
> > and inside libnbd the library calls set_error().
> >
> > (4) The call to set_error() calls pthread_getspecific which returns
> > NULL (since the key has already been destroyed in step (2)), and this
> > causes us to call pthread_setspecific which returns EINVAL because
> > glibc is able to detect invalid use of a pthread_key_t after it has
> > been destroyed. In any case, the error message is lost, and any
> > subsequent call to nbd_get_error() will return NULL.
> >
> > (5) We enter the %DEAD state, where there is an assertion that
> > nbd_get_error() is not NULL. This assertion is supposed to be for
> > checking that the code called set_error() before entering the %DEAD
> > state. Although we did call set_error(), the error message was lost
> > at step (4) and nbd_get_error() will always return NULL. This
> > assertion failure causes a crash.
> >
> > There aren't any good ways to fix this. I chose to leak the
> > pthread_key_t on the exit path.
> > ---
> > lib/errors.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/errors.c b/lib/errors.c
> > index 8b77650ef3..6fbfaacd34 100644
> > --- a/lib/errors.c
> > +++ b/lib/errors.c
> > @@ -69,7 +69,11 @@ errors_key_destroy (void)
> > free (last_error->error);
> > free (last_error);
> > }
> > - pthread_key_delete (errors_key);
> > +
> > + /* We could do this, but that causes a race condition described here:
> > + * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html
> > + */
> > + //pthread_key_delete (errors_key);
> > }
>
> While I think this fixes the problem with pthread_setspecific crashing,
> I believe this then opens apps up to the problem hit by libvirt with
> destructors being run which don't exist in memory.
>
> Consider a thread is running that has done something with libnbd which
> caused allocate_last_error_on_demand() to run, which populates the
> 'errors_key' data for that thread. This thread is no longer doing anything
> with libnbd, but the 'errors_key' data remains populated none the less
> because that's just how thread locals work.
>
> Now nothing at all is using libnbd.so, so some thread calls dlclose(),
> causing libnbd.so to be unmapped from memory. Since we've not called
> pthread_key_delete, the thread local still exists and, crucially, so
> does its registered destructor function.
>
> The thread mentioned in the previous paragraph now terminates and this
> causes the destructor function 'free_errors_key' to be run.... except
> this function isn't mapped in memory any more. This will cause the
> program to crash.
Indeed it does exactly this. v4 posted.
Rich.
> With regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
More information about the Libguestfs
mailing list