[Libguestfs] [PATCH libnbd] lib/errors.c: Fix assert fail in exit path in multi-threaded code

Wed Mar 8 21:22:29 UTC 2023

On Wed, Mar 08, 2023 at 08:29:41PM +0000, Richard W.M. Jones wrote:
> When a highly multi-threaded program such as nbdcopy encounters an
> error, there is a race condition in the library which can cause an
> assertion failure and thus a core dump:
> 
> (1) An error occurs on one of the threads.  nbdcopy calls exit(3).
> 
> (2) In lib/errors.c, the destructor calls pthread_key_delete.
>

Note - POSIX says that calling pthread_key_delete() can leak memory.
If the reason we are calling pthread_key_delete() is because libnbd
was dlopen'd and is now being dlclose'd, and an app repeatedly reloads
libnbd, causes an error, and then unloads libnbd, this leak could grow
to be rather sizeable.  But that is a rather unlikely scenario, and
safely coordinating cleanup is hard when we consider both normal
thread exits (where the key destructor runs if the key is not yet
deleted) and module unloads (where pthread_key_delete skips key
destructors).

> (3) Another thread which is still running also encounters an error,
> and inside libnbd the library calls set_error().
> 
> (4) The call to set_error() calls pthread_getspecific which returns
> NULL (since the key has already been destroyed in step (2)), and this

This is undefined behavior per POSIX.  It is not safe to call
pthread_getspecific() on a destroyed key (even though glibc lets us
get away with it by giving us EINVAL).

> causes us to call pthread_setspecific which returns EINVAL because
> glibc is able to detect invalid use of a pthread_key_t after it has
> been destroyed.  In any case, the error message is lost, and any
> subsequent call to nbd_get_error() will return NULL.

Indeed - once we call pthread_key_delete() in step 2, ANY further use
of the key is going to cause undefined behavior (hopefully EINVAL
failures, but worse behavior is also possible).

One possible idea: have a non-thread-local global that starts life
false, but which errors_free() atomically sets to true prior to
calling pthread_key_delete().  All other functions which reference
errors_key check that the global is still false before using
pthread_{get,set}specific.

Another idea: have a counter of the number of threads that have set a
thread-local error but not yet exited.  Increment the counter any time
allocate_last_error_on_demand() says a new thread is triggering an
error, and decrement the counter any time free_errors_key() calls the
destructor at the end of a thread.  Only when all threads with an
allocated error have exited will the counter reach zero, so only the
last thread to clean up needs to call pthread_key_delete().  The
counter might not reach zero before the process exits, but that's okay
(if the reason the module is being unloaded is because the process is
going away, cleanup is less important), although I don't know if
valgrind would complain.  In fact, POSIX suggests:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_key_delete.html
"For an application to know that it is safe to delete a key, it has to
know that all the threads that might potentially ever use the key do
not attempt to use it again. For example, it could know this if all
the client threads have called a cleanup procedure declaring that they
are through with the module that is being shut down, perhaps by
setting a reference count to zero."

This still doesn't protect us from a parallel thread encountering its
first need for error allocation after the point that we previously saw
the count of threads with active errors go to zero.  That said, we
call allocate_last_error_on_demand() fairly reliably (most API calls
do so in order to set the context - it's pretty hard to write a
multi-threaded app using libnbd APIs but the first call to a libnbd
API that would allocate error storage occurs after another thread has
already started the shutdown process).

> 
> (5) We enter the %DEAD state, where there is an assertion that
> nbd_get_error() is not NULL.  This assertion is supposed to be for
> checking that the code called set_error() before entering the %DEAD
> state.  Although we did call set_error(), the error message was lost
> at step (4) and nbd_get_error() will always return NULL.  This
> assertion failure causes a crash.
> 
> There aren't any good ways to fix this.  I chose to remove the
> assertions that might trigger, and ignore pthread_getspecific
> returning EINVAL.

I concur that removing the assertion seems reasonable to paper over
the problem.  But it does not solve the issue that we may still be
leaking memory if the pthread_key_delete() happens before all threads
that have allocated error storage have had a chance to reach their
destructors, nor that even though glibc was nice and gave us EINVAL,
we really are invoking undefined behavior by using the key after it is
deleted.

> 
> This reverts a small part of commit 831cbf7ee14c ("generator: Allow
> DEAD state actions to run").
> ---
>  generator/state_machine_generator.ml | 5 +----
>  generator/states.c                   | 2 --
>  lib/errors.c                         | 5 ++++-
>  3 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/generator/state_machine_generator.ml b/generator/state_machine_generator.ml
> index 43c1ce4c14..9f94bed1f3 100644
> --- a/generator/state_machine_generator.ml
> +++ b/generator/state_machine_generator.ml
> @@ -449,10 +449,7 @@ let
>    pr "      abort (); /* Should never happen, but keeps GCC happy. */\n";
>    pr "    }\n";
>    pr "\n";
> -  pr "    if (r == -1) {\n";
> -  pr "      assert (nbd_get_error () != NULL);\n";
> -  pr "      return -1;\n";
> -  pr "    }\n";
> +  pr "    if (r == -1) return -1;\n";

I might have split this into two lines, but this style for one-liner
ifs is present elsewhere in the tree so it is not wrong.

>    pr "  } while (!blocked);\n";
>    pr "  return 0;\n";
>    pr "}\n";
> diff --git a/generator/states.c b/generator/states.c
> index fa0f8d716e..c0cf5a7f26 100644
> --- a/generator/states.c
> +++ b/generator/states.c
> @@ -191,8 +191,6 @@ STATE_MACHINE {
>    return 0;
>  
>   DEAD:
> -  /* The caller should have used set_error() before reaching here */
> -  assert (nbd_get_error ());

If we add some sort of global telling us whether the key is still
active, we could still have:
assert (nbd_error_key_finalized || nbd_get_error ())

But I'm not sure if it is worth that.  Dropping the assertion is
certainly simpler, even if it papers over other potential problems.

>    abort_option (h);
>    nbd_internal_abort_commands (h, &h->cmds_to_issue);
>    nbd_internal_abort_commands (h, &h->cmds_in_flight);
> diff --git a/lib/errors.c b/lib/errors.c
> index 8b77650ef3..9142a0843e 100644
> --- a/lib/errors.c
> +++ b/lib/errors.c
> @@ -93,7 +93,10 @@ allocate_last_error_on_demand (void)
>      last_error = calloc (1, sizeof *last_error);
>      if (last_error) {
>        int err = pthread_setspecific (errors_key, last_error);
> -      if (err != 0) {
> +      /* Ignore EINVAL because that can happen in a race with other
> +       * threads when we are exiting.
> +       */
> +      if (err != 0 && err != EINVAL) {

This is where I'm thinking we can still do better by having a
reference counter.  I'll try a v2 along the lines of my thoughts
above...

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org