[Libguestfs] [PATCH libnbd] lib/errors.c: Fix assert fail in exit path in multi-threaded code

Richard W.M. Jones rjones at redhat.com
Wed Mar 8 20:29:41 UTC 2023


When a highly multi-threaded program such as nbdcopy encounters an
error, there is a race condition in the library which can cause an
assertion failure and thus a core dump:

(1) An error occurs on one of the threads.  nbdcopy calls exit(3).

(2) In lib/errors.c, the destructor calls pthread_key_delete.

(3) Another thread which is still running also encounters an error,
and inside libnbd the library calls set_error().

(4) The call to set_error() calls pthread_getspecific which returns
NULL (since the key has already been destroyed in step (2)), and this
causes us to call pthread_setspecific which returns EINVAL because
glibc is able to detect invalid use of a pthread_key_t after it has
been destroyed.  In any case, the error message is lost, and any
subsequent call to nbd_get_error() will return NULL.

(5) We enter the %DEAD state, where there is an assertion that
nbd_get_error() is not NULL.  This assertion is supposed to be for
checking that the code called set_error() before entering the %DEAD
state.  Although we did call set_error(), the error message was lost
at step (4) and nbd_get_error() will always return NULL.  This
assertion failure causes a crash.

There aren't any good ways to fix this.  I chose to remove the
assertions that might trigger, and ignore pthread_getspecific
returning EINVAL.

This reverts a small part of commit 831cbf7ee14c ("generator: Allow
DEAD state actions to run").
---
 generator/state_machine_generator.ml | 5 +----
 generator/states.c                   | 2 --
 lib/errors.c                         | 5 ++++-
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/generator/state_machine_generator.ml b/generator/state_machine_generator.ml
index 43c1ce4c14..9f94bed1f3 100644
--- a/generator/state_machine_generator.ml
+++ b/generator/state_machine_generator.ml
@@ -449,10 +449,7 @@ let
   pr "      abort (); /* Should never happen, but keeps GCC happy. */\n";
   pr "    }\n";
   pr "\n";
-  pr "    if (r == -1) {\n";
-  pr "      assert (nbd_get_error () != NULL);\n";
-  pr "      return -1;\n";
-  pr "    }\n";
+  pr "    if (r == -1) return -1;\n";
   pr "  } while (!blocked);\n";
   pr "  return 0;\n";
   pr "}\n";
diff --git a/generator/states.c b/generator/states.c
index fa0f8d716e..c0cf5a7f26 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -191,8 +191,6 @@ STATE_MACHINE {
   return 0;
 
  DEAD:
-  /* The caller should have used set_error() before reaching here */
-  assert (nbd_get_error ());
   abort_option (h);
   nbd_internal_abort_commands (h, &h->cmds_to_issue);
   nbd_internal_abort_commands (h, &h->cmds_in_flight);
diff --git a/lib/errors.c b/lib/errors.c
index 8b77650ef3..9142a0843e 100644
--- a/lib/errors.c
+++ b/lib/errors.c
@@ -93,7 +93,10 @@ allocate_last_error_on_demand (void)
     last_error = calloc (1, sizeof *last_error);
     if (last_error) {
       int err = pthread_setspecific (errors_key, last_error);
-      if (err != 0) {
+      /* Ignore EINVAL because that can happen in a race with other
+       * threads when we are exiting.
+       */
+      if (err != 0 && err != EINVAL) {
         /* This is not supposed to happen (XXX). */
         fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
                  strerror (err));
-- 
2.39.2



More information about the Libguestfs mailing list