[Libguestfs] [PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code

Richard W.M. Jones rjones at redhat.com
Thu Mar 9 09:50:00 UTC 2023


When a highly multi-threaded program such as nbdcopy encounters an
error, there is a race condition in the library which can cause an
assertion failure and thus a core dump:

(1) An error occurs on one of the threads.  nbdcopy calls exit(3).

(2) In lib/errors.c, the destructor calls pthread_key_delete.

(3) Another thread which is still running also encounters an error,
and inside libnbd the library calls set_error().

(4) The call to set_error() calls pthread_getspecific which returns
NULL (since the key has already been destroyed in step (2)), and this
causes us to call pthread_setspecific which returns EINVAL because
glibc is able to detect invalid use of a pthread_key_t after it has
been destroyed.  In any case, the error message is lost, and any
subsequent call to nbd_get_error() will return NULL.

(5) We enter the %DEAD state, where there is an assertion that
nbd_get_error() is not NULL.  This assertion is supposed to be for
checking that the code called set_error() before entering the %DEAD
state.  Although we did call set_error(), the error message was lost
at step (4) and nbd_get_error() will always return NULL.  This
assertion failure causes a crash.

There aren't any good ways to fix this, so I chose the same method as
used by libvirt in this commit:

https://gitlab.com/libvirt/libvirt/-/commit/8e44e5593eb9b89fbc0b54fde15f130707a0d81e

(a) Use '-z nodelete' to prevent the library from being unloaded on
dlclose().

(b) Do not call pthread_key_destroy (thus leaking the key).

(c) When threads exit they are still able to call free_errors_key
because it is still present in memory.

Thanks: Daniel P. Berrangé, Eric Blake
---
 configure.ac    | 9 +++++++++
 lib/Makefile.am | 1 +
 lib/errors.c    | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index b6d60c3df6..f495926eff 100644
--- a/configure.ac
+++ b/configure.ac
@@ -607,6 +607,14 @@ AS_IF([test "x$enable_golang" != "xno"],[
 ],[GOLANG=no])
 AM_CONDITIONAL([HAVE_GOLANG],[test "x$GOLANG" != "xno"])
 
+AC_MSG_CHECKING([for how to mark DSO non-deletable at runtime])
+NODELETE=
+`$LD --help 2>&1 | grep -- "-z nodelete" >/dev/null` && \
+    NODELETE="-Wl,-z -Wl,nodelete"
+AC_MSG_RESULT([$NODELETE])
+AC_SUBST([NODELETE])
+
+AC_MSG_CHECKING([for how to set DSO symbol versions])
 case $host_os in
   darwin*)
   VERSION_SCRIPT=
@@ -615,6 +623,7 @@ case $host_os in
   VERSION_SCRIPT="-Wl,--version-script=${srcdir}/libnbd.syms"
   ;;
 esac
+AC_MSG_RESULT([$VERSION_SCRIPT])
 AC_SUBST([VERSION_SCRIPT])
 
 dnl Produce output files.
diff --git a/lib/Makefile.am b/lib/Makefile.am
index 52b525819b..c886be7da0 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -78,6 +78,7 @@ libnbd_la_LIBADD = \
 	$(NULL)
 libnbd_la_LDFLAGS = \
 	$(PTHREAD_LIBS) \
+	$(NODELETE) \
 	$(VERSION_SCRIPT) \
 	-version-info 0:0:0 \
 	$(NULL)
diff --git a/lib/errors.c b/lib/errors.c
index 8b77650ef3..6fbfaacd34 100644
--- a/lib/errors.c
+++ b/lib/errors.c
@@ -69,7 +69,11 @@ errors_key_destroy (void)
     free (last_error->error);
     free (last_error);
   }
-  pthread_key_delete (errors_key);
+
+  /* We could do this, but that causes a race condition described here:
+   * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html
+   */
+  //pthread_key_delete (errors_key);
 }
 
 /* This is called when a thread exits, to free the thread-local data
-- 
2.39.2



More information about the Libguestfs mailing list