[Libguestfs] [libnbd PATCH v2 0/5] CVE-2022-0485 fix: nbdcopy silent corruption

Thu Feb 3 20:25:53 UTC 2022

Here's the summary of the diff to v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[down] 'python: tests: Fix error handling'
002/5:[down] 'ocaml: tests: Fix error handling'
003/5:[0083] [FC] 'docs: Clarify how callbacks should handle errors'
004/5:[down] 'copy: Pass in dummy variable rather than &errno to callback'
005/5:[down] 'copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails'

except that my version of the git backport-diff script doesn't handle
renamed patches well.

Patch 3 was originally at:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00037.html
Patch 5 was originally at:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00039.html
the rest are indeed new to this posting.  Enough has changed that I
didn't carry forward any positive reviews from Rich, Nir, or Laszlo.
At the bottom of the patch, I'll include the full diff of how v2
improves over v1.

Eric Blake (5):
  python: tests: Fix error handling
  ocaml: tests: Fix error handling
  docs: Clarify how callbacks should handle errors
  copy: Pass in dummy variable rather than &errno to callback
  copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails

 docs/libnbd.pod                  | 69 +++++++++++++++++++++-------
 generator/API.ml                 | 24 +++++++---
 python/t/590-aio-copy.py         |  4 +-
 ocaml/tests/test_590_aio_copy.ml |  8 ++--
 TODO                             |  1 +
 copy/Makefile.am                 |  4 +-
 copy/copy-nbd-error.sh           | 78 ++++++++++++++++++++++++++++++++
 copy/file-ops.c                  | 28 +++++-------
 copy/multi-thread-copying.c      | 23 ++++++++--
 copy/nbdcopy.h                   |  7 +--
 copy/null-ops.c                  | 18 +++-----
 11 files changed, 204 insertions(+), 60 deletions(-)
 create mode 100755 copy/copy-nbd-error.sh

diff --git c/docs/libnbd.pod w/docs/libnbd.pod
index 006d530c..15bdf0a8 100644
--- c/docs/libnbd.pod
+++ w/docs/libnbd.pod
@@ -826,11 +826,15 @@ This can be used to free associated C<user_data>.  For example:
                 (nbd_chunk_callback) { .callback = my_fn,
                                        .user_data = my_data,
                                        .free = free },
-                NBD_NULL_CALLBACK(completion),
+                NBD_NULL_COMPLETION,
                 0);

-will call L<free(3)> on C<my_data> after the last time that the
-S<C<chunk.callback = my_fn>> function is called.
+will call L<free(3)> once on C<my_data> after the point where it is
+known that the S<C<chunk.callback = my_fn>> function can no longer be
+called, regardless of how many times C<my_fn> was actually called.  If
+both a mid-command and completion callback are supplied, the functions
+will be reached in this order: mid-function callbacks, completion
+callback, mid-function free, and finally completion free.

 The free function is only accessible in the C API as it is not needed
 in garbage collected programming languages.
@@ -858,14 +862,16 @@ same nbd object, as it would cause deadlock.

 =head2 Completion callbacks

-All of the low-level commands have a completion callback variant that
-registers a callback function used right before the command is marked
-complete.
+All of the asychronous commands have an optional completion callback
+function that is used right before the command is marked complete,
+after any mid-command callbacks have finished, and before any free
+functions.

 When the completion callback returns C<1>, the command is
 automatically retired (there is no need to call
-L<nbd_aio_command_completed(3)>); for any other return value, the command
-still needs to be retired.
+L<nbd_aio_command_completed(3)>); for any other return value, the
+command still needs to be manually retired (otherwise, the command
+will tie up resources until L<nbd_close(3)> is eventually reached).

 =head2 Callbacks with C<int *error> parameter

@@ -874,37 +880,42 @@ L<nbd_block_status(3)>) involve the use of a callback function invoked
 by the state machine at appropriate points in the server's reply
 before the overall command is complete.  These callback functions,
 along with all of the completion callbacks, include a parameter
-C<error> containing the value of any error detected so far.  If a
-callback function fails and wants to change the resulting error of the
-overall command visible later in the API sequence, it should assign
-back into C<error> and return C<-1>.  Assignments into C<error> are
-ignored for any other return value; similarly, assigning C<0> into
-C<error> does not have an effect.
+C<error> which is a pointer containing the value of any error detected
+so far.  If a callback function fails and wants to change the
+resulting error of the overall command visible later in the API
+sequence, it should assign back into C<error> and return C<-1> in the
+C API.  Assignments into C<error> are ignored for any other return
+value; similarly, assigning C<0> into C<error> does not have an
+effect.  In other language bindings, reporting callback errors is
+generally done by raising an exception rather than by return value.

 Note that a mid-command callback might never be reached, such as if
 libnbd detects that the command was invalid to send (see
-L<nbd_set_strict_mode(3)>) or if the server reports a failure.  It is
-safe for a mid-command callback to ignore non-zero C<error>: all the
-other parameters to the mid-command callback will still be valid
-(corresponding to the current portion of the server's reply), and the
-overall command will still fail (at the completion callback or
-L<nbd_aio_command_completed(3)> for an asynchronous command, or as the
-result of the overall synchronous command).  Returing C<-1> from a
-mid-command callback does not prevent that callback from being reached
-again, if the server sends more mid-command replies that warrant
-another use of that callback.  A mid-command callback may be reached
-more times than expected if the server is non-compliant.
+L<nbd_set_strict_mode(3)>) or if the server reports a failure that
+concludes the command.  It is safe for a mid-command callback to
+ignore non-zero C<error>: all the other parameters to the mid-command
+callback will still be valid (corresponding to the current portion of
+the server's reply), and the overall command will still fail (at the
+completion callback or L<nbd_aio_command_completed(3)> for an
+asynchronous command, or as the result of the overall synchronous
+command).  Returing C<-1> from a mid-command callback does not prevent
+that callback from being reached again, if the server sends more
+mid-command replies that warrant another use of that callback.  A
+mid-command callback may be reached more times than expected if the
+server is non-compliant.

 On the other hand, if a completion callback is supplied (only possible
 with asynchronous commands), it will always be reached exactly once,
-and the completion callback must not ignore the value of C<error>.  In
-particular, the content of a buffer passed to L<nbd_aio_pread(3)> or
-L<nbd_aio_pread_structured(3)> is unspecified if C<error> is set on
-entry to the completion callback.  It is recommended that if your code
-uses automatic command retirement, then the completion function should
-return C<1> on all control paths, even when handling errors (note that
-with automatic retirement, assigning into C<error> is pointless as
-there is no later API to see that value).
+and the completion callback must not ignore the value pointed to by
+C<error>.  In particular, the content of a buffer passed to
+L<nbd_aio_pread(3)> or L<nbd_aio_pread_structured(3)> is undefined
+if C<*error> is non-zero on entry to the completion callback.  It is
+recommended that if you choose to use automatic command retirement
+(where the completion callback returns C<1> to avoid needing to call
+L<nbd_aio_command_completed(3)> later), your completion function
+should return C<1> on all control paths, even when handling errors
+(note that with automatic retirement, assigning into C<error> is
+pointless as there is no later API to see that value).

 =head1 COMPILING YOUR PROGRAM

diff --git c/generator/API.ml w/generator/API.ml
index bdb8e7c8..012016bc 100644
--- c/generator/API.ml
+++ w/generator/API.ml
@@ -1831,7 +1831,7 @@   "pread", {
 The C<flags> parameter must be C<0> for now (it exists for future NBD
 protocol extensions).

-Note that if this command fails, the contents of C<buf> are unspecified."
+Note that if this command fails, the contents of C<buf> are undefined."
 ^ strict_call_description;
     see_also = [Link "aio_pread"; Link "pread_structured";
                 Link "get_block_size"; Link "set_strict_mode"];
@@ -1918,7 +1918,7 @@   "pread_structured", {
 this, see L<nbd_can_df(3)>). Libnbd does not validate that the server
 actually obeys the flag.

-Note that if this command fails, the contents of C<buf> are unspecified."
+Note that if this command fails, the contents of C<buf> are undefined."
 ^ strict_call_description;
     see_also = [Link "can_df"; Link "pread";
                 Link "aio_pread_structured"; Link "get_block_size";
@@ -2459,7 +2459,7 @@   "aio_pread", {
 as described in L<libnbd(3)/Completion callbacks>.

 Note that you must ensure C<buf> is valid until the command has
-completed.  Furthermore, the contents of C<buf> are unspecified if the
+completed.  Furthermore, the contents of C<buf> are undefined if the
 C<error> parameter to C<completion_callback> is set, or if
 L<nbd_aio_command_completed(3)> reports failure.  Other parameters behave
 as documented in L<nbd_pread(3)>."
@@ -2487,7 +2487,7 @@   "aio_pread_structured", {
 as described in L<libnbd(3)/Completion callbacks>.

 Note that you must ensure C<buf> is valid until the command has
-completed.  Furthermore, the contents of C<buf> are unspecified if the
+completed.  Furthermore, the contents of C<buf> are undefined if the
 C<error> parameter to C<completion_callback> is set, or if
 L<nbd_aio_command_completed(3)> reports failure.  Other parameters behave
 as documented in L<nbd_pread_structured(3)>."
diff --git c/python/t/590-aio-copy.py w/python/t/590-aio-copy.py
index 6cde5934..861fa6c8 100644
--- c/python/t/590-aio-copy.py
+++ w/python/t/590-aio-copy.py
@@ -1,5 +1,5 @@
 # libnbd Python bindings
-# Copyright (C) 2010-2019 Red Hat Inc.
+# Copyright (C) 2010-2022 Red Hat Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -36,6 +36,7 @@ def asynch_copy(src, dst):
     # This callback is called when any pread from the source
     # has completed.
     def read_completed(buf, offset, error):
+        assert not error
         global bytes_read
         bytes_read += buf.size()
         wr = (buf, offset)
@@ -46,6 +47,7 @@ def asynch_copy(src, dst):
     # This callback is called when any pwrite to the destination
     # has completed.
     def write_completed(buf, error):
+        assert not error
         global bytes_written
         bytes_written += buf.size()
         # By returning 1 here we auto-retire the pwrite command.
diff --git c/ocaml/tests/test_590_aio_copy.ml w/ocaml/tests/test_590_aio_copy.ml
index 11d89256..a0339c8b 100644
--- c/ocaml/tests/test_590_aio_copy.ml
+++ w/ocaml/tests/test_590_aio_copy.ml
@@ -1,6 +1,6 @@
 (* hey emacs, this is OCaml code: -*- tuareg -*- *)
 (* libnbd OCaml test case
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
  *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -45,7 +45,8 @@ let
    * next iteration of the loop.
    *)
   let writes = ref [] in
-  let read_completed buf offset _ =
+  let read_completed buf offset err =
+    assert (!err = 0);
     bytes_read := !bytes_read + NBD.Buffer.size buf;
     (* Get ready to issue a write command. *)
     writes := (buf, offset) :: !writes;
@@ -56,7 +57,8 @@ let
   (* This callback is called when any pwrite to the destination
    * has completed.
    *)
-  let write_completed buf _ =
+  let write_completed buf err =
+    assert (!err = 0);
     bytes_written := !bytes_written + NBD.Buffer.size buf;
     (* By returning 1 here we auto-retire the pwrite command. *)
     1
diff --git c/copy/Makefile.am w/copy/Makefile.am
index 85989798..e729f86a 100644
--- c/copy/Makefile.am
+++ w/copy/Makefile.am
@@ -33,7 +33,7 @@ EXTRA_DIST = \
 	copy-nbd-to-small-nbd-error.sh \
 	copy-nbd-to-sparse-file.sh \
 	copy-nbd-to-stdout.sh \
-  copy-nbd-error.sh \
+	copy-nbd-error.sh \
 	copy-progress-bar.sh \
 	copy-sparse.sh \
 	copy-sparse-allocated.sh \
@@ -125,7 +125,7 @@ TESTS += \
 	copy-stdin-to-nbd.sh \
 	copy-stdin-to-null.sh \
 	copy-nbd-to-stdout.sh \
-  copy-nbd-error.sh \
+	copy-nbd-error.sh \
 	copy-progress-bar.sh \
 	copy-sparse.sh \
 	copy-sparse-allocated.sh \
diff --git c/copy/copy-nbd-error.sh w/copy/copy-nbd-error.sh
index 50723195..89f0a2f1 100755
--- c/copy/copy-nbd-error.sh
+++ w/copy/copy-nbd-error.sh
@@ -16,6 +16,9 @@
 # License along with this library; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

+# Tests several scenarios of handling NBD server errors
+# Serves as a regression test for the CVE-2022-0485 fix.
+
 . ../tests/functions.sh

 set -e
@@ -23,32 +26,53 @@ set -x

 requires nbdkit --exit-with-parent --version

-pidfile=copy-nbd-error.pid
-sock=$(mktemp -u /tmp/libnbd-test-copy.XXXXXX)
-cleanup_fn rm -f $pidfile $sock
-
-# Run an nbdkit server that randomly fails.
-nbdkit --exit-with-parent -f -v -P $pidfile -U $sock \
-       --filter=error --filter=noextents \
-       memory size=5M error-pread-rate=0.5 error-pwrite-rate=0.5 &
-# Wait for the pidfile to appear.
-for i in {1..60}; do
-    if test -f $pidfile; then
-        break
-    fi
-    sleep 1
-done
-if ! test -f $pidfile; then
-    echo "$0: nbdkit did not start up"
-    exit 1
-fi
-
 fail=0

+# Failure to get block status should not be fatal, but merely downgrade to
+# reading the entire image as if data
+echo "Testing extents failures on source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
+    error-extents-rate=1 ] null: || fail=1
+
 # Failure to read should be fatal
-$VG nbdcopy -- "nbd+unix:///?socket=$sock" null: && fail=1
+echo "Testing read failures on non-sparse source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
+    error-pread-rate=0.5 ] null: && fail=1

-# Failure to write should be fatal
-$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] "nbd+unix:///?socket=$sock" && fail=1
+# However, reliable block status on a sparse image can avoid the need to read
+echo "Testing read failures on sparse source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \
+    error-pread-rate=1 ] null: || fail=1
+
+# Failure to write data should be fatal
+echo "Testing write data failures on arbitrary destination"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \
+    [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
+        memory 5M error-pwrite-rate=0.5 ] && fail=1
+
+# However, writing zeroes can bypass the need for normal writes
+echo "Testing write data failures from sparse source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
+    [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
+        memory 5M error-pwrite-rate=1 ] || fail=1
+
+# Failure to write zeroes should be fatal
+echo "Testing write zero failures on arbitrary destination"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
+    [ nbdkit --exit-with-parent -v --filter=error memory 5M \
+        error-zero-rate=1 ] && fail=1
+
+# However, assuming/learning destination is zero can skip need to write
+echo "Testing write failures on pre-zeroed destination"
+$VG nbdcopy --destination-is-zero -- \
+    [ nbdkit --exit-with-parent -v null 5M ] \
+    [ nbdkit --exit-with-parent -v --filter=error memory 5M \
+        error-pwrite-rate=1 error-zero-rate=1 ] || fail=1
+
+# Likewise, when write zero is not advertised, fallback to normal write works
+echo "Testing write zeroes to destination without zero support"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
+    [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \
+        error-zero-rate=1 ] || fail=1

 exit $fail
diff --git c/copy/file-ops.c w/copy/file-ops.c
index eb30f924..aaf04ade 100644
--- c/copy/file-ops.c
+++ w/copy/file-ops.c
@@ -587,10 +587,12 @@ file_asynch_read (struct rw *rw,
                   struct command *command,
                   nbd_completion_callback cb)
 {
+  int dummy = 0;
+
   file_synch_read (rw, slice_ptr (command->slice),
                    command->slice.len, command->offset);
-  errno = 0; /* safe, since file_synch_read exits on error */
-  cb.callback (cb.user_data, &errno);
+  /* file_synch_read called exit() on error */
+  cb.callback (cb.user_data, &dummy);
 }

 static void
@@ -598,20 +600,23 @@ file_asynch_write (struct rw *rw,
                    struct command *command,
                    nbd_completion_callback cb)
 {
+  int dummy = 0;
+
   file_synch_write (rw, slice_ptr (command->slice),
                     command->slice.len, command->offset);
-  errno = 0; /* safe, since file_synch_write exits on error */
-  cb.callback (cb.user_data, &errno);
+  /* file_synch_write called exit() on error */
+  cb.callback (cb.user_data, &dummy);
 }

 static bool
 file_asynch_zero (struct rw *rw, struct command *command,
                   nbd_completion_callback cb, bool allocate)
 {
+  int dummy = 0;
+
   if (!file_synch_zero (rw, command->offset, command->slice.len, allocate))
     return false;
-  errno = 0;
-  cb.callback (cb.user_data, &errno);
+  cb.callback (cb.user_data, &dummy);
   return true;
 }

diff --git c/copy/multi-thread-copying.c w/copy/multi-thread-copying.c
index c578c4bc..2aa74d63 100644
--- c/copy/multi-thread-copying.c
+++ w/copy/multi-thread-copying.c
@@ -28,6 +28,7 @@
 #include <errno.h>
 #include <assert.h>
 #include <sys/stat.h>
+#include <inttypes.h>

 #include <pthread.h>

@@ -378,8 +379,8 @@ finished_read (void *vp, int *error)

   /* XXX - is it worth retrying a failed command? */
   if (*error) {
-    errno = *error;
-    perror("read");
+    fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n",
+             command->offset, strerror (*error));
     exit (EXIT_FAILURE);
   }

@@ -400,6 +401,7 @@ finished_read (void *vp, int *error)
     bool last_is_hole = false;
     uint64_t i;
     struct command *newcommand;
+    int dummy = 0;

     /* Iterate over whole blocks in the command, starting on a block
      * boundary.
@@ -482,8 +484,7 @@ finished_read (void *vp, int *error)
     /* Free the original command since it has been split into
      * subcommands and the original is no longer needed.
      */
-    errno = 0;
-    free_command (command, &errno);
+    free_command (command, &dummy);
   }

   return 1; /* auto-retires the command */
@@ -510,6 +511,7 @@ fill_dst_range_with_zeroes (struct command *command)
 {
   char *data;
   size_t data_size;
+  int dummy = 0;

   if (destination_is_zero)
     goto free_and_return;
@@ -545,8 +547,7 @@ fill_dst_range_with_zeroes (struct command *command)
   free (data);

  free_and_return:
-  errno = 0;
-  free_command (command, &errno);
+  free_command (command, &dummy);
 }

 static int
@@ -557,8 +558,8 @@ free_command (void *vp, int *error)

   /* XXX - is it worth retrying a failed command? */
   if (*error) {
-    errno = *error;
-    perror("write");
+    fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n",
+             command->offset, strerror (*error));
     exit (EXIT_FAILURE);
   }

diff --git c/copy/null-ops.c w/copy/null-ops.c
index 924eaf1e..5f1fda50 100644
--- c/copy/null-ops.c
+++ w/copy/null-ops.c
@@ -117,16 +117,18 @@ null_asynch_write (struct rw *rw,
                    struct command *command,
                    nbd_completion_callback cb)
 {
-  errno = 0;
-  cb.callback (cb.user_data, &errno);
+  int dummy = 0;
+
+  cb.callback (cb.user_data, &dummy);
 }

 static bool
 null_asynch_zero (struct rw *rw, struct command *command,
                   nbd_completion_callback cb, bool allocate)
 {
-  errno = 0;
-  cb.callback (cb.user_data, &errno);
+  int dummy = 0;
+
+  cb.callback (cb.user_data, &dummy);
   return true;
 }