[Libguestfs] [libnbd PATCH v2 5/5] copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails

Nir Soffer nsoffer at redhat.com
Fri Feb 4 00:08:21 UTC 2022


On Thu, Feb 3, 2022 at 10:26 PM Eric Blake <eblake at redhat.com> wrote:
>
> nbdcopy has a nasty bug when performing multi-threaded copies using
> asynchronous nbd calls - it was blindly treating the completion of an
> asynchronous command as successful, rather than checking the *error
> parameter.  This can result in the silent creation of a corrupted
> image in two different ways: when a read fails, we blindly wrote
> garbage to the destination; when a write fails, we did not flag that
> the destination was not written.
>
> Since nbdcopy already calls exit() on a synchronous read or write
> failure to a file, doing the same for an asynchronous op to an NBD
> server is the simplest solution.  A nicer solution, but more invasive
> to code and thus not done here, might be to allow up to N retries of
> the transaction (in case the read or write failure was transient), or
> even having a mode where as much data is copied as possible (portions
> of the copy that failed would be logged on stderr, and nbdcopy would
> still fail with a non-zero exit status, but this would copy more than
> just stopping at the first error, as can be done with rsync or
> ddrescue).

Maybe this belongs to the nbdresuce tool :-)

> Note that since we rely on auto-retiring and do NOT call
> nbd_aio_command_completed, our completion callbacks must always return
> 1 (if they do not exit() first), even when acting on *error, so as not
> leave the command allocated until nbd_close.  As such, there is no
> sane way to return an error to a manual caller of the callback, and
> therefore we can drop dead code that calls perror() and exit() if the
> callback "failed".  It is also worth documenting the contract on when
> we must manually call the callback during the asynch_zero callback, so
> that we do not leak or double-free the command; thankfully, all the
> existing code paths were correct.
>
> The added testsuite script demonstrates several scenarios, some of
> which fail without the rest of this patch in place, and others which
> showcase ways in which sparse images can bypass errors.
>
> Once backports are complete, a followup patch on the main branch will
> edit docs/libnbd-security.pod with the mailing list announcement of
> the stable branch commit ids and release versions that incorporate
> this fix.
>
> Reported-by: Nir Soffer <nsoffer at redhat.com>
> Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6)
> Fixes: https://bugzilla.redhat.com/2046194
> ---
>  TODO                        |  1 +
>  copy/Makefile.am            |  4 +-
>  copy/copy-nbd-error.sh      | 78 +++++++++++++++++++++++++++++++++++++
>  copy/file-ops.c             | 17 +++-----
>  copy/multi-thread-copying.c | 15 +++++++
>  copy/nbdcopy.h              |  7 ++--
>  copy/null-ops.c             | 10 +----
>  7 files changed, 108 insertions(+), 24 deletions(-)
>  create mode 100755 copy/copy-nbd-error.sh
>
> diff --git a/TODO b/TODO
> index da157942..7c9c15e2 100644
> --- a/TODO
> +++ b/TODO
> @@ -33,6 +33,7 @@ nbdcopy:
>   - Better page cache usage, see nbdkit-file-plugin options
>     fadvise=sequential cache=none.
>   - Consider io_uring if there are performance bottlenecks.
> + - Configurable retries in response to read or write failures.
>
>  nbdfuse:
>   - If you write beyond the end of the virtual file, it returns EIO.
> diff --git a/copy/Makefile.am b/copy/Makefile.am
> index f2100853..e729f86a 100644
> --- a/copy/Makefile.am
> +++ b/copy/Makefile.am
> @@ -1,5 +1,5 @@
>  # nbd client library in userspace
> -# Copyright (C) 2020 Red Hat Inc.
> +# Copyright (C) 2020-2022 Red Hat Inc.
>  #
>  # This library is free software; you can redistribute it and/or
>  # modify it under the terms of the GNU Lesser General Public
> @@ -33,6 +33,7 @@ EXTRA_DIST = \
>         copy-nbd-to-small-nbd-error.sh \
>         copy-nbd-to-sparse-file.sh \
>         copy-nbd-to-stdout.sh \
> +       copy-nbd-error.sh \
>         copy-progress-bar.sh \
>         copy-sparse.sh \
>         copy-sparse-allocated.sh \
> @@ -124,6 +125,7 @@ TESTS += \
>         copy-stdin-to-nbd.sh \
>         copy-stdin-to-null.sh \
>         copy-nbd-to-stdout.sh \
> +       copy-nbd-error.sh \
>         copy-progress-bar.sh \
>         copy-sparse.sh \
>         copy-sparse-allocated.sh \
> diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh
> new file mode 100755
> index 00000000..89f0a2f1
> --- /dev/null
> +++ b/copy/copy-nbd-error.sh
> @@ -0,0 +1,78 @@
> +#!/usr/bin/env bash
> +# nbd client library in userspace
> +# Copyright (C) 2022 Red Hat Inc.
> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library; if not, write to the Free Software
> +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +
> +# Tests several scenarios of handling NBD server errors
> +# Serves as a regression test for the CVE-2022-0485 fix.
> +
> +. ../tests/functions.sh
> +
> +set -e
> +set -x
> +
> +requires nbdkit --exit-with-parent --version
> +
> +fail=0
> +
> +# Failure to get block status should not be fatal, but merely downgrade to
> +# reading the entire image as if data
> +echo "Testing extents failures on source"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
> +    error-extents-rate=1 ] null: || fail=1
> +
> +# Failure to read should be fatal
> +echo "Testing read failures on non-sparse source"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
> +    error-pread-rate=0.5 ] null: && fail=1
> +
> +# However, reliable block status on a sparse image can avoid the need to read
> +echo "Testing read failures on sparse source"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \
> +    error-pread-rate=1 ] null: || fail=1
> +
> +# Failure to write data should be fatal
> +echo "Testing write data failures on arbitrary destination"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \
> +    [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
> +        memory 5M error-pwrite-rate=0.5 ] && fail=1
> +
> +# However, writing zeroes can bypass the need for normal writes
> +echo "Testing write data failures from sparse source"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
> +    [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
> +        memory 5M error-pwrite-rate=1 ] || fail=1
> +
> +# Failure to write zeroes should be fatal
> +echo "Testing write zero failures on arbitrary destination"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
> +    [ nbdkit --exit-with-parent -v --filter=error memory 5M \
> +        error-zero-rate=1 ] && fail=1
> +
> +# However, assuming/learning destination is zero can skip need to write
> +echo "Testing write failures on pre-zeroed destination"
> +$VG nbdcopy --destination-is-zero -- \
> +    [ nbdkit --exit-with-parent -v null 5M ] \
> +    [ nbdkit --exit-with-parent -v --filter=error memory 5M \
> +        error-pwrite-rate=1 error-zero-rate=1 ] || fail=1
> +
> +# Likewise, when write zero is not advertised, fallback to normal write works
> +echo "Testing write zeroes to destination without zero support"
> +$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
> +    [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \
> +        error-zero-rate=1 ] || fail=1

Good test!

All this behavior was not tested before?

> +
> +exit $fail
> diff --git a/copy/file-ops.c b/copy/file-ops.c
> index e37a5014..aaf04ade 100644
> --- a/copy/file-ops.c
> +++ b/copy/file-ops.c
> @@ -591,10 +591,8 @@ file_asynch_read (struct rw *rw,
>
>    file_synch_read (rw, slice_ptr (command->slice),
>                     command->slice.len, command->offset);
> -  if (cb.callback (cb.user_data, &dummy) == -1) {
> -    perror (rw->name);
> -    exit (EXIT_FAILURE);
> -  }
> +  /* file_synch_read called exit() on error */
> +  cb.callback (cb.user_data, &dummy);
>  }
>
>  static void
> @@ -606,10 +604,8 @@ file_asynch_write (struct rw *rw,
>
>    file_synch_write (rw, slice_ptr (command->slice),
>                      command->slice.len, command->offset);
> -  if (cb.callback (cb.user_data, &dummy) == -1) {
> -    perror (rw->name);
> -    exit (EXIT_FAILURE);
> -  }
> +  /* file_synch_write called exit() on error */
> +  cb.callback (cb.user_data, &dummy);
>  }
>
>  static bool
> @@ -620,10 +616,7 @@ file_asynch_zero (struct rw *rw, struct command *command,
>
>    if (!file_synch_zero (rw, command->offset, command->slice.len, allocate))
>      return false;
> -  if (cb.callback (cb.user_data, &dummy) == -1) {
> -    perror (rw->name);
> -    exit (EXIT_FAILURE);
> -  }
> +  cb.callback (cb.user_data, &dummy);
>    return true;
>  }
>
> diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
> index 815b8a02..2aa74d63 100644
> --- a/copy/multi-thread-copying.c
> +++ b/copy/multi-thread-copying.c
> @@ -28,6 +28,7 @@
>  #include <errno.h>
>  #include <assert.h>
>  #include <sys/stat.h>
> +#include <inttypes.h>
>
>  #include <pthread.h>
>
> @@ -376,6 +377,13 @@ finished_read (void *vp, int *error)
>  {
>    struct command *command = vp;
>
> +  /* XXX - is it worth retrying a failed command? */
> +  if (*error) {
> +    fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n",

Good error message, but  there is missing space before "failed".

Why use hex offset? I find it very confusing. I don't think nbdcopy users
think in hex.

> +             command->offset, strerror (*error));
> +    exit (EXIT_FAILURE);
> +  }
> +
>    if (allocated || sparse_size == 0) {
>      /* If sparseness detection (see below) is turned off then we write
>       * the whole command.
> @@ -548,6 +556,13 @@ free_command (void *vp, int *error)
>    struct command *command = vp;
>    struct buffer *buffer = command->slice.buffer;
>
> +  /* XXX - is it worth retrying a failed command? */
> +  if (*error) {
> +    fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n",
> +             command->offset, strerror (*error));
> +    exit (EXIT_FAILURE);
> +  }
> +
>    if (buffer != NULL) {
>      if (--buffer->refs == 0) {
>        free (buffer->data);
> diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h
> index e7fe1eab..c070f8d7 100644
> --- a/copy/nbdcopy.h
> +++ b/copy/nbdcopy.h
> @@ -1,5 +1,5 @@
>  /* NBD client library in userspace.
> - * Copyright (C) 2020-2021 Red Hat Inc.
> + * Copyright (C) 2020-2022 Red Hat Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -157,7 +157,8 @@ struct rw_ops {
>                        bool allocate);
>
>    /* Asynchronous I/O operations.  These start the operation and call
> -   * 'cb' on completion.
> +   * 'cb' on completion.  'cb' will return 1, for auto-retiring with
> +   * asynchronous libnbd calls.
>     *
>     * The file_ops versions are actually implemented synchronously, but
>     * still call 'cb'.
> @@ -173,7 +174,7 @@ struct rw_ops {
>                          nbd_completion_callback cb);
>
>    /* Asynchronously zero.  command->slice.buffer is not used.  If not possible,
> -   * returns false.
> +   * returns false.  'cb' must be called only if returning true.
>     */
>    bool (*asynch_zero) (struct rw *rw, struct command *command,
>                         nbd_completion_callback cb, bool allocate);
> diff --git a/copy/null-ops.c b/copy/null-ops.c
> index 11cd5116..5f1fda50 100644
> --- a/copy/null-ops.c
> +++ b/copy/null-ops.c
> @@ -119,10 +119,7 @@ null_asynch_write (struct rw *rw,
>  {
>    int dummy = 0;
>
> -  if (cb.callback (cb.user_data, &dummy) == -1) {
> -    perror (rw->name);
> -    exit (EXIT_FAILURE);
> -  }
> +  cb.callback (cb.user_data, &dummy);
>  }
>
>  static bool
> @@ -131,10 +128,7 @@ null_asynch_zero (struct rw *rw, struct command *command,
>  {
>    int dummy = 0;
>
> -  if (cb.callback (cb.user_data, &dummy) == -1) {
> -    perror (rw->name);
> -    exit (EXIT_FAILURE);
> -  }
> +  cb.callback (cb.user_data, &dummy);
>    return true;
>  }
>
> --
> 2.34.1
>

Looks ready after fixing the error message.

Nir




More information about the Libguestfs mailing list