[Libguestfs] [PATCH nbdcopy] copy: Implement extent metadata for efficient copying.

Mon Feb 22 21:19:18 UTC 2021

On 11/24/20 7:04 AM, Richard W.M. Jones wrote:
> This implements these interconnected options:
> 
>   --allocated
>   --destination-is-zero (alias: --target-is-zero)
>   --no-extents
> ---
>  TODO                           |   6 +-

Late review inspired by Nir's recent work, oh well.

> +++ b/copy/copy-sparse-allocated.sh
> @@ -0,0 +1,92 @@

> +
> +requires nbdkit --version
> +requires nbdkit --exit-with-parent --version

Do we really need the first line, since the second covers the same?

> +++ b/copy/copy-sparse-no-extents.sh

> +requires nbdkit --version
> +requires nbdkit --exit-with-parent --version

Multiple instances.

> +++ b/copy/copy-sparse.sh
> @@ -0,0 +1,97 @@

> +# Check the output matches expected.
> +if [ "$(cat $out)" != "pwrite 1 4294967296
> +pwrite 32768 0
> +pwrite 32768 1073709056
> +pwrite 32768 4294934528
> +trim 134184960 32768

Sorry I didn't notice this earlier, but Nir is on the correct track, and
updates this test to expect calls to zero instead (as we cannot
guarantee that read-after-trim reads zero, at least not without a
further NBD spec extension).

> +++ b/copy/file-ops.c
> @@ -24,7 +24,16 @@

> +
> +/* This is done synchronously, but that's fine because commands from
> + * the previous work range in flight continue to run, it's difficult
> + * to (sanely) start new work until we have the full list of extents,
> + * and in almost every case the remote NBD server can answer our
> + * request for extents in a single round trip.
> + */
> +static void
> +nbd_get_extents (struct rw *rw, uintptr_t index,
> +                  uint64_t offset, uint64_t count,
> +                  extent_list *ret)
> +{
> +  extent_list exts = empty_vector;

> +
> +    /* Copy the extents returned into the final list (ret).  This is
> +     * complicated because the extents returned by the server may
> +     * begin earlier and begin or end later than the requested size.
> +     */

Can the extents returned by the server actually begin earlier?  They can
definitely end earlier or later than the reequested size, but it seems
like the beginning is always fixed.  (nbdkit allows plugins to return
early data, but then munges that return to pass the client something
that starts from the client's requested offset)

> +    for (i = 0; i < exts.size; ++i) {
> +      uint64_t d;
> +
> +      if (exts.ptr[i].offset + exts.ptr[i].length <= offset)
> +        continue;

which makes me wonder if this 'if' was dead code.

> +      if (exts.ptr[i].offset < offset) {
> +        d = offset - exts.ptr[i].offset;
> +        exts.ptr[i].offset += d;
> +        exts.ptr[i].length -= d;
> +        assert (exts.ptr[i].offset == offset);
> +      }
> +      if (exts.ptr[i].offset + exts.ptr[i].length > offset + count) {
> +        d = offset + count - exts.ptr[i].offset - exts.ptr[i].length;
> +        exts.ptr[i].length -= d;
> +        assert (exts.ptr[i].length == offset + count);
> +      }
> +      if (extent_list_append (ret, exts.ptr[i]) == -1) {
> +        perror ("realloc");
> +        exit (EXIT_FAILURE);
> +      }
> +
> +      offset += exts.ptr[i].length;
> +      count -= exts.ptr[i].length;
> +    }
> +  }
> +
> +  free (exts.ptr);
> +}
> +

> +++ b/copy/nbdcopy.pod
> @@ -4,7 +4,9 @@ nbdcopy - copy to and from an NBD server
>  
>  =head1 SYNOPSIS
>  
> - nbdcopy [-C N|--connections=N] [--flush] [-p|--progress]
> + nbdcopy [--allocated] [-C N|--connections=N]
> +         [--destination-is-zero|--target-is-zero]
> +         [--flush] [--no-extents] [-p|--progress]
>           [-R N|--requests=N] [--synchronous]
>           [-T N|--threads=N]
>           SOURCE DESTINATION
> @@ -74,6 +76,15 @@ formats use C<qemu-img convert>, see L<qemu-img(1)>.
>  
>  Display brief command line help and exit.
>  
> +=item B<--allocated>
> +
> +Normally nbdcopy tries to create a sparse output (with holes), if the
> +destination supports that.  It does this in two ways: either using
> +extent informtation from the source to copy holes (see

information

> +I<--no-extents>), or by detecting runs of zeroes (see I<-S>).  If you
> +use I<--allocated> then nbdcopy creates a fully allocated, non-sparse
> +output on the destination.
> +
>  =item B<-C> N
>  
>  =item B<--connections=>N
> @@ -82,11 +93,30 @@ Set the maximum number of NBD connections ("multi-conn").  By default
>  nbdcopy will try to use multi-conn with up to 4 connections if the NBD
>  server supports it.
>  
> +=item B<--destination-is-zero>
> +
> +=item B<--target-is-zero>
> +
> +Assume the destination is already zeroed.  This allows nbdcopy to skip
> +copying blocks of zeroes from the source to the destination.  This is
> +not safe unless the destination device is already zeroed.
> +(I<--target-is-zero> is provided for compatibility with
> +L<qemu-img(1)>.)
> +
>  =item B<--flush>
>  
>  Flush writes to ensure that everything is written to persistent
>  storage before nbdcopy exits.
>  
> +=item B<--no-extents>
> +
> +Normally nbdcopy uses extent metadata to skip over parts of the source
> +disk which contain holes.  If you use this flag, nbdcopy ignores
> +extent information and reads everything, which is usually slower.  You
> +might use this flag in two situations: the source NBD server has
> +incorrect metadata information; or the source has very slow extent
> +querying so it's faster to simply read all of the data.
> +

I'd still love to teach nbdkit how to send NBD_CMD_READ replies with
zero chunks (right now, only qemu-nbd does that); even with
--no-extents, proper handling of zero chunks can still let us proceed
faster than blindly passing full runs of zeros over the network.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org