[Libguestfs] [PATCH nbdcopy] copy: Implement extent metadata for efficient copying.

Wed Feb 24 10:22:17 UTC 2021

On Mon, Feb 22, 2021 at 03:19:18PM -0600, Eric Blake wrote:
> On 11/24/20 7:04 AM, Richard W.M. Jones wrote:
> > This implements these interconnected options:
> > 
> >   --allocated
> >   --destination-is-zero (alias: --target-is-zero)
> >   --no-extents
> > ---
> >  TODO                           |   6 +-
> 
> Late review inspired by Nir's recent work, oh well.
> 
> > +++ b/copy/copy-sparse-allocated.sh
> > @@ -0,0 +1,92 @@
> 
> > +
> > +requires nbdkit --version
> > +requires nbdkit --exit-with-parent --version
> 
> Do we really need the first line, since the second covers the same?
> 
> > +++ b/copy/copy-sparse-no-extents.sh
> 
> > +requires nbdkit --version
> > +requires nbdkit --exit-with-parent --version
> 
> Multiple instances.

It's not a big deal either way, but the thinking behind this is:
either nbdkit might not be installed (which would be caught by the
first line) or nbdkit might not have the --exit-with-parent option
because it's running on an OS which doesn't support that like Windows.
It doesn't really matter for practical purposes either way.

> > +++ b/copy/copy-sparse.sh
> > @@ -0,0 +1,97 @@
> 
> > +# Check the output matches expected.
> > +if [ "$(cat $out)" != "pwrite 1 4294967296
> > +pwrite 32768 0
> > +pwrite 32768 1073709056
> > +pwrite 32768 4294934528
> > +trim 134184960 32768
> 
> Sorry I didn't notice this earlier, but Nir is on the correct track, and
> updates this test to expect calls to zero instead (as we cannot
> guarantee that read-after-trim reads zero, at least not without a
> further NBD spec extension).

Yup, Nir has now fixed this, thanks :-)

> > +++ b/copy/file-ops.c
> > @@ -24,7 +24,16 @@
> 
> > +
> > +/* This is done synchronously, but that's fine because commands from
> > + * the previous work range in flight continue to run, it's difficult
> > + * to (sanely) start new work until we have the full list of extents,
> > + * and in almost every case the remote NBD server can answer our
> > + * request for extents in a single round trip.
> > + */
> > +static void
> > +nbd_get_extents (struct rw *rw, uintptr_t index,
> > +                  uint64_t offset, uint64_t count,
> > +                  extent_list *ret)
> > +{
> > +  extent_list exts = empty_vector;
> 
> > +
> > +    /* Copy the extents returned into the final list (ret).  This is
> > +     * complicated because the extents returned by the server may
> > +     * begin earlier and begin or end later than the requested size.
> > +     */
> 
> Can the extents returned by the server actually begin earlier?  They can
> definitely end earlier or later than the reequested size, but it seems
> like the beginning is always fixed.  (nbdkit allows plugins to return
> early data, but then munges that return to pass the client something
> that starts from the client's requested offset)

Yes they cannot begin earlier, because the NBD_REPLY_TYPE_BLOCK_STATUS
message descriptors only contain a length field.  We could therefore
simplify this.

> > +    for (i = 0; i < exts.size; ++i) {
> > +      uint64_t d;
> > +
> > +      if (exts.ptr[i].offset + exts.ptr[i].length <= offset)
> > +        continue;
> 
> which makes me wonder if this 'if' was dead code.

I think so, yes.

> > +      if (exts.ptr[i].offset < offset) {
> > +        d = offset - exts.ptr[i].offset;
> > +        exts.ptr[i].offset += d;
> > +        exts.ptr[i].length -= d;
> > +        assert (exts.ptr[i].offset == offset);
> > +      }
> > +      if (exts.ptr[i].offset + exts.ptr[i].length > offset + count) {
> > +        d = offset + count - exts.ptr[i].offset - exts.ptr[i].length;
> > +        exts.ptr[i].length -= d;
> > +        assert (exts.ptr[i].length == offset + count);
> > +      }
> > +      if (extent_list_append (ret, exts.ptr[i]) == -1) {
> > +        perror ("realloc");
> > +        exit (EXIT_FAILURE);
> > +      }
> > +
> > +      offset += exts.ptr[i].length;
> > +      count -= exts.ptr[i].length;
> > +    }
> > +  }
> > +
> > +  free (exts.ptr);
> > +}
> > +
> 
> > +++ b/copy/nbdcopy.pod
> > @@ -4,7 +4,9 @@ nbdcopy - copy to and from an NBD server
> >  
> >  =head1 SYNOPSIS
> >  
> > - nbdcopy [-C N|--connections=N] [--flush] [-p|--progress]
> > + nbdcopy [--allocated] [-C N|--connections=N]
> > +         [--destination-is-zero|--target-is-zero]
> > +         [--flush] [--no-extents] [-p|--progress]
> >           [-R N|--requests=N] [--synchronous]
> >           [-T N|--threads=N]
> >           SOURCE DESTINATION
> > @@ -74,6 +76,15 @@ formats use C<qemu-img convert>, see L<qemu-img(1)>.
> >  
> >  Display brief command line help and exit.
> >  
> > +=item B<--allocated>
> > +
> > +Normally nbdcopy tries to create a sparse output (with holes), if the
> > +destination supports that.  It does this in two ways: either using
> > +extent informtation from the source to copy holes (see
> 
> information

Will fix.

> > +I<--no-extents>), or by detecting runs of zeroes (see I<-S>).  If you
> > +use I<--allocated> then nbdcopy creates a fully allocated, non-sparse
> > +output on the destination.
> > +
> >  =item B<-C> N
> >  
> >  =item B<--connections=>N
> > @@ -82,11 +93,30 @@ Set the maximum number of NBD connections ("multi-conn").  By default
> >  nbdcopy will try to use multi-conn with up to 4 connections if the NBD
> >  server supports it.
> >  
> > +=item B<--destination-is-zero>
> > +
> > +=item B<--target-is-zero>
> > +
> > +Assume the destination is already zeroed.  This allows nbdcopy to skip
> > +copying blocks of zeroes from the source to the destination.  This is
> > +not safe unless the destination device is already zeroed.
> > +(I<--target-is-zero> is provided for compatibility with
> > +L<qemu-img(1)>.)
> > +
> >  =item B<--flush>
> >  
> >  Flush writes to ensure that everything is written to persistent
> >  storage before nbdcopy exits.
> >  
> > +=item B<--no-extents>
> > +
> > +Normally nbdcopy uses extent metadata to skip over parts of the source
> > +disk which contain holes.  If you use this flag, nbdcopy ignores
> > +extent information and reads everything, which is usually slower.  You
> > +might use this flag in two situations: the source NBD server has
> > +incorrect metadata information; or the source has very slow extent
> > +querying so it's faster to simply read all of the data.
> > +
> 
> I'd still love to teach nbdkit how to send NBD_CMD_READ replies with
> zero chunks (right now, only qemu-nbd does that); even with
> --no-extents, proper handling of zero chunks can still let us proceed
> faster than blindly passing full runs of zeros over the network.

Structured replies?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/