[Libguestfs] [PATCH nbdcopy] copy: Implement extent metadata for efficient copying.
Richard W.M. Jones
rjones at redhat.com
Wed Feb 24 10:22:17 UTC 2021
On Mon, Feb 22, 2021 at 03:19:18PM -0600, Eric Blake wrote:
> On 11/24/20 7:04 AM, Richard W.M. Jones wrote:
> > This implements these interconnected options:
> >
> > --allocated
> > --destination-is-zero (alias: --target-is-zero)
> > --no-extents
> > ---
> > TODO | 6 +-
>
> Late review inspired by Nir's recent work, oh well.
>
> > +++ b/copy/copy-sparse-allocated.sh
> > @@ -0,0 +1,92 @@
>
> > +
> > +requires nbdkit --version
> > +requires nbdkit --exit-with-parent --version
>
> Do we really need the first line, since the second covers the same?
>
> > +++ b/copy/copy-sparse-no-extents.sh
>
> > +requires nbdkit --version
> > +requires nbdkit --exit-with-parent --version
>
> Multiple instances.
It's not a big deal either way, but the thinking behind this is:
either nbdkit might not be installed (which would be caught by the
first line) or nbdkit might not have the --exit-with-parent option
because it's running on an OS which doesn't support that like Windows.
It doesn't really matter for practical purposes either way.
> > +++ b/copy/copy-sparse.sh
> > @@ -0,0 +1,97 @@
>
> > +# Check the output matches expected.
> > +if [ "$(cat $out)" != "pwrite 1 4294967296
> > +pwrite 32768 0
> > +pwrite 32768 1073709056
> > +pwrite 32768 4294934528
> > +trim 134184960 32768
>
> Sorry I didn't notice this earlier, but Nir is on the correct track, and
> updates this test to expect calls to zero instead (as we cannot
> guarantee that read-after-trim reads zero, at least not without a
> further NBD spec extension).
Yup, Nir has now fixed this, thanks :-)
> > +++ b/copy/file-ops.c
> > @@ -24,7 +24,16 @@
>
> > +
> > +/* This is done synchronously, but that's fine because commands from
> > + * the previous work range in flight continue to run, it's difficult
> > + * to (sanely) start new work until we have the full list of extents,
> > + * and in almost every case the remote NBD server can answer our
> > + * request for extents in a single round trip.
> > + */
> > +static void
> > +nbd_get_extents (struct rw *rw, uintptr_t index,
> > + uint64_t offset, uint64_t count,
> > + extent_list *ret)
> > +{
> > + extent_list exts = empty_vector;
>
> > +
> > + /* Copy the extents returned into the final list (ret). This is
> > + * complicated because the extents returned by the server may
> > + * begin earlier and begin or end later than the requested size.
> > + */
>
> Can the extents returned by the server actually begin earlier? They can
> definitely end earlier or later than the reequested size, but it seems
> like the beginning is always fixed. (nbdkit allows plugins to return
> early data, but then munges that return to pass the client something
> that starts from the client's requested offset)
Yes they cannot begin earlier, because the NBD_REPLY_TYPE_BLOCK_STATUS
message descriptors only contain a length field. We could therefore
simplify this.
> > + for (i = 0; i < exts.size; ++i) {
> > + uint64_t d;
> > +
> > + if (exts.ptr[i].offset + exts.ptr[i].length <= offset)
> > + continue;
>
> which makes me wonder if this 'if' was dead code.
I think so, yes.
> > + if (exts.ptr[i].offset < offset) {
> > + d = offset - exts.ptr[i].offset;
> > + exts.ptr[i].offset += d;
> > + exts.ptr[i].length -= d;
> > + assert (exts.ptr[i].offset == offset);
> > + }
> > + if (exts.ptr[i].offset + exts.ptr[i].length > offset + count) {
> > + d = offset + count - exts.ptr[i].offset - exts.ptr[i].length;
> > + exts.ptr[i].length -= d;
> > + assert (exts.ptr[i].length == offset + count);
> > + }
> > + if (extent_list_append (ret, exts.ptr[i]) == -1) {
> > + perror ("realloc");
> > + exit (EXIT_FAILURE);
> > + }
> > +
> > + offset += exts.ptr[i].length;
> > + count -= exts.ptr[i].length;
> > + }
> > + }
> > +
> > + free (exts.ptr);
> > +}
> > +
>
> > +++ b/copy/nbdcopy.pod
> > @@ -4,7 +4,9 @@ nbdcopy - copy to and from an NBD server
> >
> > =head1 SYNOPSIS
> >
> > - nbdcopy [-C N|--connections=N] [--flush] [-p|--progress]
> > + nbdcopy [--allocated] [-C N|--connections=N]
> > + [--destination-is-zero|--target-is-zero]
> > + [--flush] [--no-extents] [-p|--progress]
> > [-R N|--requests=N] [--synchronous]
> > [-T N|--threads=N]
> > SOURCE DESTINATION
> > @@ -74,6 +76,15 @@ formats use C<qemu-img convert>, see L<qemu-img(1)>.
> >
> > Display brief command line help and exit.
> >
> > +=item B<--allocated>
> > +
> > +Normally nbdcopy tries to create a sparse output (with holes), if the
> > +destination supports that. It does this in two ways: either using
> > +extent informtation from the source to copy holes (see
>
> information
Will fix.
> > +I<--no-extents>), or by detecting runs of zeroes (see I<-S>). If you
> > +use I<--allocated> then nbdcopy creates a fully allocated, non-sparse
> > +output on the destination.
> > +
> > =item B<-C> N
> >
> > =item B<--connections=>N
> > @@ -82,11 +93,30 @@ Set the maximum number of NBD connections ("multi-conn"). By default
> > nbdcopy will try to use multi-conn with up to 4 connections if the NBD
> > server supports it.
> >
> > +=item B<--destination-is-zero>
> > +
> > +=item B<--target-is-zero>
> > +
> > +Assume the destination is already zeroed. This allows nbdcopy to skip
> > +copying blocks of zeroes from the source to the destination. This is
> > +not safe unless the destination device is already zeroed.
> > +(I<--target-is-zero> is provided for compatibility with
> > +L<qemu-img(1)>.)
> > +
> > =item B<--flush>
> >
> > Flush writes to ensure that everything is written to persistent
> > storage before nbdcopy exits.
> >
> > +=item B<--no-extents>
> > +
> > +Normally nbdcopy uses extent metadata to skip over parts of the source
> > +disk which contain holes. If you use this flag, nbdcopy ignores
> > +extent information and reads everything, which is usually slower. You
> > +might use this flag in two situations: the source NBD server has
> > +incorrect metadata information; or the source has very slow extent
> > +querying so it's faster to simply read all of the data.
> > +
>
> I'd still love to teach nbdkit how to send NBD_CMD_READ replies with
> zero chunks (right now, only qemu-nbd does that); even with
> --no-extents, proper handling of zero chunks can still let us proceed
> faster than blindly passing full runs of zeros over the network.
Structured replies?
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
More information about the Libguestfs
mailing list