[Libguestfs] Splitting up virt-v2v

Wed Nov 25 10:29:45 UTC 2020

For a long time I've wanted to split up virt-v2v into smaller
components to make it easier to consume.  It's never been clear how to
do this, but I think I have a workable plan now, described in this email.

----------------------------------------------------------------------

First, the AIMS, which are:

(a) Preserve current functionality, including copying conversion,
    in-place conversion, and the virt-v2v command line.

(b) Allow warm migration to use virt-v2v without requiring the
    "--debug-overlays" hack.

(c) Allow threads, multi-conn, and parallel copying of guest disks, all
    for better copying performance.

(d) Allow an alternate supervisor to convert and copy many guests in
    parallel, given that the supervisor has a global view of the
    system/network (I'm not intending to implement this, only to make
    it possible).

(e) Better progress bars.

(f) Better logging.

(g) Reuse as much existing code as possible.  This is NOT a rewrite!

----------------------------------------------------------------------

Here's my PLAN:

/usr/bin/virt-v2v still exists, but it's now a supervisor program
(possibly even a shell script) that runs the steps below:

(1) Set up the input side by running "helper-v2v-input-<type>".  For
    all input types this creates a temporary directory containing:

    /tmp/XXXXXX/in1    NBD endpoints overlaying the source disk(s)
    /tmp/XXXXXX/in2    (these are actually Unix domain sockets)
    /tmp/XXXXXX/in3
    /tmp/XXXXXX/metadata.in   Metadata parsed from the source.

    Currently for most inputs we have a running nbdkit process for
    each source disk, and we'd do the same here, except we add
    nbdkit-cow-filter on top so that the source disk is protected from
    being modified.  Another small difference is that for -i disk
    (local input) we would need an active nbdkit process on top of the
    disk, whereas currently we set the disk as a qcow2 backing file.

(2) Perform the conversion by running "helper-v2v-convert".  This does
    the conversion and sparsification.  It writes directly to the NBD
    endpoints (in*) above.  The writes are stored in the COW overlay
    so the source disk is not modified.

    Conversion will also create an output metadata file:

    /tmp/XXXXXX/metadata.out   Target metadata

    Exact format of the metadata files is to be decided, but some kind
    of not-quite-libvirt-XML may be suitable.  It's also not clear if
    the metadata format is an internal detail of virt-v2v, or if we
    document it as a stable API.

(3) Set up the output side by running "helper-v2v-output-<type>
    setup".  This will read the output metadata and do whatever is
    needed to set up the empty output disks (perhaps by creating a
    guest on the target, although this could also be done in step (5)
    below).

    This will create:

    /tmp/XXXXXX/out1    NBD endpoints overlaying the target disk(s)
    /tmp/XXXXXX/out2    (these are actually Unix domain sockets)
    /tmp/XXXXXX/out3

(4) Do the copy.  By default this will run either nbdcopy or qemu-img
    convert from in* -> out*.

    Copying could be done in parallel; currently it is done serially.

(5) Finalize the output by running "helper-v2v-output-<type> final".
    This might create the target guest and whatever else is needed.

(6) Kill the NBD servers and clean up the temporary directory.
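
To make the flow of steps (1)-(6) concrete, here is a minimal sketch
of what the supervisor could look like.  The helper names are the ones
proposed above, but everything else is an assumption for illustration:
passing the temporary directory as a plain argument, knowing the
number of disks up front, and the function names plan_steps and
supervise are all hypothetical.  The copy step uses nbdcopy with
standard nbd+unix NBD URIs pointing at the in*/out* sockets.

```python
import os
import shutil
import subprocess


def plan_steps(input_type, output_type, workdir, ndisks):
    """Return (name, argv) pairs for steps (1)-(5), in order.

    ASSUMPTION: the helpers take the working directory as an argument.
    The real calling convention has not been decided.
    """
    output_helper = f"helper-v2v-output-{output_type}"
    steps = [
        ("input", [f"helper-v2v-input-{input_type}", workdir]),
        ("convert", ["helper-v2v-convert", workdir]),
        ("output-setup", [output_helper, "setup", workdir]),
    ]
    for i in range(1, ndisks + 1):
        src = "nbd+unix:///?socket=" + os.path.join(workdir, f"in{i}")
        dst = "nbd+unix:///?socket=" + os.path.join(workdir, f"out{i}")
        steps.append((f"copy-{i}", ["nbdcopy", src, dst]))
    steps.append(("output-final", [output_helper, "final", workdir]))
    return steps


def supervise(steps, workdir, run=subprocess.run):
    """Run each step serially, then always do the step (6) cleanup."""
    try:
        for _name, argv in steps:
            run(argv, check=True)
    finally:
        # A real supervisor would also kill the NBD servers here.
        shutil.rmtree(workdir, ignore_errors=True)
```

The "run" parameter exists only so that the sketch can be exercised
without the helpers being installed; a shell script with "set -e" and
a cleanup trap would express the same idea.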

----------------------------------------------------------------------

Let's see how this plan matches the aims.

Aim (a):

  Copying conversion works as outlined above.  In-place conversion
  works by placing an NBD server on top of the files you want to
  convert and running helper-v2v-convert (virt-v2v --in-place would
  also still work for backwards compat).
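
  As a rough sketch of the difference between the two modes, the
  function below builds an nbdkit command line for a local disk.  For
  copying conversion it adds the cow filter so that writes land in an
  overlay; for in-place conversion it omits the filter so that writes
  reach the file.  The function name and its protect_source parameter
  are hypothetical, and the real helpers may well pass more options.

```python
def nbdkit_argv(socket_path, disk_path, protect_source=True):
    """Build an nbdkit command serving disk_path on a Unix socket.

    protect_source=True  -> copying conversion (writes go to a COW overlay)
    protect_source=False -> in-place conversion (writes modify the file)
    """
    argv = ["nbdkit", "--exit-with-parent", "--unix", socket_path]
    if protect_source:
        # nbdkit-cow-filter keeps writes away from the underlying plugin.
        argv.append("--filter=cow")
    argv += ["file", f"file={disk_path}"]
    return argv
```

  The resulting list can be handed to something like subprocess.Popen;
  helper-v2v-convert would then be pointed at the socket.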

Aim (b):

  Warm migration: It should be fairly clear that this can work in the
  same way as in-place conversion, but I'll discuss this further with
  Martin K and Tomas to make sure I'm not missing anything.

Aims (c), (d):

  Threads etc for performance: Although I don't plan to implement
  this, it's clear that an alternate supervisor program could improve
  performance here, either by copying the multiple disks of a single
  guest in parallel or, even better, by having a global view of the
  system and copying the disks of multiple guests in parallel.

  This is outside the scope of the virt-v2v project, but in scope for
  something like MTV.
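
  For the simpler case (one guest, several disks), a parallel copy
  could be sketched roughly as follows.  This assumes the in*/out*
  sockets already exist in the temporary directory and that nbdcopy is
  the copy tool; the function names, the max_parallel limit and the
  injectable "run" argument are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def nbd_uri(socket_path):
    """NBD URI for a server listening on a Unix domain socket."""
    return f"nbd+unix:///?socket={socket_path}"


def copy_disks(workdir, ndisks, max_parallel=4, run=subprocess.run):
    """Copy in1..inN to out1..outN, up to max_parallel disks at a time."""

    def copy_one(i):
        src = nbd_uri(f"{workdir}/in{i}")
        dst = nbd_uri(f"{workdir}/out{i}")
        run(["nbdcopy", src, dst], check=True)
        return i

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(copy_one, range(1, ndisks + 1)))
```

  A supervisor with a global view would replace the fixed max_parallel
  with its own scheduling across guests.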

Aim (e):

  Better progress bars: nbdcopy should have support for
  machine-readable progress bars, once I push the changes.  It will
  mean no more need to parse debug logs.

Aim (f):

  Better logging: I hope we can log each step separately.

  A custom supervisor program would also be able to tell which
  particular step failed (e.g. did it fail in conversion?  Did it
  fail copying a disk, and if so which one?).
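
  One possible shape for this, as a sketch only: the supervisor runs
  each step as a separate process, sends its output to a per-step log
  file, and raises an error naming the step that failed.  The
  StepFailed class, the run_steps function and the "<step>.log" file
  naming are assumptions made up for this illustration.

```python
import os
import subprocess


class StepFailed(Exception):
    """Raised when a step exits non-zero; records which step it was."""

    def __init__(self, name, returncode):
        super().__init__(f"step {name!r} failed with exit status {returncode}")
        self.name = name
        self.returncode = returncode


def run_steps(steps, logdir):
    """Run (name, argv) steps in order, logging each to <logdir>/<name>.log."""
    for name, argv in steps:
        logfile = os.path.join(logdir, f"{name}.log")
        with open(logfile, "wb") as log:
            # stdout and stderr of this step go to its own log file.
            result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            raise StepFailed(name, result.returncode)
```

  With one step per disk copy (say "copy-1", "copy-2", ...), the
  caller can tell from the exception which disk failed and which log
  file to look at.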

Aim (g):

  This works by splitting up the existing v2v code base into separate
  binaries.  It is already broadly structured (internally) like this.
  So it's not a rewrite, it's a big refactoring.

  However, I'd probably write a new virt-v2v supervisor binary, because
  the existing command line parsing code is extremely complex.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com