[libvirt] [RFC] Image Fleecing for Libvirt (BZ 955734, 905125)

Wed Jul 24 09:51:35 UTC 2013

On Tue, Jul 23, 2013 at 09:40:56PM -0600, Eric Blake wrote:
> [replying with useful information from another off-list email]
> 
> On 07/15/2013 03:04 PM, Richard W.M. Jones wrote:
> > On Mon, Jul 15, 2013 at 05:57:12PM +0800, Fam Zheng wrote:
> >> Hi all,
> >>
> >> QEMU-KVM BZ 955734, and libvirt BZ 905125 are about feature "Read-only
> >> point-in-time throwaway snapshot". The development is ongoing on
> >> upstream, which implements the core functionality by QMP command
> >> drive-backup. I want to demonstrate the HMP/QMP commands here for image
> >> fleecing tasks (again) and make sure this interface looks ready and
> >> satisfying from Libvirt point of view.
> >>
> 
> 
> On 07/15/2013 06:24 AM, Paolo Bonzini wrote:
> > On 15/07/2013 11:57, Fam Zheng wrote:
> >> Hi all,
> >>
> >> QEMU-KVM BZ 955734, and libvirt BZ 905125 are about feature "Read-only
> >> point-in-time throwaway snapshot". The development is ongoing on
> >> upstream, which implements the core functionality by QMP command
> >> drive-backup. I want to demonstrate the HMP/QMP commands here for image
> >> fleecing tasks (again) and make sure this interface looks ready and
> >> satisfying from Libvirt point of view.
> >
> > And since we are at it, here is a possible libvirt API to expose this
> > functionality (cut-and-paste from an old email).  If needed, VDSM can
> > provide a similar API and proxy the libvirt API.
> >
> > Would something like this work?
> >
> > int        virDomainBlockPeekStart        (virDomainPtr dom,
> >                                  const char ** disks,
> >                                  unsigned int flags);
> >
> >         Make it possible to use virDomainBlockPeek on the given disks
> >         with the new VIR_DOMAIN_BLOCK_PEEK_IMAGE flag.
> >
> >         It is okay to create multiple "snapshot groups", i.e. to invoke
> >         the function multiple times with VIR_DOMAIN_BLOCK_PEEK_SNAPSHOT.
> >         It is however not okay to specify the same disk multiple times
> >         unless all of them are _without_ VIR_DOMAIN_BLOCK_PEEK_SNAPSHOT.
> >
> >         flags:
> >         VIR_DOMAIN_BLOCK_PEEK_SNAPSHOT
> >         Make an atomic point-in-time snapshot of all the disks included
> >         in the list of strings "disks", and expose the snapshot via
> >         virDomainBlockPeek
> >
> >         Note: if the virtual machine is running, this will use
> >         nbd-server-start/add/end.  If the virtual machine is paused,
> >         this will use qemu-nbd.  Libvirt should be able to switch
> >         transparently from one method to the other.
> >
> > int        virDomainBlockPeekStop (virDomainPtr dom);
> >
> >         Stop communication with qemu-nbd or the hypervisor.
> >
> >
> > VIR_DOMAIN_BLOCK_PEEK_IMAGE
> >
> >         A new flag for virDomainBlockPeek.  If specified,
> >         virDomainBlockPeek will access the disk image, not the "raw"
> >         file (i.e. it will read data as seen by the guest).  This
> >         is only valid if virDomainBlockPeekStart has been called before
> >         for this disk.

I don't much like this retro-fitting of start/stop actions into the
virDomainBlockPeek API as a design, particularly the binding of the
PEEK_IMAGE flag to the start/stop actions. Conceptually it would be
perfectly possible for a hypervisor to implement support for PEEK_IMAGE
without these start/stop actions, which are somewhat specific to the
need for QEMU to start an NBD driver.

The virDomainBlockPeek API is also not particularly efficient as an
API, because each read already incurs a round-trip over libvirt's RPC
service. We'd then be adding a round-trip over NBD too.
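
To put rough numbers on that, here is a minimal cost model in C. It
assumes one libvirt RPC round-trip plus one NBD round-trip per peek call,
with no caching or read-ahead, and it assumes a per-call payload cap
(PEEK_CHUNK_MAX); the real cap depends on the libvirt version and
transport. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-call payload cap; illustrative only, the real limit
 * depends on the libvirt version and transport in use. */
#define PEEK_CHUNK_MAX (64 * 1024)

/* Number of peek-style calls needed to read `len` bytes when each
 * call can return at most `chunk` bytes (ceiling division). */
static size_t peek_calls(size_t len, size_t chunk)
{
    assert(chunk > 0);
    return len / chunk + (len % chunk != 0);
}

/* Round trips under the stated assumption: one libvirt RPC round trip
 * and one NBD round trip for every call. */
static size_t peek_round_trips(size_t len, size_t chunk)
{
    return 2 * peek_calls(len, chunk);
}
```

Under these assumptions the cost grows linearly with the amount of data
read, which is the overhead a streaming approach would avoid.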

I'm wondering if we could instead try to utilize the virStreamPtr
APIs for this task. From libvirt's RPC point of view this is much more
efficient, because once you open the region with a stream API, you
don't have any round trips at all - the data is pushed out to/from the
client asynchronously.

Now those APIs are currently designed for sequential streaming of
entire data regions only, but I wonder if we could extend them
somehow to enable seeking within the stream. Alternatively, perhaps
we could just say that if you want to read from disjoint regions,
you can re-open a stream for each region to be processed.
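
As a rough illustration of the "one stream per region" idea, the sketch
below models it against an in-memory buffer rather than a real disk. The
names region_open, region_recv, read_extents and struct extent are all
hypothetical and are not libvirt APIs; the closest existing precedent is
virStorageVolDownload, which already takes an offset and a length
alongside a virStreamPtr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stream opened on one region of a disk. */
struct region_stream {
    const unsigned char *pos;   /* next byte to deliver */
    size_t remaining;           /* bytes not yet delivered */
};

/* One disjoint region the caller wants to read. */
struct extent {
    size_t offset;
    size_t len;
};

/* Hypothetical "open": bind a stream to [offset, offset + len). */
static int region_open(struct region_stream *s, const unsigned char *image,
                       size_t image_len, size_t offset, size_t len)
{
    if (offset > image_len || len > image_len - offset)
        return -1;
    s->pos = image + offset;
    s->remaining = len;
    return 0;
}

/* Sequential receive: copy up to `want` bytes, return 0 at end. */
static size_t region_recv(struct region_stream *s, unsigned char *buf,
                          size_t want)
{
    size_t n = want < s->remaining ? want : s->remaining;
    memcpy(buf, s->pos, n);
    s->pos += n;
    s->remaining -= n;
    return n;
}

/* Read disjoint extents by opening a fresh stream for each one.
 * Returns the total number of bytes copied, or -1 on error. */
static long read_extents(const unsigned char *image, size_t image_len,
                         const struct extent *ext, size_t n_ext,
                         unsigned char *out, size_t out_len)
{
    size_t total = 0;

    for (size_t i = 0; i < n_ext; i++) {
        struct region_stream s;
        size_t n;

        /* Refuse extents that would overflow the output buffer. */
        if (ext[i].len > out_len - total ||
            region_open(&s, image, image_len,
                        ext[i].offset, ext[i].len) < 0)
            return -1;

        /* Within a region the data is consumed purely sequentially. */
        while ((n = region_recv(&s, out + total, 4096)) > 0)
            total += n;
    }
    return (long)total;
}
```

The point of the model is that no seek operation is needed: each region
is read strictly front to back, and the only per-region cost is the
open itself.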

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|