[libvirt] RFC API proposal: virDomainBlockRebase

Adam Litke agl at us.ibm.com
Tue Jan 31 20:53:58 UTC 2012


On Tue, Jan 31, 2012 at 09:28:51AM -0700, Eric Blake wrote:
> Right now, the existing virDomainBlockPull API has a tough limitation -
> it is an all-or-none approach.  In all my examples below, I'm starting
> from the following relationship, where '<-' means 'is a backing file of':
> 
> template <- intermediate <- current
> 
> virDomainBlockPull can only convert things in a forward direction, with
> the merge destination being the current image, resulting in:
> 
> merge template and intermediate into current, creating:
> current
> 
> Meanwhile, qemu is adding support for a partial block pull operation,
> still on the current image as the merge destination, but where you can
> now specify an optional argument to limit the pull to just the
> intermediate files and alter the current image to be backed by an
> ancestor file, as in:
> 
> merge intermediate into current, creating:
> template <- current
> 
> For 0.9.10, I'd like to add the following API:
> 
> /**
>  * virDomainBlockRebase:
>  * @dom: pointer to domain object
>  * @disk: path to the block device, or device shorthand
>  * @base: new base image, or NULL for entire block pull
>  * @bandwidth: (optional) specify copy bandwidth limit in Mbps
>  * @flags: extra flags; not used yet, so callers should always pass 0

What is the format of the @base arg?  My first thought would be a path, but what
if the desired image file is not directly known to libvirt?
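
To make the question concrete, here is a minimal sketch of the proposed
semantics when @base is a path.  It is a toy model in plain C, not libvirt
code: struct chain and chain_rebase() are hypothetical names, and the chain
is assumed to be fully known to the caller, which is exactly the assumption
being questioned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHAIN 8

/* Toy model (hypothetical, not libvirt code): image[0] is the oldest
 * backing file and image[len - 1] is the active image. */
struct chain {
    const char *image[MAX_CHAIN];
    size_t len;
};

/* Model of the proposed semantics: pull everything above @base into the
 * active image, so the active image ends up backed directly by @base.
 * A NULL @base models a full virDomainBlockPull().  Returns 0 on success,
 * -1 if @base is not a backing image of the active image. */
static int chain_rebase(struct chain *c, const char *base)
{
    size_t keep = 0; /* number of backing images that survive */

    assert(c->len >= 1 && c->len <= MAX_CHAIN);
    if (base != NULL) {
        size_t i;

        /* Search the backing images only, never the active image. */
        for (i = 0; i + 1 < c->len; i++) {
            if (strcmp(c->image[i], base) == 0)
                break;
        }
        if (i + 1 >= c->len)
            return -1; /* unknown path, or @base is the active image */
        keep = i + 1;
    }
    c->image[keep] = c->image[c->len - 1];
    c->len = keep + 1;
    return 0;
}
```

Starting from template <- intermediate <- current, chain_rebase(&c,
"template") leaves template <- current, and a path that is not in the
chain is rejected with -1.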

>  * Populate a disk image with data from its backing image chain, and
>  * set the new backing image to @base, where @base is the absolute
>  * path of one of the backing images in the chain.  If @base is NULL,
>  * then this operation is identical to virDomainBlockPull().  Once all
>  * data from its backing image chain has been pulled, the disk no
>  * longer depends on those intermediate backing images.  This function
>  * pulls data for the entire device in the background.  Progress of the
>  * operation can be checked with virDomainGetBlockJobInfo() and
>  * the operation can be aborted with virDomainBlockJobAbort().  When
>  * finished, an asynchronous event is raised to indicate the final
>  * status.
>  *
>  * The @disk, @bandwidth, and @flags parameters are handled as in
>  * virDomainBlockPull().
>  *
>  * Returns 0 if the operation has started, -1 on failure.
>  */
> int virDomainBlockRebase(virDomainPtr dom, const char *disk,
>                          const char *base,
>                          unsigned long bandwidth, unsigned int flags);
> 
> Given that Adam has a pending patch to support a
> VIR_DOMAIN_BLOCK_PULL_ASYNC flag, this same flag would have to be
> supported in virDomainBlockRebase.

That patch only applies to virDomainBlockJobAbort().  The blockJob initiators
(virDomainBlockPull and this new one) already use an async mode of operation
because the call simply starts the block job.

> I've also been chatting with Federico Simoncelli about how the above
> operation would work for VDSM purposes in doing a live block move, while
> preserving a common template base file:
> 
> start with:
> vda: template <- current1
> 
> create a disk-only snapshot, with:
>  tmpsnap = virDomainSnapshotCreateXML(dom,
>  "<domainsnapshot>\n"
>  "  <disks>\n"
>  "    <disk name='vda'>\n"
>  "      <source file='/path/to/current2'/>\n"
>  "    </disk>\n"
>  "  </disks>\n"
>  "</domainsnapshot>", VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)
> where the xml calls out the destination file name, resulting in:
> vda: template <- current1 <- current2
> 
> perform the block rebase, with:
>  virDomainBlockRebase(dom, "vda", "/path/to/template", 0,
>  VIR_DOMAIN_BLOCK_PULL_ASYNC)
> then wait for the completion event (or poll the job status),
> resulting in:
> vda: template <- current2
> 
> delete the disk-only snapshot metadata as no longer useful, with:
>  virDomainSnapshotDelete(tmpsnap,
>  VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY)

Yep, seems like a very good method.

> At one point, I thought of creating a single libvirt API that performs
> all of those steps in one call; but right now, I'm not proposing that,
> because qemu has no way to undo a snapshot.  In other
> words, without an undo operation, if the snapshot phase succeeds but the
> block rebase phase fails, a single API would have to report failure even
> though the domain was altered, while the ideal scenario is that
> reporting failure means things were in the same state as before the API
> started.
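
The failure mode described above can be sketched with a toy model.  This is
an illustration only: struct domain_model and the model_*() functions are
hypothetical, the chain is reduced to its length, and the rebase failure is
injected by a flag rather than coming from qemu.

```c
#include <assert.h>

/* Toy model (hypothetical): only the backing chain length is tracked. */
struct domain_model {
    int chain_len;    /* 2 means: template <- current1 */
    int rebase_fails; /* nonzero injects a failure in the rebase phase */
};

static int model_snapshot(struct domain_model *d)
{
    d->chain_len++;   /* template <- current1 <- current2 */
    return 0;
}

static int model_rebase(struct domain_model *d)
{
    if (d->rebase_fails)
        return -1;
    d->chain_len--;   /* template <- current2 */
    return 0;
}

/* Hypothetical combined call: snapshot first, then rebase. */
static int model_live_move(struct domain_model *d)
{
    assert(d->chain_len >= 1);
    if (model_snapshot(d) < 0)
        return -1;    /* nothing has changed yet */
    if (model_rebase(d) < 0)
        return -1;    /* the snapshot was taken and cannot be undone */
    return 0;
}
```

With rebase_fails set, model_live_move() returns -1 while chain_len has
grown from 2 to 3: the caller sees a failure, yet the domain is no longer
in its original state.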
> 
> 
> Beyond 0.9.10, there are some additional useful merge patterns that
> might be worth exposing.  All of these operations are already possible
> on offline images, using qemu-img; but none of them are possible on live
> images using current qemu, which is why I'm thinking it is something for
> another day.  I'm also hoping to someday enhance the set of
> virStorageVol APIs to make backing file manipulation of offline images
> easier.  At any rate, the additional merge operations are:
> 
> forward live merge with a non-current image as the merge destination, as in:
> 
> merge template into intermediate, creating:
> intermediate <- current


> backward merge of a current image (that is, undoing a current snapshot):
> 
> merge current into intermediate, creating:
> template <- intermediate
> 
> and backward merge of a non-current image (that is, undoing an earlier
> snapshot, but by modifying the template rather than the current image):
> 
> merge intermediate into template, creating:
> template <- current

Don't these raise some security concerns about modifying a potentially shared
intermediate image?
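
One way to picture the concern is a toy model of two guests that share a
backing file.  Everything here is hypothetical and much simpler than a real
image format: an image is one int per cluster, UNALLOCATED defers to the
backing image, and merge_forward()/merge_backward() only mimic the
direction of the data flow.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTERS 4
#define UNALLOCATED (-1)

/* Toy model (hypothetical, not qemu's format): one int per cluster,
 * UNALLOCATED means "defer to the backing image". */
struct image {
    int data[CLUSTERS];
    struct image *backing; /* NULL for the root of the chain */
};

/* Guest-visible content of a cluster: first allocated copy up the chain. */
static int image_read(const struct image *img, size_t cluster)
{
    assert(cluster < CLUSTERS);
    for (; img != NULL; img = img->backing) {
        if (img->data[cluster] != UNALLOCATED)
            return img->data[cluster];
    }
    return 0; /* unallocated everywhere reads as zeroes */
}

/* Forward merge: copy into @top whatever it still lacks, then detach it
 * from its backing files.  Only @top is written. */
static void merge_forward(struct image *top)
{
    for (size_t i = 0; i < CLUSTERS; i++)
        top->data[i] = image_read(top, i);
    top->backing = NULL;
}

/* Backward merge: push the allocated clusters of @top down into its
 * backing image.  The backing image itself is written. */
static void merge_backward(const struct image *top)
{
    assert(top->backing != NULL);
    for (size_t i = 0; i < CLUSTERS; i++) {
        if (top->data[i] != UNALLOCATED)
            top->backing->data[i] = top->data[i];
    }
}
```

If images a and b are both backed by the same template, merge_forward(&a)
leaves what b reads untouched, while merge_backward(&a) rewrites the
template and therefore changes what b reads.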

> Backward merge of the current image seems like something easy to fit
> into my proposed API (add a new flag, maybe called
> VIR_DOMAIN_BLOCK_REBASE_BACKWARD).  Manipulation of anything that does
> not involve the current image seems tougher, assuming qemu ever even
> reaches the point where it exposes those operations on live volumes -
> the user has to specify not one, but two backing file names.  But even
> that could possibly be fit into my API, by adding a flag that states
> that the const char *base argument is treated as an XML snippet
> describing the full details of the merge, with the XML listing which
> image is being merged to which destination, rather than as just the name
> of the backing file becoming the new base of the current image.  Perhaps
> something like:
> 
> virDomainBlockRebase(dom, block,
>   "<rebase>\n"
>   "  <source>/path/to/intermediate</source>\n"
>   "  <dest>/path/to/template</dest>\n"
>   "</rebase>", 0,
>   VIR_DOMAIN_BLOCK_REBASE_XML|VIR_DOMAIN_BLOCK_REBASE_BACKWARD)
> 
> as a specification to take the contents of intermediate, merge those
> backwards into template, as well as adjusting the rest of the
> backing file chain so that whatever used to be backed by intermediate is
> now backed by template.  Or, if qemu ever gives us the ability to merge
> non-current images, we may decide at that time that it is worth a new
> API to expose those new complexities.

This is all starting to scare me so I will defer to the storage pros :)

> Another thing I have been thinking about is virDomainSnapshotDelete.
> The above conversation talks about merging of a single disk, but a live
> disk snapshot operation can create backing file chains for multiple
> disks at once, all tracked by a snapshot.  Additionally, the current
> code allows a snapshot delete of internal snapshots, but refuses to do
> anything useful with an external snapshot, because there is currently no
> way to specify if the snapshot is removed by merging the base into the
> new current, or by undoing the current and merging it backwards into the
> base.  Alas, virDomainSnapshotDelete doesn't take any arguments for how
> to handle the situation, and use of a flag to make the decision would
> limit all disks to be handled in the same manner.  So what I'm thinking
> is that when a snapshot is created (or redefined, using redefinition as
> the vehicle to add in the new XML), the snapshot XML itself can
> record the preferred direction for undoing the snapshot; for example:
> 
> <domainsnapshot>
>   <disks>
>     <disk name='/path/to/old_vda'>
>       <source file='/path/to/new_vda'/>
>       <on_delete merge='forward'/>
>     </disk>
>     <disk name='/path/to/old_vdb'>
>       <source file='/path/to/new_vdb'/>
>       <on_delete merge='backward'/>
>     </disk>
>   </disks>
> </domainsnapshot>
> 
> then when virDomainSnapshotDelete is called on that snapshot, old_vda
> would be forward merged into new_vda, while new_vdb would be backward
> merged into old_vdb. Again, that's food for thought for post-0.9.10, and
> shouldn't get in the way of adding virDomainBlockRebase() now.
> 
> -- 
> Eric Blake   eblake at redhat.com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 



-- 
Adam Litke <agl at us.ibm.com>
IBM Linux Technology Center



