[libvirt] RFC: API additions for enhanced snapshot support

Tue Jun 21 13:53:06 UTC 2011

On 06/21/2011 04:30 AM, Daniel P. Berrange wrote:
>> Upstream qemu is developing a 'live snapshot' feature, which allows the
>> creation of a snapshot without the current downtime of several seconds
>> required by the current 'savevm' monitor command, as well as means for
>> controlling applications (libvirt) to request that qemu pause I/O to a
>> particular disk, then externally perform a snapshot, then tell qemu to
>> resume I/O (perhaps on a different file name or fd from the host, but
>> with no change to the contents seen by the guest).  Eventually, these
>> changes will make it possible for libvirt to create fast snapshots of
>> LVM partitions or btrfs files for guest disk images, as well as to
> 
> Actually, IIUC, the QEMU 'live snapshot' feature is only for special
> disk formats like qcow2, qed, etc.

Does anyone have pointers to the qemu implementation of monitor commands
used for live snapshot?

> For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
> the work of creating the snapshot, possibly then telling QEMU to
> switch the backing file of a virtual disk to the new image (if the
> snapshot mechanism works that way).

Yes, that was what I was envisioning.

> 
>> select which disks are saved in a snapshot (that is, save a
>> crash-consistent state of a subset of disks, without the corresponding
>> RAM state, rather than making a full system restore point); the latter
>> would work best with guest cooperation to quiesce disks before qemu
>> pauses I/O to that disk, but that is an orthogonal enhancement.
> 
> At the very least, you need a way to stop QEMU writing to the disk
> for a period of time, whether or not the guest is quiesced. There
> are basically 3 options:
> 
>  1. Pause the guest CPUs (eg  'stop' on the monitor)
>  2. QEMU queues I/O from guest in memory temporarily (does not currently exist)
>  3. QEMU tells guest to quiesce I/O temporarily (does not currently exist)
> 
> To perform a snapshot, libvirt would need to do:
> 
>  1. Stop I/O using one of the 3 methods above
>  2. If disk is a special format
>       - Ask QEMU to snapshot it
>     Else
>       - Create snapshot ourselves
>       - Update QEMU disk backing path (optional)
>  3. Resume I/O

It is step 2B (create the snapshot ourselves) where the proposed
virStorageVolSnapshot* APIs would be useful.  The remaining steps also
need implementation, but I believe that they can fit into existing APIs
by the use of new flag values, rather than requiring any new API.
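
To make the control flow concrete, here is a rough sketch of the three
steps. Every name in it (FakeMonitor, snapshot_disk,
create_external_snapshot, the "pause-io"/"resume-io" log labels) is
hypothetical and is not existing libvirt or QEMU API; the set of formats
that QEMU can snapshot itself is an assumption taken from this thread.

```python
from dataclasses import dataclass, field

# Assumption from the thread: formats QEMU could snapshot on its own.
QEMU_SNAPSHOT_FORMATS = {"qcow2", "qed"}


@dataclass
class Disk:
    path: str
    fmt: str


@dataclass
class FakeMonitor:
    """Hypothetical stand-in for the QEMU monitor; it only records calls."""
    log: list = field(default_factory=list)

    def stop_io(self):
        self.log.append("pause-io")

    def resume_io(self):
        self.log.append("resume-io")

    def snapshot(self, disk):
        self.log.append(f"qemu-snapshot {disk.path}")

    def change_backing(self, disk, new_path):
        self.log.append(f"switch {disk.path} -> {new_path}")


def create_external_snapshot(disk):
    """Placeholder for step 2B (e.g. an LVM or btrfs snapshot)."""
    return disk.path + ".snap"


def snapshot_disk(mon, disk, switch_backing=False):
    mon.stop_io()                                      # step 1
    try:
        if disk.fmt in QEMU_SNAPSHOT_FORMATS:
            mon.snapshot(disk)                         # step 2A
        else:
            new_path = create_external_snapshot(disk)  # step 2B
            if switch_backing:                         # optional path update
                mon.change_backing(disk, new_path)
    finally:
        mon.resume_io()                                # step 3, even on error
```

The try/finally is the one design point worth noting: whichever branch
of step 2 fails, guest I/O should not be left paused.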

> 
>> However, my first goal with API enhancements is to merely prove that
>> libvirt can manage a live snapshot by using qemu-img on a qcow2 image
>> rather than the current 'savevm' approach of qemu doing all the work.
> 
> FYI, QEMU developers are adamant that if the disk image is open
> by QEMU you should, in general, not do anything using qemu-img
> on that disk image.

Agreed.  And I further think that we need to expend some effort making
the new image locking code also play well with libvirt - that is, any
virStorageVol API that can modify a disk image (rather than just do a
read-only operation describing the image) should probably be taught to
fail if any active domain is also using that image.  Conversely, if a
long-running virStorageVol API is started on a volume, then an attempt
to virDomainStart a domain should see that the volume is already in use
and fail just as if the volume had been locked by another running domain.
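
As a minimal sketch of that mutual exclusion, assuming a single
in-process registry of busy volumes (VolumeRegistry, start_domain and
volume_job are made-up names for illustration, and this is not how the
actual locking code is structured):

```python
from contextlib import contextmanager


class VolumeBusyError(Exception):
    """Raised when a volume is already held by a domain or a storage job."""


class VolumeRegistry:
    def __init__(self):
        self._owners = {}  # volume path -> description of current holder

    def acquire(self, path, owner):
        holder = self._owners.get(path)
        if holder is not None:
            raise VolumeBusyError(f"{path} is in use by {holder}")
        self._owners[path] = owner

    def release(self, path, owner):
        # Only the current holder may release the volume.
        if self._owners.get(path) == owner:
            del self._owners[path]


def start_domain(reg, name, disks):
    """Claim every disk for the domain, or claim none at all."""
    owner = f"domain {name}"
    taken = []
    try:
        for path in disks:
            reg.acquire(path, owner)
            taken.append(path)
    except VolumeBusyError:
        for path in taken:
            reg.release(path, owner)
        raise


@contextmanager
def volume_job(reg, path, job):
    """Hold a volume for the duration of a modifying storage operation."""
    owner = f"storage job {job}"
    reg.acquire(path, owner)
    try:
        yield
    finally:
        reg.release(path, owner)
```

With this shape, a modifying storage call on a volume held by a running
domain fails up front, and a domain start during a long-running volume
job fails the same way it would against another domain.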

> libvirt does currently do things like querying
> disk capacity, but we can get away with that because it is an
> invariant section of the header. We certainly can't create internal
> snapshots with qemu-img while the guest is live. Creating external
> snapshots with qemu-img is probably OK, but when I've suggested
> this before QEMU developers were unhappy with even that.

Basically, my proposed virStorageVolSnapshot APIs should only be used on
inactive volumes; for a running domain, you should always go through the
existing virDomainSnapshot API, which can then make appropriate
decisions whether to do external snapshots, or whether to have qemu do
the work because the image is qcow2.  I think we're in agreement here,
and that it still doesn't impact the decision for adding new API for
offline snapshot management.

> 
> What I'm not seeing here, is how these APIs all relate to the existing
> support we have in virStorageVol APIs for creating snapshots. This is
> already implemented for LVM, QCow, QCow2.

The only existing snapshot API that I found was
virDomainSnapshotCreateXML, which only works on qcow2 (not lvm or qcow),
and which works either online (via qemu) or offline (via qemu-img).  But
I could have overlooked something - where is the existing API for
creating an LVM snapshot?  For volume creation, I'm aware of code for
specifying a backing file for an existing file, but backing files aren't
necessarily the same as snapshots, are they?

> The snapshots are created by
> specifying a backing file in the initial volume description. Depending
> on the storage type, the backing file for a snapshot can be writable,
> or readonly. Snapshots appear as just more storage volumes, and are not
> restricted to being within the same pool as the original volume. You can
> also mix storage formats, eg, create a Qcow2 volume with backing file
> on LVM, which is itself a snapshot of another LVM volume.
> 
> The QCow2 internal snapshots don't really fit into our existing model,
> since they don't have extra associated external files, so maybe we do
> still want some of these explicit APIs to query snapshots against
> volumes.
> 
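
For reference, a volume description with a backing file would look
roughly like what the sketch below builds. The element names follow the
libvirt storage volume XML format as I understand it; the helper name
volume_xml, the paths, and the size are made up for illustration.

```python
import xml.etree.ElementTree as ET


def volume_xml(name, capacity_bytes, fmt, backing_path, backing_fmt):
    """Build a volume description whose backing file is an existing image."""
    vol = ET.Element("volume")
    ET.SubElement(vol, "name").text = name
    ET.SubElement(vol, "capacity").text = str(capacity_bytes)
    target = ET.SubElement(vol, "target")
    ET.SubElement(target, "format", type=fmt)
    backing = ET.SubElement(vol, "backingStore")
    ET.SubElement(backing, "path").text = backing_path
    ET.SubElement(backing, "format", type=backing_fmt)
    return ET.tostring(vol, encoding="unicode")


# A qcow2 volume layered on top of a (hypothetical) LVM logical volume.
xml_desc = volume_xml("guest-snap.qcow2", 10 * 1024**3, "qcow2",
                      "/dev/vg0/guest", "raw")
```

A string like xml_desc is what would be handed to the existing volume
creation call, so the new volume appears in the pool as just another
volume whose backing file happens to live elsewhere.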

-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
