[libvirt] RFC: API additions for enhanced snapshot support

Wed Jul 6 11:41:05 UTC 2011

On Tue, Jul 5, 2011 at 8:59 PM, Eric Blake <eblake at redhat.com> wrote:
> On 07/04/2011 08:19 AM, Stefan Hajnoczi wrote:
>> On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake <eblake at redhat.com> wrote:
>>
>> Robert, Fernando, Jagane: I have CCed you because we have discussed
>> snapshot APIs and I thought you'd be interested in Eric's work to
>> build them for libvirt.
>>
>> Does each volume have its own independent snapshot namespace?  It may
>> be wise to document that snapshot namespaces are *not* independent
>> because storage backends may not be able to provide these semantics.
>
> Good question, and I'm not quite sure on the best way to represent this.
>
> For qcow2 internal snapshots, the answer is obvious - each qcow2 image
> has its own snapshot namespace (and you can currently view this with
> qemu-img snapshot -l).  But it looks like the existing drive to add qemu
> snapshot support to live domains is focusing solely on external
> snapshots (that is, a qcow2 snapshot involves creating a new filename,
> then marking the old filename as a read-only backing image of the new
> qcow2 filename; where qcow2 can even be used as a snapshot around a raw
> file).
>
> For external snapshots, I'm not sure whether the namespace should be
> specific to a storage pool (that is, any additional metadata that
> libvirt needs to track snapshot relationships should be stored in the
> same directory as the disk images themselves) or specific to the libvirt
> host (that is, just as libvirt's notion of a persistent storage pool
> happens to be stored in /etc/libvirt/storage/pool.xml, libvirt should
> also manage a file /etc/libvirt/storage/pool/snapshot.xml to describe
> all snapshots tracked within "pool").  But I'm certainly thinking that
> it is more likely to be a pool-wide namespace, rather than a
> storage-volume local namespace.

Pool-wide seems reasonable.

>>
>> Is this function necessary when you already have
>> virStorageVolSnapshotListNames()?
>
> Maybe I need to understand why we have it for the virDomainSnapshot
> case, and whether it still makes sense for a disk image that is not
> associated with a domain.  To some degree, I think it seems necessary,
> to reflect the fact that with internal qcow2 snapshots, I can do:
>
> qemu-img snapshot -c one file
> run then stop vm to modify file
> qemu-img snapshot -c two file
> qemu-img snapshot -a one file
> run then stop vm to modify file
> qemu-img snapshot -c three file
>
> with the resulting hierarchy:
>
> one -> two
>   \-> three
>
> On the other hand, qemu-img doesn't appear to list any hierarchies
> between internal snapshots - that is, while 'qemu-img snapshot -l' will
> list one, two, and three, it gives no indication that three depends on
> one but not two, nor whether the current state of the file would be a
> delta against three, two, one, or even parent-less.

There is no explicit relationship between internal qcow2 snapshots.
qcow2 does reference counting of the actual data clusters and tables
but the snapshot itself is oblivious.  You can delete "one" without
affecting "two" or "three".  There is no dependency relationship
between snapshots themselves, only reference counts on data clusters
and tables.

Here is how qcow2 snapshot operations work:

1. Create snapshot
Increment reference counts for entire active image.
Copy active L1 table into the snapshot data structure.

2. Activate snapshot
Decrement reference counts for entire active image.
Copy snapshot L1 table into active data structure.
Increment reference counts for entire active image.

3. Delete snapshot
Decrement reference counts for entire snapshot image.

> This also starts to get into questions about the ability to split a
> qcow2 image with internal snapshots.  That is, if I have a single file
> with snapshot one and a delta against that snapshot as the current disk
> state, it would be nice to create a new qcow2 file with identical
> contents to snapshot one, then rebase the existing qcow2 file to have a
> backing file of my new clone file and delete the internal snapshot from
> the original file.  But this starts to sound like work on live block
> copy APIs.  For an offline storage volume, we can do things manually
> (qemu-img snapshot -c to temporarily create yet another snapshot point
> to later return to, qemu-img snapshot -a to revert to the snapshot of
> interest, then qemu-img convert to copy off the contents, then qemu-img
> snapshot -a to the temporary state, then qemu-img snapshot -d to clean
> up both the temporary snapshot and the snapshot of interest).  But for a storage volume currently in
> use by qemu, this would imply a new qemu command to have qemu assist in
> streaming out the contents of the snapshot state.

The current live block copy/image streaming APIs do not know about
internal snapshots.  Copying the contents of a snapshot while the VM is
running is technically doable, but there is no API and no code for it
in QEMU.

>>> /* Return the most recent snapshot of a volume, if one exists, or NULL
>>> on failure.  Flags is 0 for now.  */
>>> virStorageVolSnapshotPtr virStorageVolSnapshotCurrent(virStorageVolPtr
>>> vol, unsigned int flags);
>>
>> The name should include "revert".  This looks like a shortcut function
>> for virStorageVolRevertToSnapshot().
>
> No, it was intended as a counterpart to virDomainSnapshotCurrent, which
> returns the "current snapshot" if there is one.  But again, it may be
> that while a "current snapshot" makes sense for a domain, it might not
> make sense for a storage volume in isolation.

qcow2 internal snapshots always copy metadata to the "active" image;
they do not allow you to update existing snapshots in place.  In that
sense a qcow2 image only has one current snapshot, the active image.
The way to update a snapshot is to delete it and create a new one with
the same name.

Stefan