[libvirt] [RFC]: Snapshot API v3
Chris Lalancette
clalance at redhat.com
Wed Mar 31 17:16:44 UTC 2010
On 03/30/2010 08:14 PM, Matthias Bolte wrote:
> 2010/3/30 Chris Lalancette <clalance at redhat.com>:
>> Hello,
>> After our discussions about the snapshot API last week, I went ahead and implemented
>> quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU APIs to
>> try and make sure our APIs matched up. What's below is my revised API based on
>> that survey. Following my revised API are notes that I took regarding how the
>> libvirt API matches up to the various APIs, and some questions about semantics that
>> I had while doing the survey. More comments and questions are welcome.
>
>> /* Start the guest from the snapshot "snapshot" */
>> int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
>>                                 unsigned int flags);
>
> Will it be enforced that the domain is shut down in order to call this function?
>
> ESX doesn't have such a restriction. Not sure about other hypervisors.
Heh, I was just going through that myself. No, it's not required to be
shut down in general; qemu supports both modes. I've updated the documentation
for this call.
<snip>
>> * Note that if other snapshots would be discarded because of this
>> * MERGE action, this operation will fail. If that is really what is intended,
>> * use MERGE_FORCE.
>> *
>> * With a DISCARD flag, it deletes the snapshot. Note that if child snapshots
>> * would be discarded because of this delete action, this operation will
>> * fail. If this is really what is intended, use DISCARD_FORCE.
>> *
>> * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
>> *
>> * Note that this operation can happen when the domain is running or shut
>> * down, though this is hypervisor specific */
>> typedef enum {
>>     VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
>>     VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
>>     VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
>>     VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
>> } virDomainSnapshotDelete;
>> int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
>>                             unsigned int flags);
>>
>> int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
>>
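As a rough illustration of the mutual-exclusivity rule in the comment above,
here is a minimal sketch of how a driver might validate the flags argument.
It assumes the four modes are encoded as distinct bits, which the enum as
quoted does not do (its values are 0 through 3). The enum name, the
SNAP_DELETE_* constants, and the helper snapDeleteFlagsValidate() are all
hypothetical; a different name is needed anyway, since a typedef called
virDomainSnapshotDelete would collide with the function of the same name in C.
None of this is part of the proposed API.

```c
/* Hypothetical bit-flag encoding; the RFC itself does not assign values. */
typedef enum {
    SNAP_DELETE_MERGE         = 1 << 0,
    SNAP_DELETE_MERGE_FORCE   = 1 << 1,
    SNAP_DELETE_DISCARD       = 1 << 2,
    SNAP_DELETE_DISCARD_FORCE = 1 << 3,
} snapDeleteFlags;

#define SNAP_DELETE_ALL (SNAP_DELETE_MERGE | SNAP_DELETE_MERGE_FORCE | \
                         SNAP_DELETE_DISCARD | SNAP_DELETE_DISCARD_FORCE)

/* Returns 0 if exactly one delete mode is requested, -1 otherwise. */
static int snapDeleteFlagsValidate(unsigned int flags)
{
    unsigned int mode = flags & SNAP_DELETE_ALL;

    if (flags & ~(unsigned int)SNAP_DELETE_ALL)
        return -1;              /* unknown bits set */
    if (mode == 0)
        return -1;              /* no mode chosen */
    if (mode & (mode - 1))
        return -1;              /* more than one mode: mutually exclusive */
    return 0;
}
```

With this encoding, passing SNAP_DELETE_MERGE alone is accepted, while
SNAP_DELETE_MERGE | SNAP_DELETE_DISCARD is rejected before any hypervisor
call is made.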
>> NOTE: During snapshot creation, *none* of the fields are required. That is,
>> you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>".
>> In this case, the individual driver will make up a <name> and <uuid> for you,
>
> Does <uuid> here refer to a snapshot UUID? As said before, there is no
> easy way to have a UUID per snapshot with ESX. Well, we could store
> <uuid>:<name> in the name field on the ESX side, but that's not a
> really good way to do it.
Yeah, agreed, that was a leftover I forgot to edit out. See my reply to
Jiri Denemark, but essentially I'm content to declare duplicate names
unsupported/undefined, and not deal with UUIDs at all. I've removed
mention of UUIDs from the documentation now.
<snip>
>> The virsh commands will be:
>> virsh snapshot-create <dom> <xmlfile>
>> virsh snapshot-list <dom>
>> virsh snapshot-dumpxml <dom> <name>
>> virsh start-with-snapshot <dom> <snapshotname>
>> virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
>> virsh snapshot-delete-all <dom>
>>
>> Possible issues:
>> 1) I don't see a way to support "managed" save/restore and snapshotting with
>> this API. I think we'll have to have a separate API for managed save/restore.
>
> What's "managed" save/restore and snapshotting?
Oops, yeah, that's a personal note that I didn't really expound upon. One of the
reasons that I originally started down the path of implementing snapshotting was
to implement save/restore for guests during host shutdown
and startup. Because of the way autostart works within libvirt, we can't have
an external script (a la xendomains) do this; it needs to be handled inside the
libvirt daemon itself, and our current save/restore API is not sufficient for
this. That being said, after all of the discussions we have had about
this snapshotting API, I don't think it will be appropriate to shoehorn this "managed"
save/restore into this API, and we'll need a separate API for that.
>
>> 3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots
>> with the same name, differentiated by UUID. Confusingly, they also have a
>> "FindByName" method that returns the first depth-first search snapshot that matches
>> a given name. For qemu, if you specify the same name twice it overwrites the previous
>> one with the new one. I don't know what ESX does here.
>
> ESX 4.0 allows multiple snapshots with the same name. I think this is
> because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX
> 3.5 allows multiple snapshots with the same name, because the ID field
> was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple
> snapshots with the same name, but I have currently no ESX 3.5 at hand
> to test.
>
> We could use this integer ID and convert it to UUID format, but you
> won't be able to set the UUID, it'll be read-only and only available
> on ESX 4.0 and above.
Yeah, again, I'm happy to drop UUID and declare duplicate names unsupported
unless there is a good use case.
>
>> Mapping of our interface to various hypervisors:
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | Libvirt                       | Qemu            | Virtualbox        | ESX                          |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
>> |                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
>> |                               | snapshot name   | be taken on       | memory (true/false) and      |
>> |                               | is already in   | powered off,      | quiesce (true/false).        |
>> |                               | use, replaces   | saved, running,   | What does "memory" mean?     |
>
> If memory is true, ESX snapshots the memory of the domain too,
> otherwise only a disk snapshot is created.
>
> Creating a disk-only snapshot is nearly instant, while creating a
> memory snapshot also requires a notable amount of time to write the
> memory image to disk.
Sorry, I misread the documentation yesterday. That's fairly clear.
What's less clear to me is what happens when you take a disk-only snapshot,
and then try to RevertToSnapshot from a running VM. What happens in that case?
>
>> |                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
>
> The vSphere API docs give a good description of what the quiesce option does:
>
> "If TRUE and the virtual machine is powered on when the snapshot is
> taken, VMware Tools is used to quiesce the file system in the virtual
> machine. This assures that a disk snapshot represents a consistent
> state of the guest file systems. If the virtual machine is powered off
> or VMware Tools are not available, the quiesce flag is ignored."
>
> I assume "quiesce the file system" means to flush write caches and
> stuff like that.
>
> This option is important if you want to create a disk-only snapshot of
> a running domain.
Exactly. I'm not sure this is going to be possible in general (and
I guess it's not even really possible in ESX unless you install VMware
Tools inside the guest). I'm inclined not to model it at the moment,
although I could be convinced otherwise.
>
>> |                               | snapshot. Also  | The snapshot is   | Trees of snapshots are       |
>> |                               | qemu-img        | always taken      | supported. What happens      |
>> |                               | snapshot -c can | against the       | on a duplicate name? What    |
>> |                               | be used to      | current snapshot. | state(s) can a VM be in      |
>> |                               | create a        | What happens on   | when calling this? Does      |
>> |                               | disk-only       | a duplicate       | a VM get paused when this    |
>> |                               | snapshot. What  | name? Trees of    | is called?                   |
>
> In case of ESX the domain can be in any state when a snapshot is created.
>
> If the domain is running when you create a snapshot then the domain is
> _not_ paused during the snapshot creation.
>
> I tested it and the memory snapshot represents the state at the time
> the snapshot command was issued.
OK, great. I'll update these notes about that.
>
>> |                               | happens if the  | snapshots are     |                              |
>> |                               | VM is running   | not currently     |                              |
>> |                               | when you do     | supported.        |                              |
>> |                               | this? Trees of  | Taking a snapshot |                              |
>> |                               | snapshots seem  | of a running VM   |                              |
>> |                               | to be supported | pauses the VM     |                              |
>> |                               | VM gets paused  | before taking the |                              |
>> |                               | while this is   | snapshot.         |                              |
>> |                               | happening. What |                   |                              |
>> |                               | states can the  |                   |                              |
>> |                               | VM be in?       |                   |                              |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>
>
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
>> |                               | "delvm". What   | deletes the       | removes this snapshot and    |
>> |                               | happens if the  | specified         | deletes any associated       |
>> |                               | snapshot is in  | snapshot. Takes   | storage. Operates on a       |
>> |                               | use? What       | an ID. The VM     | VirtualMachineSnapshot       |
>> |                               | states can the  | must be off.      | object. What states can      |
>> |                               | VM be in? Also  | Differences to    | the VM be in? What           |
>> |                               | qemu-img        | children          | happens if this snapshot     |
>> |                               | snapshot -d     | snapshots will be | is in-use? What happens      |
>> |                               | <name> <file>   | merged with the   | to parents and children?     |
>> |                               | command can be  | children to keep  |                              |
>> |                               | used. What      | children valid.   |                              |
>> |                               | happens if the  | Parent for this   |                              |
>> |                               | disk is in-use? | snapshot will     |                              |
>> |                               | What happens to | become parent of  |                              |
>> |                               | parents and     | any children      |                              |
>> |                               | children?       | snapshots.        |                              |
>> |                               | How do we       |                   |                              |
>> |                               | handle merges?  |                   |                              |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>
> The domain can be in any state when deleting a snapshot, even if you
> delete the current snapshot. VMware has some documentation about how a
> snapshot is merged into its parent:
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002836
>
> And some more general docs about snapshots:
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
>
> Regarding what gets merged and where, I should define the terms I'm
> using first.
>
> A <--1-- B <--2-- C <--3-- current
> + <--4-- D
>
> I intentionally drew the arrows directed from child to parent.
>
> A, B, C, D are what I call a snapshot, a point in "time" I can switch
> to. The disk differences between these points are stored in COW sparse
> images, here shown as 1, 2, 3, 4. The current state of the domain is
> denoted by the "current" item.
>
> Each snapshot is associated with a disk image: A is associated with
> the base image, B with sparse image 1, C with 2 and so on. A special
> case is sparse image 3, it's not associated with a snapshot, but with
> the current state. Also each snapshot can be associated with a memory
> image (not shown here).
>
> The current snapshot in this case is C. If the domain writes changes
> to disk, these changes get stored in sparse image 3. If you switch to
> another snapshot from here then the changes in 3 are lost, because you
> cannot go back to a point where you could access the changes in 3
> again.
>
> Now let's delete B. In this case the memory image associated with B is
> just discarded and 1 and 2 are merged into 5. That's what I was
> referring to when I said ESX merges snapshots into the parent.
>
> A <------5------- C <--3-- current
> + <--4-- D
>
> But this only happens for snapshots like B, that have a parent and a
> child (C is such a snapshot too, even if its child isn't an actual
> snapshot). If you delete D in this example, then the changes in sparse
> image 4 are discarded, because there is no place where they could be
> merged. Merging 4 in the base image would alter A, merging 4 and 5
> would alter C.
>
> Now as I think of this in detail, it seems that the term "merging into
> the parent" is wrong.
>
> In the next example we have snapshot E with parent B.
>
> A <--1-- B <--2-- C <--3-- current
>          + <--6-- E
>
> Now what's going to happen if we delete B? In order to preserve C and
> E, the changes in 1 need to be merged into 2 and 6, this results in 1
> + 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C
> and E.
>
> A <------5------- C <--3-- current
> + <------7------- E
>
> So, virDomainSnapshotDelete's semantics for VirtualBox and ESX seem to
> be the same. I just used the wrong words to describe it at first.
> Sorry for that.
OK, that's very interesting to know. So VirtualBox and ESX seem to do the
same thing here. This is the last thing I have to test with qemu
to get its semantics; I'll get to that today, and then we can look again
at the semantics of the flags to virDomainSnapshotDelete.
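
To check my reading of the delete behaviour described above, here is a
minimal sketch of it in C. It is a toy model, not driver code: the snap
struct, the bitmask used to track which sparse images make up a delta, and
the function name are all made up for illustration. Deleting a leaf simply
discards its delta; deleting an interior snapshot folds its delta into every
child and re-parents the children. Deleting the root (the base image) is
refused here because the discussion above does not cover that case.

```c
#include <stddef.h>

/* One snapshot (or the "current" state): a name, a parent (NULL for the
 * root), and the set of sparse images separating it from its parent.
 * Bit n set in 'deltas' means sparse image n is part of that delta. */
typedef struct snap {
    const char *name;
    struct snap *parent;
    unsigned int deltas;
    int deleted;
} snap;

/* Delete 'victim' from the tree stored in snaps[0..n).
 * Every live child absorbs the victim's delta and is re-parented to the
 * victim's parent; with no children the delta is just dropped.
 * Returns the number of children that absorbed the delta, or -1 if the
 * victim is the root or was already deleted. */
static int snapDeleteMergeIntoChildren(snap *snaps, size_t n, snap *victim)
{
    size_t i;
    int merged = 0;

    if (victim->deleted || victim->parent == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        snap *s = &snaps[i];

        if (s->deleted || s->parent != victim)
            continue;
        s->deltas |= victim->deltas;    /* e.g. 1 + 2 = 5, 1 + 6 = 7 */
        s->parent = victim->parent;
        merged++;
    }

    victim->deleted = 1;
    victim->deltas = 0;
    return merged;
}
```

Run against the A/B/C/E example, deleting B leaves C with images 1 and 2
and E with images 1 and 6, both now children of A, which matches the "5"
and "7" in the diagram.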
--
Chris Lalancette