[libvirt] [RFC]: Snapshot API v3

Chris Lalancette clalance at redhat.com
Wed Mar 31 17:16:44 UTC 2010


On 03/30/2010 08:14 PM, Matthias Bolte wrote:
> 2010/3/30 Chris Lalancette <clalance at redhat.com>:
>> Hello,
>>     After our discussions about the snapshot API last week, I went ahead and implemented
>> quite a bit of the API.  I also went back to the ESX, Virtualbox, and QEMU APIs to
>> try and make sure our APIs matched up.  What's below is my revised API based on
>> that survey.  Following my revised API are notes that I took regarding how the
>> libvirt API matches up to the various APIs, and some questions about semantics that
>> I had while doing the survey.  More comments and questions are welcome.
> 
>> /* Start the guest from the snapshot "snapshot" */
>> int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
>>                                unsigned int flags);
> 
> Will it be enforced that the domain is shut down in order to call this function?
> 
> ESX doesn't have such a restriction. Not sure about other hypervisors.

Heh, I was just going through that myself.  No, it's not required to be
shut down in general; qemu supports both modes.  I've updated the documentation
for this call.

<snip>

>>  * Note that if other snapshots would be discarded because of this
>>  * MERGE action, this operation will fail.  If that is really what is intended,
>>  * use MERGE_FORCE.
>>  *
>>  * With a DISCARD flag, it deletes the snapshot.  Note that if children snapshots
>>  * would be discarded because of this delete action, this operation will
>>  * fail.  If this is really what is intended, use DISCARD_FORCE.
>>  *
>>  * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
>>  *
>>  * Note that this operation can happen when the domain is running or shut
>>  * down, though this is hypervisor specific */
>> typedef enum {
>>    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
>>    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
>>    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
>>    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
>> } virDomainSnapshotDeleteFlags;
>> int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
>>                            unsigned int flags);
>>
>> int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
>>
>> NOTE: During snapshot creation, *none* of the fields are required.  That is,
>> you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>".
>> In this case, the individual driver will make up a <name> and <uuid> for you,
> 
> Does <uuid> here refer to a snapshot UUID? As said before, there is no
> easy way to have a UUID per snapshot with ESX. Well, we could store
> <uuid>:<name> in the name field on the ESX side, but that's not
> really a good way to do it.

Yeah, agreed, that was a leftover I forgot to edit out.  See my reply to
Jiri Denemark, but essentially I'm content to declare duplicate names
unsupported/undefined, and not deal with UUIDs at all.  I've removed
mention of UUIDs from the documentation now.

<snip>

>> The virsh commands will be:
>> virsh snapshot-create <dom> <xmlfile>
>> virsh snapshot-list <dom>
>> virsh snapshot-dumpxml <dom> <name>
>> virsh start-with-snapshot <dom> <snapshotname>
>> virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
>> virsh snapshot-delete-all <dom>
>>
>> Possible issues:
>> 1)  I don't see a way to support "managed" save/restore and snapshotting with
>> this API.  I think we'll have to have a separate API for managed save/restore.
> 
> What's "managed" save/restore and snapshotting?

Oops, yeah, that's a personal note that I didn't really expound upon.  One of the
reasons that I originally started down the path of implementing snapshotting was
to implement save/restore for guests during host shutdown
and startup.  Because of the way autostart works within libvirt, we can't have
an external script (a la xendomains) do this; it needs to be handled inside the
libvirt daemon itself, and our current save/restore API is not sufficient for
this.  That being said, after all of the discussions we have had about
this snapshotting API, I don't think it will be appropriate to shoehorn this "managed"
save/restore into this API, and we'll need a separate API for that.

> 
>> 3)  Do we need a snapshot UUID?  Virtualbox allows you to have multiple snapshots
>> with the same name, differentiated by UUID.  Confusingly, they also have a
>> "FindByName" method that returns the first depth-first search snapshot that matches
>> a given name.  For qemu, if you specify the same name twice it overwrites the previous
>> one with the new one.  I don't know what ESX does here.
> 
> ESX 4.0 allows multiple snapshots with the same name. I think this is
> because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX
> 3.5 allows multiple snapshots with the same name, because the ID field
> was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple
> snapshots with the same name, but I currently have no ESX 3.5 at hand
> to test.
> 
> We could use this integer ID and convert it to UUID format, but you
> won't be able to set the UUID, it'll be read-only and only available
> on ESX 4.0 and above.

Yeah, again, I'm happy to drop UUID and declare duplicate names unsupported
unless there is a good use case.
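
To make the duplicate-name problem concrete, here is a minimal C sketch of a
first-match, depth-first lookup in the spirit of the VirtualBox "FindByName"
behaviour described above.  The struct snap type, its id field, and
find_by_name() are hypothetical names used only for illustration, and the
pre-order traversal is an assumption; this is not the VirtualBox or libvirt
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot node; not the VirtualBox or libvirt representation. */
struct snap {
    const char *name;
    int id;                     /* stands in for a per-snapshot UUID or integer ID */
    struct snap *first_child;
    struct snap *next_sibling;
};

/* Depth-first, pre-order search (assumed order): returns the first node
 * whose name matches, or NULL if no node in the tree has that name. */
static const struct snap *find_by_name(const struct snap *node, const char *name)
{
    assert(name != NULL);
    for (; node != NULL; node = node->next_sibling) {
        if (strcmp(node->name, name) == 0)
            return node;
        /* Descend into the children before moving on to the next sibling. */
        const struct snap *hit = find_by_name(node->first_child, name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

With two snapshots that share a name, a lookup like this only ever returns the
first one in traversal order; the other is reachable only through its separate
ID.  Declaring duplicate names unsupported sidesteps that ambiguity.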

> 
>> Mapping of our interface to various hypervisors:
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | Libvirt                       | Qemu            | Virtualbox        | ESX                          |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
>> |                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
>> |                               | snapshot name   | be taken on       | memory (true/false) and      |
>> |                               | is already in   | powered off,      | quiesce (true/false).        |
>> |                               | use, replaces   | saved, running,   | What does "memory" mean?     |
> 
> If memory is true, ESX snapshots the memory of the domain too,
> otherwise only a disk snapshot is created.
> 
> Creating a disk-only snapshot is nearly instant, while creating a
> memory snapshot also requires a notable amount of time to write the
> memory image to disk.

Sorry, I misread the documentation yesterday.  That's fairly clear.
What's less clear to me is what happens when you take a disk-only snapshot,
and then try to RevertToSnapshot from a running VM.  What happens in that case?

> 
>> |                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
> 
> The vSphere API docs give a good description of what the quiesce option does:
> 
> "If TRUE and the virtual machine is powered on when the snapshot is
> taken, VMware Tools is used to quiesce the file system in the virtual
> machine. This assures that a disk snapshot represents a consistent
> state of the guest file systems. If the virtual machine is powered off
> or VMware Tools are not available, the quiesce flag is ignored."
> 
> I assume "quiesce the file system" means to flush write caches and
> stuff like that.
> 
> This option is important if you want to create a disk-only snapshot of
> a running domain.

Exactly.  I'm not sure this is going to be possible in general (and
I guess it's not even really possible in ESX unless you install VMware
Tools inside the guest).  I'm inclined not to model it at the moment,
although I could be convinced otherwise.

> 
>> |                               | snapshot.  Also | The snapshot is   | Trees of snapshots are       |
>> |                               | qemu-img        | always taken      | supported.  What happens     |
>> |                               | snapshot -c can | against the       | on a duplicate name? What    |
>> |                               | be used to      | current snapshot. | state(s) can a VM be in      |
>> |                               | create a        | What happens on   | when calling this?  Does     |
>> |                               | disk-only       | a duplicate       | a VM get paused when this    |
>> |                               | snapshot.  What | name?  Trees of   | is called?                   |
> 
> In case of ESX the domain can be in any state when a snapshot is created.
> 
> If the domain is running when you create a snapshot then the domain is
> _not_ paused during the snapshot creation.
> 
> I tested it and the memory snapshot represents the state at the time
> the snapshot command was issued.

OK, great.  I'll update these notes about that.

> 
>> |                               | happens if the  | snapshots are     |                              |
>> |                               | VM is running   | not currently     |                              |
>> |                               | when you do     | supported.        |                              |
>> |                               | this?  Trees of | Taking a snapshot |                              |
>> |                               | snapshots seem  | of a running VM   |                              |
>> |                               | to be supported | pauses the VM     |                              |
>> |                               | VM gets paused  | before taking the |                              |
>> |                               | while this is   | snapshot.         |                              |
>> |                               | happening. What |                   |                              |
>> |                               | states can the  |                   |                              |
>> |                               | VM be in?       |                   |                              |
>> +-------------------------------+-----------------+-------------------+------------------------------+
> 
> 
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
>> |                               | "delvm".  What  | deletes the       | removes this snapshot and    |
>> |                               | happens if the  | specified         | deletes any associated       |
>> |                               | snapshot is in  | snapshot.  Takes  | storage.  Operates on a      |
>> |                               | use?  What      | an ID.  The VM    | VirtualMachineSnapshot       |
>> |                               | states can the  | must be off.      | object.  What states can     |
>> |                               | VM be in?  Also | Differences to    | the VM be in?  What          |
>> |                               | qemu-img        | children          | happens if this snapshot     |
>> |                               | snapshot -d     | snapshots will be | is in-use?  What happens     |
>> |                               | <name> <file>   | merged with the   | to parents and children?     |
>> |                               | command can be  | children to keep  |                              |
>> |                               | used.  What     | children valid.   |                              |
>> |                               | happens if the  | Parent for this   |                              |
>> |                               | disk is in-use? | snapshot will     |                              |
>> |                               | What happens to | become parent of  |                              |
>> |                               | parents and     | any children      |                              |
>> |                               | children?       | snapshots.        |                              |
>> |                               | How do we       |                   |                              |
>> |                               | handle merges?  |                   |                              |
>> +-------------------------------+-----------------+-------------------+------------------------------+
> 
> The domain can be in any state when deleting a snapshot, even if you
> delete the current snapshot. VMware has some documentation about how a
> snapshot is merged into its parent:
> 
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002836
> 
> And some more general docs about snapshots:
> 
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
> 
> Regarding what gets merged and where, I should define the terms I'm
> using first.
> 
> A <--1-- B <--2-- C <--3-- current
> + <--4-- D
> 
> I intentionally drew the arrows directed from child to parent.
> 
> A, B, C, D are what I call a snapshot, a point in "time" I can switch
> to. The disk differences between these points are stored in COW sparse
> images, here shown as 1, 2, 3, 4. The current state of the domain is
> denoted by the "current" item.
> 
> Each snapshot is associated with a disk image: A is associated with
> the base image, B with sparse image 1, C with 2 and so on. A special
> case is sparse image 3, it's not associated with a snapshot, but with
> the current state. Also each snapshot can be associated with a memory
> image (not shown here).
> 
> The current snapshot in this case is C. If the domain writes changes
> to disk, these changes get stored in sparse image 3. If you switch to
> another snapshot from here then the changes in 3 are lost, because you
> cannot go back to a point where you could access the changes in 3
> again.
> 
> Now let's delete B. In this case the memory image associated with B is
> just discarded and 1 and 2 are merged into 5. That's what I was
> referring to when I said ESX merges snapshots into the parent.
> 
> A <------5------- C <--3-- current
> + <--4-- D
> 
> But this only happens for snapshots like B, that have a parent and a
> child (C is such a snapshot too, even if its child isn't an actual
> snapshot). If you delete D in this example, then the changes in sparse
> image 4 are discarded, because there is no place where they could be
> merged. Merging 4 into the base image would alter A, merging 4 and 5
> would alter C.
> 
> Now as I think of this in detail, it seems that the term "merging into
> the parent" is wrong.
> 
> In the next example we have snapshot E with parent B.
> 
> A <--1-- B <--2-- C <--3-- current
>          + <--6-- E
> 
> Now what's going to happen if we delete B? In order to preserve C and
> E, the changes in 1 need to be merged into 2 and 6, which results in 1
> + 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C
> and E.
> 
> A <------5------- C <--3-- current
> + <------7------- E
> 
> So, virDomainSnapshotDelete's semantics for VirtualBox and ESX seem to
> be the same. I just used the wrong words to describe it at first.
> Sorry for that.

OK, that's very interesting to know.  So VirtualBox and ESX seem to do the
same thing here.  This is the last thing I have left to test with qemu
to get its semantics; I'll get to that today, and then we can look again
at the semantics of the flags to virDomainSnapshotDelete.
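
The delete behaviour described above for ESX and VirtualBox (the deleted
snapshot's disk delta is folded into each child, and the children are
re-parented) can be modelled in a few lines of C.  This is only a sketch:
struct snapshot, struct snapshot_tree, tree_add() and tree_delete() are
hypothetical names, each delta is represented as a bitmask of the sparse image
numbers used in the diagrams, memory images are ignored, and refusing to delete
the base snapshot is a simplification rather than documented hypervisor
behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SNAPSHOTS 8

/* Hypothetical in-memory model; not libvirt's or ESX's data structures. */
struct snapshot {
    const char *name;
    int parent;          /* index of the parent snapshot, -1 for the base */
    unsigned int delta;  /* bit i set: sparse image i is part of the diff to the parent */
    int alive;           /* 0 once the snapshot has been deleted */
};

struct snapshot_tree {
    struct snapshot nodes[MAX_SNAPSHOTS];
    size_t count;
};

/* Append a snapshot; returns its index, or -1 if the table is full. */
static int tree_add(struct snapshot_tree *t, const char *name,
                    int parent, unsigned int delta)
{
    assert(t != NULL);
    if (t->count >= MAX_SNAPSHOTS)
        return -1;
    t->nodes[t->count] = (struct snapshot){ name, parent, delta, 1 };
    return (int)t->count++;
}

/* Delete snapshot idx: every child inherits the deleted snapshot's delta
 * and parent; a leaf's delta is simply dropped.
 * Returns 0 on success, -1 for an invalid index or the base snapshot. */
static int tree_delete(struct snapshot_tree *t, int idx)
{
    assert(t != NULL);
    if (idx < 0 || (size_t)idx >= t->count || !t->nodes[idx].alive)
        return -1;
    if (t->nodes[idx].parent < 0)
        return -1;  /* deleting the base is out of scope for this sketch */

    for (size_t i = 0; i < t->count; i++) {
        struct snapshot *child = &t->nodes[i];
        if (child->alive && child->parent == idx) {
            child->delta |= t->nodes[idx].delta;   /* e.g. 1 + 2 = "5" */
            child->parent = t->nodes[idx].parent;  /* re-parent to the grandparent */
        }
    }
    t->nodes[idx].alive = 0;
    t->nodes[idx].delta = 0;
    return 0;
}
```

Building A <- B <- {C, E} with deltas 1, 2 and 6 and then deleting B leaves C
with delta {1, 2} and E with delta {1, 6}, both parented to A, which
corresponds to images 5 and 7 in the second example above.  Deleting a leaf
such as E simply drops its delta.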

-- 
Chris Lalancette



