[libvirt] [PATCH v4 0/8] bulk snapshot list/redefine (incremental backup saga)

Eric Blake eblake at redhat.com
Wed Mar 13 00:41:58 UTC 2019


On 3/12/19 6:29 PM, Nir Soffer wrote:
> On Tue, Mar 12, 2019 at 11:52 PM Eric Blake <eblake at redhat.com> wrote:
> 
>> On 3/12/19 4:35 PM, Nir Soffer wrote:
>>
>>>>> We don't have a need to list or define snapshots since we managed
>>>> snapshots
>>>>> on oVirt side.
>>>>> We want an API to list and redefine checkpoints.
>>>>
>>>> But the proposed <domaincheckpoint> also has a <domain> subelement, so
>>>> it has the same problem (the XML for a bulk operation can become
>>>> prohibitively large).
>>>>
>>>
>>> Why do we need <domain> in a checkpoint?
>>
>> The initial design for snapshots did not have <domain>, and it bit us
>> hard; you cannot safely revert to a snapshot if you don't know the state
>> of the domain at that time. The initial design for checkpoints has thus
>> mandated that <domain> be present (my v5 series fails to redefine a
>> snapshot if you omit <domain>, even though it has a NO_DOMAIN flag
>> during dumpxml for reduced output).  If we are convinced that defining a
>> snapshot without <domain> is safe enough, then I can relax the

s/snapshot/checkpoint/ in that line

>> checkpoint code to allow redefined metadata without <domain> the way
>> snapshots already have to do it, even though I was hoping that
>> checkpoints could start life with fewer back-compats that snapshot has
>> had to carry along.  But I'd rather start strict and relax later
>> (require <domain> and then remove it when proven safe), and not start
>> loose (leave <domain> optional, and then wish we had made it mandatory).
>>
> 
> You keep mentioning snapshots, but we are not going to redefine snapshots,
> only checkpoints.
> 
> We need to do this only because qcow2 does not store a chain of bitmaps, but

just as qcow2 does not store a chain of snapshots either.

> un-ordered bitmaps without any metadata, so we need to remind libvirt which
> bitmaps we have in every disk, and what is the order of the bitmaps. We
> will be
> happy if we could just specify the bitmaps and disks in the backup API, and
> avoid
> rebuilding the checkopints history, just like we avoid rebuilding the
> snapshot
> history.

For persistent domains, it would be better if libvirt migrated snapshots
(and therefore checkpoints) automatically, instead of being a coward and
refusing the migration because a snapshot exists. The workaround is that
you dumpxml then undefine snapshots on the source, then migrate, then
createXML(REDEFINE) on the destination using the dumped XML.  Since
checkpoints share a lot of code with snapshots, it was easier to get the
code working by having the same rules (and if we fix it for one, we
should fix it for both, again because they share a lot of code).

For transient domains, libvirt has no where to store the information
about the snapshots or checkpoints, so it is relying on you to REDEFINE
the SAME state as what it had last time libvirt was running.

>>> There is no domain info related to these checkpoints, and I hope we are
>> not
>>> adding such requirements at this stage.
>>
>> Rather, that requirement has been there in all of my drafts, and it
>> sounds like you are trying to get me to relax that requirement to not
>> need a <domain> during checkpoint metadata redefine.
>>
> 
> We never understood this requirement it in the html documentation built
> from your patches.
> 
> Looking again in v3 - I see:
> 
>     domain
>     The inactive domain configuration at the time the checkpoint was
> created. Readonly.

Copied heavily from snapshots which has the same wording, so I'm going
to improve that wording in the next round of patches.  But the key point
is that:

CreateXML(, 0)

takes abbreviated (or full) XML, ignores readonly fields, _and_
populates remaining fields with information learned AT THE TIME the
object was created.  Thereafter, GetXMLDesc() outputs the full XML, and:

CreateXML(, _REDEFINE)

says to take the FULL XML from a previous dump (and NOT the abbreviated
XML at the initial creation) in order to restore ALL state as libvirt
had (if you omit fields, libvirt has to guess at what state to recreate
for those fields - for some fields, guessing might not be too bad, such
as what timestamp to use. But for other fields, guessing what to use for
<domain> by using the current state of the domain, which may have had
hotplug events in the meantime, is WRONG compared to being told the
exact <domain> state that libvirt previously had in memory and which was
included in the previous dumpxml).

In other words, _REDEFINE has never been about inventing state from
scratch, but about restoring state that libvirt previously had, as
dumped by libvirt.

> 
> And we have an example:
> 
> <domaincheckpoint>
>   <description>Completion of updates after OS install</description>
>   <disks>
>     <disk name='vda' checkpoint='bitmap'/>
>     <disk name='vdb' checkpoint='no'/>
>   </disks>
> </domaincheckpoint>
> 
> Which does not have a domain, since it is read-only.

Yes, that's and example of CreateXML(, 0). But NOT an example of
CreateXML(, _REDEFINE).

> 
> There is an example of the xml we get from libvirt:
> 
> <domaincheckpoint>
>   <name>1525889631</name>
>   <description>Completion of updates after OS install</description>
>   <creationTime>1525889631</creationTime>
>   <parent>
>     <name>1525111885</name>
>   </parent>
>   <disks>
>     <disk name='vda' checkpoint='bitmap' bitmap='1525889631'/>
>     <disk name='vdb' checkpoint='no'/>
>   </disks>
>   <domain type='qemu'>
>     <name>fedora</name>
>     <uuid>93a5c045-6457-2c09-e56c-927cdf34e178</uuid>
>     <memory>1048576</memory>
>     ...
>     <devices>
>       <disk type='file' device='disk'>
>         <driver name='qemu' type='qcow2'/>
>         <source file='/path/to/file1'/>
>         <target dev='vda' bus='virtio'/>
>       </disk>
>       <disk type='file' device='disk' snapshot='external'>
>         <driver name='qemu' type='raw'/>
>         <source file='/path/to/file2'/>
>         <target dev='vdb' bus='virtio'/>
>       </disk>
>       ...
>     </devices>
>   </domain>
> </domaincheckpoint>

And _that's_ the XML that the _REDEFINE flag expects.

> 
> But as the domain is read-only, we assumed that we don't need to pass it in
> the
> domaincheckpoint xml. We also don't need it since we keep the the disks
> uuids
> included in every checkpoint, and from this you can find the information
> needed
> to generate the tiny checkpoint xml above.

But the tiny original input XML is no longer sufficient during
_REDEFINE, because you no longer have the point in time where libvirt
can populate missing/readonly fields with sane defaults.

> 
> Can you explain why historic domain xml is needed for performing a backup?

In the current implementation for qemu, the full <domain> is needed only
for validation that the disks requested in the virBackupBegin()
operation have a clean chain of bitmaps from the checkpoint to the
present; and you may be right that the bulk of the information about the
<domain> is spurious for qemu's needs.  But it is not obvious whether
throwing away information about the <domain> would be valid for other
hypervisor implementations, nor whether some future tweak to qemu may
actually require more of the <domain> state as it was at the time the
snapshot was created.

> 
> The caller of the API is responsible to give you the list of bitmaps that
> should be
> included for every disk. Libvirt needs to create a new active bitmap and
> start the
> NBD server exposing the bitmaps specified by the caller. How having the xml
> for
> every checkpoint/bitmap is needed for perform these steps?

Because having the <domain> lets libvirt correlate that disk 'vda' at
the time of the snapshot maps to whatever filename with UUID in the
<domain> definition, to ensure that it is operating on the same device
(and that 'vda' was not hot-unplugged and then hot-plugged to some other
disk in the meantime).


>>
> 
> The xml libvirt produces may not be unstable. For example the disk uuid
> "22e5a7d6-5409-482d-bda4-f5939b27aad4" may be called now "sda", but
> when the vm was running on another host 2 weeks ago maybe it was called
> "sdb", since the user added or removed some devices.

But at the time a <domainsnapshot> or <domaincheckpoint> was created,
the <domain> subelement of that XML _is_ stable - that particular disk
name/UUID WAS disk 'sda' at the time the object was created, no matter
what name it has later.


>>> Note that vdsm may be killed in the middle of the redefine loop, and in
>>> this case
>>> we leave livbirt with partial info about checkpoints, and we need to
>>> redefine
>>> the checkpoints again handling this partial sate.
>>
>> But that's relatively easy - if you don't know whether libvirt might
>> have partial data, then wipe the data and start the redefine loop from
>> scratch.
>>
> 
> But this easy boilerplate will be duplicated in all libvirt clients,
> instead of
> having single API to restore the state.

If the single API didn't have to include a <domain> element for every
checkpoint, and thus risk exceeding RPC bounds, then we'd be okay. But
at the present moment, bulk operations would require a LOT of XML,
regardless of snapshot vs. checkpoint.

> 
> Of course if we must keep historic domain configuration for every checkpoint
> bulk API does not make sense, but I don't understand this requirement.

The requirement is stronger for snapshot (because you can't revert
without either knowing the <domain> setup to revert to or using the
UNSAFE flag) than for checkpoint; but my plan was to make <domain>
mandatory for checkpoint because experience with checkpoint has shown
that making <domain> optional on REDEFINE causes more grief than
desired, and because I wanted to share as much good code between the two
(while leaving the back-compats to just snapshots) as possible.  If it
turns out that we want <domain> to be optional on REDEFINE for
checkpoints, we can do that as a followup patch, but first I want to
focus on getting the API in (we can relax API restrictions later, but
only if the API even exists to make it into backports).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20190312/fd07671f/attachment-0001.sig>


More information about the libvir-list mailing list