[libvirt] [RFC v2] external (pull) backup API

Nikolay Shirokovskiy nshirokovskiy at virtuozzo.com
Fri Apr 13 12:02:07 UTC 2018



On 13.04.2018 12:07, Daniel P. Berrangé wrote:
> On Tue, Apr 03, 2018 at 03:01:22PM +0300, Nikolay Shirokovskiy wrote:
>> *Temporary snapshot API*
>>
>> In the previous version it was called the 'Fleece API' after the qemu term, and I'll still
>> use the BlockSnapshot prefix for commands as in the previous RFC instead of
>> TmpSnapshots, which I'm more inclined to now.
>>
>> virDomainBlockSnapshotPtr
>> virDomainBlockSnapshotCreateXML(virDomainPtr domain,
>>                                 const char *xmlDesc,
>>                                 unsigned int flags);
>>
>> virDomainBlockSnapshotDelete(virDomainBlockSnapshotPtr snapshot,
>>                              unsigned int flags);
>>
>> virDomainBlockSnapshotList(virDomainPtr domain,
>>                            virDomainBlockSnapshotPtr **snaps,
>>                            unsigned int flags);
>>
>> virDomainBlockSnapshotGetXMLDesc(virDomainBlockSnapshotPtr snapshot,
>>                                  unsigned int flags);
>>
>> virDomainBlockSnapshotPtr
>> virDomainBlockSnapshotLookupByName(virDomainPtr domain,
>>                                    const char *name,
>>                                    unsigned int flags);
>>
>> Here is an example of snapshot xml description:
>>
>> <domainblocksnapshot>
>>     <name>d068765e-8b50-4d74-9b72-1e55c663cbf8</name>
>>     <disk name='sda' type="file">
>>         <fleece file="/tmp/snapshot-a.hdd"/>
> 
> Can we just call this   <source file="....."/>  which is how we name
> things in normal <disk> elements.  'fleece' in particular is an awful
> name giving no indication of what is being talked about unless you've
> already read the QEMU low levels, so I'd rather we don't use the word
> "fleece" anywhere in API or XML or docs at the libvirt level.

That would be the easiest thing to do) Let me explain.

"source" in plain external snapshots for example feels awkward to me. It is read like "make a
snapshot of disk sda which source file is like that". IMHO it would be better if xml is read 
like "make a snapshot of disk sda and put it into dest|target file. Then for block snapshot xml would read like "make a snapshot 
of disk sda and put fleece there". Fleece may be a new term but it only costs one-two 
sentences to define it. And it is better to have this definition so that user knows what the nature of this image,
so that user have correct assumptions on image size, lifetime... If fleece word itself unfortunate
then we can coin another one.

It looks like "source" has its roots in the domain xml, where it reads well.

> 
>>     </disk>
>>     <disk name='sdb' type="file">
>>         <fleece file="/tmp/snapshot-b.hdd"/>
>>     </disk>
>> </domainblocksnapshot>
> 
>>
>> Temporary snapshots are independent, thus they are not organized in a tree structure
>> as usual snapshots are, so the 'list snapshots' and 'lookup' functions will suffice.
>>
>> Qemu can track which of a disk's blocks have changed from the snapshotted state, so on the
>> next backup the client can back up only the changed blocks. virDomainBlockSnapshotCreateXML
>> accepts the VIR_DOMAIN_BLOCK_SNAPSHOT_CREATE_CHECKPOINT flag to turn this option on
>> for the snapshot, which means tracking changes from this particular snapshot. I used
>> the checkpoint term and not [dirty] bitmap because in the current implementation many
>> qemu dirty bitmaps are used to provide the changed blocks from a given checkpoint to
>> the current snapshot (see the *Implementation* section for more details). Also, a
>> bitmap keeps block changes and thus itself changes in time, while a checkpoint is
>> a more static term, meaning you can query changes from that moment in time.
>>
>> Checkpoints are visible in active domain xml:
>>
>>     <disk type='file' device='disk'>
>>       ..
>>       <target dev='sda' bus='scsi'/>
>>       <alias name='scsi0-0-0-0'/>
>>       <checkpoint name="93a5c045-6457-2c09-e56c-927cdf34e178"/>
>>       <checkpoint name="5768a388-c1c4-414c-ac4e-eab216ba7c0c"/>
> 
> How are these checkpoints recorded / associated to actual storage
> on disk ? What happens across restarts of the VM if this is only
> in the live XML.
> 

Checkpoints reside entirely in the qcow2 image. Internally they are
represented as dirty bitmaps with specially constructed names
(the naming scheme is explained in the *checkpoints* subsection of
the *implementation details* section).

After a VM restart the checkpoints are re-read from qemu.

Hmm, it strikes me that it would be a good idea to provide checkpoint info
in domain stats rather than in the domain xml, just like image size etc.

On the other hand the disk backing chain is expanded in the live xml, so
having checkpoints in the domain xml is not too unexpected...
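For what it's worth, here is a rough sketch of how a management application could
pick the checkpoint names out of the live xml if we keep them there. It only assumes
the <checkpoint> sub-element layout quoted above; the libvirt and libxml2 calls
themselves already exist, the domain name "demo" is made up, and error handling and
cleanup are omitted:

#include <stdio.h>
#include <libvirt/libvirt.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>

int main(void)
{
    /* sketch only: list checkpoint names from the live xml, assuming the
     * <disk>/<checkpoint name="..."/> layout proposed above */
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = virDomainLookupByName(conn, "demo"); /* hypothetical name */
    char *xml = virDomainGetXMLDesc(dom, 0);

    xmlDocPtr doc = xmlReadDoc((const xmlChar *)xml, NULL, NULL, 0);
    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    xmlXPathObjectPtr res =
        xmlXPathEvalExpression((const xmlChar *)"//disk/checkpoint/@name", ctx);

    for (int i = 0; res->nodesetval && i < res->nodesetval->nodeNr; i++) {
        xmlChar *name = xmlNodeGetContent(res->nodesetval->nodeTab[i]);
        printf("checkpoint: %s\n", (char *)name);
        xmlFree(name);
    }
    return 0;
}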

>>       ..
>>     </disk>
> 
> 
>> *Block export API*
>>
>> I guess it is natural to treat qemu NBD server as a domain device. So
> 
> A domain device is normally something that is related to the guest
> machine ABI. This is entirely invisible to the guest, just a backend
> concept, so this isn't really a natural fit as a domain device.

I have VNC in mind as a precedent.

> 
>> we can use the virDomainAttachDeviceFlags/virDomainDetachDeviceFlags API to start/stop the NBD
>> server and virDomainUpdateDeviceFlags to add/delete disks to be exported.
>> While I have no doubts about the start/stop operations, using virDomainUpdateDeviceFlags
>> looks a bit inconvenient, so I decided to add a pair of API functions just
>> to add/delete disks to be exported:
>>
>> int
>> virDomainBlockExportStart(virDomainPtr domain,
>>                           const char *xmlDesc,
>>                           unsigned int flags);
>>
>> int
>> virDomainBlockExportStop(virDomainPtr domain,
>>                          const char *xmlDesc,
>>                          unsigned int flags);
>>
>> I guess more appropriate names would be virDomainBlockExportAdd and
>> virDomainBlockExportRemove, but as I already have a patch series implementing pull
>> backups with these names I would like to keep them for now.
>>
>> These names also reflect that in the implementation I decided to start/stop the NBD
>> server in a lazy manner. While this is a bit innovative for the libvirt API, I guess
>> it is convenient because, to refer to the NBD server when adding/removing disks, we need
>> to identify it through its parameters like type, address etc. until we introduce some
>> device id (which does not look consistent with the current libvirt design). So it
>> looks like we have all the parameters to start/stop the server in the frame of these
>> calls, so why have extra API calls just to start/stop the server manually. If we
>> later need to have an NBD server without disks we can perfectly well support
>> virDomainAttachDeviceFlags/virDomainDetachDeviceFlags.
>>
>> Here is an example of the xml to add/remove disks (specifying the checkpoint
>> attribute is not needed when removing disks, of course):
>>
>> <domainblockexport type="nbd">
>>     <address type="ip" host="0.0.0.0" port="8000"/>
>>     <disk name="sda" snapshot="0044757e-1a2d-4c2c-b92f-bb403309bb17"
>>                      checkpoint="d068765e-8b50-4d74-9b72-1e55c663cbf8"/>
>>     <disk name="sdb" snapshot="0044757e-1a2d-4c2c-b92f-bb403309bb17"
>>                      checkpoint="d068765e-8b50-4d74-9b72-1e55c663cbf8"/>
> 
> What do all these UUIDs refer to ?

Sorry for the UUIDs instead of human-readable names, my bad.

This xml exports snapshot 0044757e-1a2d-4c2c-b92f-bb403309bb17 of disk sda
together with the CBT (changed block tracking data) from the point in time referred to by
checkpoint d068765e-8b50-4d74-9b72-1e55c663cbf8 to the point in time referred to by the
snapshot being exported (0044757e-1a2d-4c2c-b92f-bb403309bb17).
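To make the intended flow concrete, here is a rough sketch of how a backup client
might drive the calls proposed in this RFC end to end. None of these functions or the
VIR_DOMAIN_BLOCK_SNAPSHOT_CREATE_CHECKPOINT flag exist in libvirt yet, the UUIDs and
the /tmp path simply reuse the example values from this thread, and error handling is
left out:

#include <libvirt/libvirt.h>

static int
pull_backup(virDomainPtr dom)
{
    /* xml bodies reuse the examples quoted earlier in this thread */
    const char *snap_xml =
        "<domainblocksnapshot>"
        "  <name>0044757e-1a2d-4c2c-b92f-bb403309bb17</name>"
        "  <disk name='sda' type='file'>"
        "    <fleece file='/tmp/snapshot-a.hdd'/>"
        "  </disk>"
        "</domainblocksnapshot>";

    const char *export_xml =
        "<domainblockexport type='nbd'>"
        "  <address type='ip' host='0.0.0.0' port='8000'/>"
        "  <disk name='sda' snapshot='0044757e-1a2d-4c2c-b92f-bb403309bb17'"
        "        checkpoint='d068765e-8b50-4d74-9b72-1e55c663cbf8'/>"
        "</domainblockexport>";

    /* 1. take the temporary snapshot and start a new checkpoint at the same moment */
    virDomainBlockSnapshotPtr snap =
        virDomainBlockSnapshotCreateXML(dom, snap_xml,
                                        VIR_DOMAIN_BLOCK_SNAPSHOT_CREATE_CHECKPOINT);

    /* 2. export the snapshot (plus the CBT since the older checkpoint) over NBD */
    virDomainBlockExportStart(dom, export_xml, 0);

    /* ... the client now reads full or changed blocks from nbd://0.0.0.0:8000 ... */

    /* 3. tear everything down once the data has been pulled */
    virDomainBlockExportStop(dom, export_xml, 0);
    virDomainBlockSnapshotDelete(snap, 0);
    return 0;
}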


> 
>> </domainblockexport>
>>
>> And this is how this NBD server will be exposed in domain xml:
>>
>> <devices>
>>     ...
>>     <blockexport type="nbd">
>>         <address type="ip" host="0.0.0.0" port="8000"/>
>>         <disk name="sda" snapshot="0044757e-1a2d-4c2c-b92f-bb403309bb17"
>>                          checkpoint="d068765e-8b50-4d74-9b72-1e55c663cbf8"
>>                          exportname="sda-0044757e-1a2d-4c2c-b92f-bb403309bb17"/>
>>         <disk name="sdb" snapshot="0044757e-1a2d-4c2c-b92f-bb403309bb17"
>>                          checkpoint="d068765e-8b50-4d74-9b72-1e55c663cbf8"
>>                          exportname="sdb-0044757e-1a2d-4c2c-b92f-bb403309bb17"/>
>>     </blockexport>
> 
> This feels pretty awkward to me - as mentioned above this is not really
> guest ABI related to having it under <devices> is not a good fit.
> 
> I question whether the NBD server address should be exposed in the XML
> at all. This is a transient thing that is started/stopped on demand via

Such xml resembles VNC/serial ports to me. Those two are not guest ABI either.
On the other hand they are connected to guest devices and the nbd server is not ...

> the APIs you show above. So I'd suggest we just have an API to query
> the listening address of the NBD server.

Such an API looks like it would have too little function to justify having it...

> 
> At most in the XML we could have a element under each respective
> existing <disk/> element to say whether it is exported or not, instead
> of adding new <disk/> elements in a separate place.
> 

Just as in the case of the graphical framebuffer I thought we could have multiple
NBD servers (qemu is limited to just one now). So if we put the export info
under the disks we need to refer to the NBD server from each disk, which basically
means specifying its address there. So having xml with NBD servers, each providing
info on what it exports, looks simpler.
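Purely to illustrate that alternative, per-disk export info would have to name the
server it belongs to, roughly like this (hypothetical xml, not part of the proposal):

    <disk type='file' device='disk'>
      <target dev='sda' bus='scsi'/>
      <!-- hypothetical: every exported disk has to repeat the server address -->
      <export type='nbd' snapshot='0044757e-1a2d-4c2c-b92f-bb403309bb17'
              checkpoint='d068765e-8b50-4d74-9b72-1e55c663cbf8'>
        <address type='ip' host='0.0.0.0' port='8000'/>
      </export>
    </disk>

whereas with the layout above the address is written once per server and the exported
disks are simply listed under it.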

Nikolay



