
[libvirt] [RFC] external (pull) backup API



  Table of contents.

  I  Preface

  1. Fleece API
  2. Export API
  3. Incremental backups
  4. Other hypervisors

  II Links




  I Preface

This is an RFC for an external (or pull) backup API in libvirt. There was an
earlier series [1] with a more limited API scope and functionality for this kind
of backup API. Among other issues, that series was abandoned because the qemu
blockdev-del command had experimental status at the time. There is also a
long-pending RFC series for an internal (or push) backup API [2], which however
has little in common with this RFC. Finally, there is an RFC with overall
agreement on having a backup API in libvirt [3].

The aim of the external backup API is to provide means for a third-party
application to read/write domain disks as block devices for the purpose of
backup. A disk is read during a backup operation and, in the case of an active
domain, is presented at some point in time (preferably in some guest-consistent
state). A disk is written during a restore operation.

As to providing disk state at some point in time, one can use existing disk
snapshots for this purpose. However, this RFC introduces an API to leverage
image fleecing (the blockdev-backup command) instead. Image fleecing is somewhat
the inverse of snapshots. In the case of snapshots, writes go to the top image
so the backing image stays constant; in the case of fleecing, writes go to the
same image as before, but the old data is first copied out to a fleece image
which has the original image as backing. As a result, the fleece image becomes
a disk snapshot.

Another task of this API is to provide disks for read/write operations. One
could try to leverage the libvirt stream API for this purpose, but AFAIK clients
want random access to disk data, which is not what the stream API is suitable
for. I'm not sure what the cost of adding a block API to libvirt would be,
particularly the cost of an efficient implementation at the RPC level, so this
RFC instead adds means to export disk data through existing block interfaces.
For qemu this is NBD.



  1. Fleece API

So the API below provides means to start/stop/query disk image fleecing.
I use the name BlockSnapshot for this operation. Other options are Fleecing,
BlockFleecing, TempBlockSnapshot etc.

/* Start fleecing */
virDomainBlockSnapshotPtr
virDomainBlockSnapshotCreateXML(virDomainPtr domain,
                                const char *xmlDesc,
                                unsigned int flags);

/* Stop fleecing */
int
virDomainBlockSnapshotDelete(virDomainBlockSnapshotPtr snapshot,
                             unsigned int flags);

/* List active fleecings */
int
virDomainBlockSnapshotList(virDomainPtr domain,
                           virDomainBlockSnapshotPtr **snaps,
                           unsigned int flags);

/* Get fleecing description */
char*
virDomainBlockSnapshotGetXMLDesc(virDomainBlockSnapshotPtr snapshot,
                                 unsigned int flags);

/* Get fleecing by name */
virDomainBlockSnapshotPtr
virDomainBlockSnapshotLookupByName(virDomainPtr domain,
                                   const char *name);


Here is a minimal block snapshot XML description to feed to the creation function:

<domainblocksnapshot>
  <snapshot disk='sda'>
    <fleece file="/path/to/fleece-image-sda"/>
  </snapshot>
  <snapshot disk='sdb'>
    <fleece file="/path/to/fleece-image-sdb"/>
  </snapshot>
</domainblocksnapshot>
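For illustration, a backup client could assemble such a description
programmatically. Here is a minimal Python sketch; the disk names and fleece
paths are just the example values from above, not anything the API mandates:

```python
import xml.etree.ElementTree as ET

def build_block_snapshot_xml(disks):
    """Build a <domainblocksnapshot> document from a
    {disk_name: fleece_image_path} mapping."""
    root = ET.Element("domainblocksnapshot")
    for disk, path in disks.items():
        snap = ET.SubElement(root, "snapshot", {"disk": disk})
        ET.SubElement(snap, "fleece", {"file": path})
    return ET.tostring(root, encoding="unicode")

xml_desc = build_block_snapshot_xml({
    "sda": "/path/to/fleece-image-sda",
    "sdb": "/path/to/fleece-image-sdb",
})
```

The resulting string would be passed as the xmlDesc argument of the proposed
virDomainBlockSnapshotCreateXML.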

Below is an example of what the description function should provide upon
successful block snapshot creation. The difference from the above XML is that
the name element (it can be specified on creation as well) and the aliases are
generated. The aliases will be useful later to identify block devices when
exporting through NBD.

<domainblocksnapshot>
  <name>5768a388-c1c4-414c-ac4e-eab216ba7c0c</name>
  <snapshot disk='sda'>
    <fleece file="/path/to/fleece-image-sda"/>
    <alias name="scsi0-0-0-0-backup"/>
  </snapshot>
  <snapshot disk='sdb'>
    <fleece file="/path/to/fleece-image-sdb"/>
    <alias name="scsi0-0-0-1-backup"/>
  </snapshot>
</domainblocksnapshot>
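A backup application would then parse this description to learn which alias
belongs to which disk. A sketch of that step, assuming the element and
attribute names from the example above:

```python
import xml.etree.ElementTree as ET

SNAPSHOT_DESC = """\
<domainblocksnapshot>
  <name>5768a388-c1c4-414c-ac4e-eab216ba7c0c</name>
  <snapshot disk='sda'>
    <fleece file="/path/to/fleece-image-sda"/>
    <alias name="scsi0-0-0-0-backup"/>
  </snapshot>
  <snapshot disk='sdb'>
    <fleece file="/path/to/fleece-image-sdb"/>
    <alias name="scsi0-0-0-1-backup"/>
  </snapshot>
</domainblocksnapshot>"""

def alias_map(desc_xml):
    """Map each snapshotted disk to the generated export alias."""
    root = ET.fromstring(desc_xml)
    return {snap.get("disk"): snap.find("alias").get("name")
            for snap in root.findall("snapshot")}

aliases = alias_map(SNAPSHOT_DESC)
# aliases == {'sda': 'scsi0-0-0-0-backup', 'sdb': 'scsi0-0-0-1-backup'}
```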



  2. Export API

During a backup operation we need to provide read access to the fleecing
images. This is done through the qemu process's NBD server. We just need to
specify the disks to export.

/* start block export */
int
virDomainBlockExportStart(virDomainPtr domain,
                          const char *xmlDesc,
                          unsigned int flags);

/* stop block export */
int
virDomainBlockExportStop(virDomainPtr domain,
                         const char *diskName,
                         unsigned int flags);

Here is an example of the XML for the start function:

<blockexport type="nbd" port="8001">
  <listen type="address" address="10.0.2.10"/>
  <disk name="scsi0-0-0-1-backup"/>
</blockexport>
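Given such an export description, a client would connect to each exported disk
over NBD using the alias as the export name. A small sketch of deriving the
connection URLs; the nbd:// URL form follows the usual qemu client convention
and is an assumption of this sketch, not something the RFC defines:

```python
import xml.etree.ElementTree as ET

EXPORT_DESC = """\
<blockexport type="nbd" port="8001">
  <listen type="address" address="10.0.2.10"/>
  <disk name="scsi0-0-0-1-backup"/>
</blockexport>"""

def export_urls(export_xml):
    """Build one nbd:// URL per exported disk from a <blockexport> element."""
    root = ET.fromstring(export_xml)
    port = root.get("port")
    address = root.find("listen").get("address")
    return ["nbd://%s:%s/%s" % (address, port, disk.get("name"))
            for disk in root.findall("disk")]

urls = export_urls(EXPORT_DESC)
# urls == ['nbd://10.0.2.10:8001/scsi0-0-0-1-backup']
```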

The qemu NBD server is started upon the first disk export start and shut down
upon the last disk export stop. Another option is to control the NBD server
explicitly. One way to do it is to consider the NBD server a new device, so
that to start/stop/update the NBD server we can use the attach/detach/update
device functions. Then in block export start we need to refer to this device
somehow. This can be a generated name/uuid or a type/address pair. Actually
this approach of exposing the NBD server looks more natural to me, even though
it involves more management on the client side. I am not suggesting it in the
first place mostly due to hesitations on how to refer to the NBD server on
block export.

In any case I'd like to provide the export info in the active domain config:

<devices>
  <blockexport type="nbd" port="8001">
    <listen type="address" address="10.0.2.10"/>
    <disk name="scsi0-0-0-1-backup"/>
    <disk name="scsi0-0-0-2-backup"/>
  </blockexport>
</devices>

This API is used in the restore operation too. The domain is started in paused
state, the disks to be restored are exported, and the backup client fills them
with the backup data.



  3. Incremental backups

Qemu can track which disk parts have changed since fleecing start. This is
what is typically called CBT (a dirty bitmap in the qemu community, I guess).
There is also experimental NBD support [4] and a bunch of
merged/agreed/proposed bitmap operations that help to organize incremental
backups.

Different hypervisors have different bitmap implementations with different
costs, thus it is up to the hypervisor whether to start CBT or not upon block
snapshot creation by default. The qemu implementation has memory and disk costs
for every bitmap, thus I suggest starting fleecing without a bitmap by default
and adding a flag VIR_DOMAIN_BLOCK_SNAPSHOT_CREATE_CHECKPOINT to ask for a
bitmap to be started.

Disk bitmaps are visible in the active domain definition under the name
of the block snapshot for which the bitmap was started.

<disk type='file' device='disk'>
  ..
  <target dev='sda' bus='scsi'/>
  <alias name='scsi0-0-0-0'/>
  <checkpoint name="93a5c045-6457-2c09-e56c-927cdf34e178"/>
  <checkpoint name="5768a388-c1c4-414c-ac4e-eab216ba7c0c"/>
  ..
</disk>
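A client deciding which checkpoint to base an incremental backup on could read
them from the domain XML. A sketch assuming the layout above (with the
checkpoint elements written as self-closing tags so the XML is well-formed):

```python
import xml.etree.ElementTree as ET

DISK_XML = """\
<disk type='file' device='disk'>
  <target dev='sda' bus='scsi'/>
  <alias name='scsi0-0-0-0'/>
  <checkpoint name="93a5c045-6457-2c09-e56c-927cdf34e178"/>
  <checkpoint name="5768a388-c1c4-414c-ac4e-eab216ba7c0c"/>
</disk>"""

def disk_checkpoints(disk_xml):
    """List the checkpoint (bitmap) names recorded for one disk."""
    root = ET.fromstring(disk_xml)
    return [cp.get("name") for cp in root.findall("checkpoint")]

checkpoints = disk_checkpoints(DISK_XML)
# checkpoints holds the two checkpoint names in document order
```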

A bitmap can be specified upon disk export as below (I guess there
is no need to provide more than one bitmap per disk). The active domain
config section for block export is expanded similarly.

<blockexport type="nbd" port="8001">
  <listen type="address" address="10.0.2.10"/>
  <disk name="scsi0-0-0-1-backup" checkpoint="5768a388-c1c4-414c-ac4e-eab216ba7c0c"/>
</blockexport>

If a bitmap was created on backup start but the client failed to make a backup
for some reason, then it makes no sense to keep this checkpoint anymore. As
keeping a bitmap takes resources, it is convenient to drop the bitmap in this
case. One may also want to drop a bitmap for pure resource management reasons.
So we need an API to remove a bitmap:

int
virDomainBlockCheckpointRemove(virDomainPtr domain,
                               const char *name,
                               unsigned int flags);



  4. Other hypervisors

I took a somewhat considerable look only at the vmware backup interface at [5]
etc. It looks like they don't have fleecing like qemu has, so for vmware
snapshots one can use the usual disk snapshot API. Expectedly, there is also
no NBD interface for snapshots, thus to deal with vmware snapshot disks one
will eventually have to add a block API to libvirt. So the only point of this
RFC relevant to vmware backups is exposing checkpoints in the disk XML. The
vmware documentation does not say much about bitmap limitations, but I guess
they can still provide only a limited number of them, which can be exposed as
suggested for active domain disks.



  II Links:

[1] https://www.redhat.com/archives/libvir-list/2016-September/msg00192.html
[2] https://www.redhat.com/archives/libvir-list/2017-May/msg00379.html
[3] https://www.redhat.com/archives/libvir-list/2016-March/msg00937.html
[4] https://github.com/NetworkBlockDevice/nbd/commit/cfa8ebfc354b2adbdf73b6e6c2520d1b48e43f7a
[5] https://code.vmware.com/doc/preview?id=4076#/doc/vddkBkupVadp.9.3.html#1014717

