[PATCH 32/32] kbase: Add document outlining internals of incremental backup in qemu

Eric Blake eblake at redhat.com
Fri Jun 19 15:10:36 UTC 2020


On 6/15/20 12:10 PM, Peter Krempa wrote:
> Outline the basics and how to integrate with externally created
> overlays. Other topics will continue later.
> 
> Signed-off-by: Peter Krempa <pkrempa at redhat.com>
> ---
>   docs/kbase.html.in                        |   3 +
>   docs/kbase/incrementalbackupinternals.rst | 210 ++++++++++++++++++++++
>   2 files changed, 213 insertions(+)
>   create mode 100644 docs/kbase/incrementalbackupinternals.rst
> 

> +++ b/docs/kbase/incrementalbackupinternals.rst
> @@ -0,0 +1,210 @@
> +================================================
> +Internals of incremental backup handling in qemu
> +================================================
> +
> +.. contents::
> +
> +Libvirt's implementation of incremental backups in the ``qemu`` driver uses
> +qemu's ``block-dirty-bitmaps`` under the hood to track the guest visible disk
> +state changes correspoiding to the points in time described by a libvirt

corresponding

> +checkpoint.
> +
> +There are some semantical implications how libvirt creates and manages the

semantic implications with how

> +bitmaps which de-facto become API as they are written into the disk images and

images,

> +this document will try to sumarize them.

summarize

> +
> +Glossary
> +========
> +
> +Checkpoint
> +
> +    A libvirt object which represents a named point in time of the life of the
> +    vm where libvirt tracks writes the VM has done and allows then a backup of

has done, thereby allowing a backup of only the blocks which changed

> +    block which changed. Note that state of the VM memory is _not_ captured.
> +
> +    A checkpoint can be created either explicitly via the corresponding API
> +    which isn't very useful or is created as part of creating an
> +    incremental or full backup of the VM using the ``virDomainBackupBegin`` API
> +    which allows a next backup to only copy the differences.

Maybe:

A checkpoint can be created either explicitly via the corresponding API 
(although this isn't very useful on its own), or simultaneously with an 
incremental or full backup of the VM

> +
> +Backup
> +
> +    A copy of either all blocks of selected disks (full backup) or blocks changed
> +    since a checkpoint (incremental backup) at the time the backup job was
> +    started. (Blocks modified while the backup job is running are not part of the
> +    backup!)
> +
> +Snapshot
> +
> +    Similarly to a checkpoint it's a point in time in the lifecycle of the VM
> +    but the state of the VM including memory is captured at that point allowing
> +    returning to the state later.

Hmm. We have disk-only snapshots which do not save the state of memory. 
Does this paragraph need adjustment to mention the difference between a 
disk-only snapshot and a full state capture?  Are we redefining any of 
the terms in domainstatecapture.rst, and/or should those two documents 
have cross-references?

> +
> +Blockjob
> +
> +    A long running job which modifies the shape and/or location of the disk
> +    backing chain (images storing the disk contents). Libvirt supports

If qemu adds block-dirty-bitmap-populate, blockjobs can also manipulate 
just bitmaps.

> +    ``block pull`` where data is moved up the chain towards the active layer,
> +    ``block commit`` where data is moved down the chain towards the base/oldest
> +    image. These blockjobs always remove images from the backing chain. Lastly
> +    ``block copy`` where image is moved to a different location (and possibly
> +    collapsed moving all of the data into the new location into the one image).
> +
> +block-dirty-bitmap (bitmap)
> +
> +    A data structure in qemu tracking which blocks were written by the guest
> +    OS since the bitmap was created.
> +
> +Relationships of bitmaps, checkpoints and VM disks
> +==================================================
> +
> +When a checkpoint is created libvirt creates a block-dirty-bitmap for every
> +configured VM disk named the same way as chcheckpoint. The bitmap is actively

s/chcheckpoint/the checkpoint/

> +recording which blocks were changed by the guest OS from that point on. Other
> +bitmaps are not impacted by any way as they are self-contained:
> +
> +::
> +
> + +----------------+       +----------------+
> + | disk: vda      |       | disk: vdb      |
> + +--------+-------+       +--------+-------+
> +          |                        |
> + +--------v-------+       +--------v-------+
> + | vda-1.qcow2    |       | vdb-1.qcow2    |
> + |                |       |                |
> + | bitmaps: chk-a |       | bitmaps: chk-a |
> + |          chk-b |       |          chk-b |
> + |                |       |                |
> + +----------------+       +----------------+
> +
> +Bitmaps are created at the same time to track changes to all disks in sync and
> +are active and persisted in the QCOW2 image. Oter formats currently don't

Other

> +support this feature.
> +
> +Modification of bitmaps outside of libvirt is not recommended, but when adrering

adhering

> +to the same semantics which the document will describe it should be safe to do
> +so but obviously we can't guarantee that.

do so, even if we obviously can't guarantee that

> +
> +
> +Integration with external snapshots
> +===================================
> +
> +Handling of bitmaps
> +-------------------
> +
> +Creating an external snapshot involves adding a new layer to the backing chain
> +on top of the previous chain. In this step there are no new bitmaps created by
> +default, which would mean that backups become impossible after this step.
> +
> +To prevent this from happening we need to re-create the active bitmaps in the
> +new top/active layer of the backing chain which allows us to continue tracking
> +the changes with same granularity as before and also allows libvirt to stitch
> +together all the corresponding bitmaps to do a backup acorss snapshots.

across

> +
> +After taking a snapshot of the ``vda`` disk from the example above placed into
> +``vda-2.qcow2`` the following topology will be created:
> +
> +::
> +
> +   +----------------+
> +   | disk: vda      |
> +   +-------+--------+
> +           |
> +   +-------v--------+    +----------------+
> +   | vda-2.qcow2    |    | vda-1.qcow2    |
> +   |                |    |                |
> +   | bitmaps: chk-a +----> bitmaps: chk-a |
> +   |          chk-b |    |          chk-b |
> +   |                |    |                |
> +   +----------------+    +----------------+
> +
> +Checking bitmap health
> +----------------------
> +
> +QEMU optimizes disk writes by only updating the bitmaps in certain cases. This
> +also can cause problems in cases when e.g. QEMU crashes.
> +
> +For a chain of bitmaps corresponding in a backing chain to be considered valid

corresponding bitmaps

> +and eligible for use with ``virDomainBackupBegin`` it must conform to the
> +following rules:
> +
> +1) Top image must contain the bitmap
> +2) If any of the backing images in the chain contain the bitmap too all

too,

> +   contiguous images must have the bitmap (no gaps)
> +3) all of the above bitmaps must be marked as active
> +   (``auto`` flag in ``qemu-img`` output, ``recording`` in qemu)
> +4) none of the above bitmaps can be inconsistent
> +   (``in-use`` flag in ``qemu-img`` provided that it's not used on image which
> +   is currently in use by a qemu instance, or ``inconsistent`` in qemu)
> +
> +::
> +
> + # check that image has bitmaps
> +  $ qemu-img info vda-1.qcow2
> +   image: vda-1.qcow2
> +   file format: qcow2
> +   virtual size: 100 MiB (104857600 bytes)
> +   disk size: 220 KiB
> +   cluster_size: 65536
> +   Format specific information:
> +       compat: 1.1
> +       compression type: zlib
> +       lazy refcounts: false
> +       bitmaps:
> +           [0]:
> +               flags:
> +                   [0]: in-use
> +                   [1]: auto
> +               name: chk-a
> +               granularity: 65536
> +           [1]:
> +               flags:
> +                   [0]: auto
> +               name: chk-b
> +               granularity: 65536
> +       refcount bits: 16
> +       corrupt: false
> +
> +(See also the ``qemuBlockBitmapChainIsValid`` helper method in
> +``src/qemu/qemu_block.c``)
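
As a rough illustration of rules 1-4 (not a transcription of ``qemuBlockBitmapChainIsValid``; the function names and the 1/0 input encoding are invented for this sketch), the check boils down to a per-bitmap flag test plus a contiguity test over the chain, top image first:

```shell
#!/bin/sh
# Hypothetical helpers; names and input encoding are assumptions of this sketch.

# Rules 3 and 4 for one bitmap. $1 holds its flags, one per line, as listed
# by qemu-img for an image that no running qemu has open.
bitmap_flags_ok() {
    # rule 4: 'in-use' marks the bitmap as inconsistent
    if printf '%s\n' "$1" | grep -q -x -e 'in-use'; then
        return 1
    fi
    # rule 3: 'auto' marks the bitmap as active (recording)
    printf '%s\n' "$1" | grep -q -x -e 'auto'
}

# Rules 1 and 2. Arguments run from the top image down to the base:
# 1 if that image holds a usable copy of the bitmap, 0 otherwise.
bitmap_chain_ok() {
    # rule 1: the top image must contain the bitmap
    [ "${1:-0}" = 1 ] || return 1
    gap=0
    for present in "$@"; do
        if [ "$present" = 1 ]; then
            # rule 2: the bitmap must not reappear below a gap
            [ "$gap" = 0 ] || return 1
        else
            gap=1
        fi
    done
    return 0
}
```

With the ``qemu-img info`` output quoted above, ``chk-a`` would fail the flag test because of ``in-use``, while ``chk-b`` would pass.
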
> +
> +Creating external checkpoints manually

s/checkpoints/snapshots/

> +--------------------------------------
> +
> +To create the same topology outside of libvirt (e.g when doing snapshots offline)
> +a new ``qemu-img`` which supports the ``bitmap`` subcomand is necessary. The

subcommand

s/necessary/recommended/ (as it is also possible to use 'qemu-kvm -S' to 
do the same actions via QMP commands - although I'm not sure if it is 
worth documenting that fallback)

> +following algorithm then ensures that the new image after snapshot will work
> +with backups (note that ``jq`` is a JSON processor):
> +
> +::
> +
> +  # arguments
> +  SNAP_IMG="vda-2.qcow2"
> +  BACKING_IMG="vda-1.qcow2"
> +
> +  # constants - snapshots and bitmaps work only with qcow2
> +  SNAP_FMT="qcow2"
> +  BACKING_IMG_FMT="qcow2"
> +
> +  # create snapshot overlay
> +  qemu-img create -f "$SNAP_FMT" -F "$BACKING_IMG_FMT" -b "$BACKING_IMG" "$SNAP_IMG"
> +
> +  BACKING_IMG_INFO=$(qemu-img info --output=json -f "$BACKING_IMG_FMT" "$BACKING_IMG")
> +  BACKING_BITMAPS=$(jq '."format-specific".data.bitmaps' <<< "$BACKING_IMG_INFO")

<<< is a bashism.

> +
> +  if [ "x$BACKING_BITMAPS" == "xnull" ]; then

So is == instead of =.  Either we should tweak this to be portable to 
dash, or you should add a #!/bin/bash line to the top of the example.
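
If portability is preferred over a #!/bin/bash line, both constructs have direct POSIX equivalents. A minimal sketch (the helper names are made up here and are not part of the patch):

```shell
#!/bin/sh
# Portable stand-ins for the two bashisms; names are hypothetical.

# Replaces: [ "x$VAR" == "xnull" ]
# POSIX test only defines '=' for string comparison.
bitmaps_missing() {
    [ "x$1" = "xnull" ]
}

# Replaces: grep PATTERN <<< "$VAR"
# A here-string appends a trailing newline, so printf '%s\n' feeds the
# same bytes through a pipe. This variant is quiet and matches whole lines.
var_has_line() {
    printf '%s\n' "$1" | grep -q -x -e "$2"
}
```

The ``jq`` calls can be converted the same way, e.g. ``printf '%s\n' "$BACKING_IMG_INFO" | jq '."format-specific".data.bitmaps'``.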

> +      exit 0
> +  fi
> +
> +  for BACKING_BITMAP_ in $(jq -c '.[]' <<< "$BACKING_BITMAPS"); do
> +      BITMAP_FLAGS=$(jq -c -r '.flags[]' <<< "$BACKING_BITMAP_")
> +      BITMAP_NAME=$(jq -r '.name' <<< "$BACKING_BITMAP_")
> +
> +      if grep 'in-use' <<< "$BITMAP_FLAGS" ||
> +         grep -v 'auto' <<< "$BITMAP_FLAGS"; then
> +         continue
> +      fi
> +
> +      qemu-img bitmap -f "$SNAP_FMT" "$SNAP_IMG" --add "$BITMAP_NAME"

Do you want to also copy the --granularity of the bitmaps being added?
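
Something along these lines might do, assuming the ``granularity`` field that ``qemu-img info --output=json`` reports for each bitmap and the ``-g`` option of ``qemu-img bitmap``. The helper name and the ``QEMU_IMG`` override are invented for this sketch:

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch. QEMU_IMG can be overridden,
# e.g. with 'echo', to preview the command without touching an image.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# $1 = overlay image, $2 = bitmap name, $3 = granularity in bytes (optional)
add_bitmap_like_backing() {
    if [ -n "${3:-}" ]; then
        # reuse the granularity of the bitmap in the backing image
        "$QEMU_IMG" bitmap -f qcow2 --add -g "$3" "$1" "$2"
    else
        # no granularity known: let qemu-img pick its default
        "$QEMU_IMG" bitmap -f qcow2 --add "$1" "$2"
    fi
}
```

Inside the loop, the value could be read with ``jq -r '.granularity'`` from ``$BACKING_BITMAP_`` and passed as the third argument, next to ``"$SNAP_IMG"`` and ``"$BITMAP_NAME"``.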

> +
> +  done
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



