[libvirt-users] backup procedure using blockcopy

Nicolas Sebrecht nsebrecht at piing.fr
Mon Mar 18 11:39:09 UTC 2013


On 15/03/13, Eric Blake wrote:
> On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:

> > Here are the basic steps. This is still not that simple and there are
> > tricky parts in the way.
> > 
> > Usual workflow (use case 2)
> > ===========================
> > 
> > Step 1: create external snapshot for all VM disks (includes VM state).
> > Step 2: do the backups manually while the VM is still running (original disks and memory state).
> > Step 3: save and halt the vm state once backups are finished.
> > Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
> > Step 5: start the VM.
> 
> This involves guest downtime, longer according to how much state changed
> since the snapshot.

Right.
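
For reference, here is a rough sketch of that workflow with virsh. All
names and paths are illustrative, and the merge/restart steps are
exactly where the tricky parts live (after the commit, the domain has
to point back at the backing file):

  # Step 1: external snapshot of the disks, including the memory state
  virsh snapshot-create-as dom backup-snap \
      --memspec file=/vm/dom.mem,snapshot=external \
      --diskspec vda,file=/vm/dom-overlay.qcow2,snapshot=external
  # Step 2: backup the now-quiescent backing file and the memory state
  cp /vm/dom-base.img /backup/ ; cp /vm/dom.mem /backup/
  # Step 3: save and halt the guest once the backups are finished
  virsh save dom /vm/dom.state
  # Step 4: merge the snapshot wrapper back into its backing file offline
  qemu-img commit /vm/dom-overlay.qcow2
  # Step 5: restart the guest with XML edited to point at the merged
  # backing file again (dom-fixed.xml is hypothetical)
  virsh restore /vm/dom.state --xml dom-fixed.xml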

> > Restarting from the backup (use case 1)
> > =======================================
> > 
> > Step A: shut down the running VM and move it out of the way.
> > Step B: restore the backing files and state file from the archives of step 2.
> > Step C: restore the VM. (still not sure on that one, see below)
> > 
> > I wish to provide a more detailed procedure in the future.
> > 
> > 
> >> With new enough libvirt and qemu, it is also possible to use 'virsh
> >> blockcopy' instead of snapshots as a backup mechanism, and THAT works
> >> with raw images without forcing your VM to use qcow2.  But right now, it
> >> only works with transient guests (getting it to work for persistent
> >> guests requires a persistent bitmap feature that has been proposed for
> >> qemu 1.5, along with more libvirt work to take advantage of persistent
> >> bitmaps).
> > 
> > Fine. Sadly, my guests are not transient.
> 
> Guests can be made temporarily transient.  That is, the following
> sequence has absolute minimal guest downtime, and can be done without
> any qcow2 files in the mix.  For a guest with a single disk, there is
> ZERO! downtime:
> 
> virsh dumpxml --security-info dom > dom.xml
> virsh undefine dom
> virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
> virsh define dom.xml
> 
> For a guest with multiple disks, the downtime can be sub-second, if you
> script things correctly (the downtime lasts for the duration between the
> suspend and resume, but the steps done in that time are all fast):
> 
> virsh dumpxml --security-info dom > dom.xml
> virsh undefine dom
> virsh blockcopy dom vda /path/to/backup-vda
> virsh blockcopy dom vdb /path/to/backup-vdb
> polling loop - check periodically until 'virsh blockjob dom vda' and
> 'virsh blockjob dom vdb' both show 100% completion
> virsh suspend dom
> virsh blockjob dom vda --abort
> virsh blockjob dom vdb --abort
> virsh resume dom
> virsh define dom.xml
> 
> In other words, 'blockcopy' is my current preferred method of online
> guest backup, even though I'm still waiting for qemu improvements to
> make it even nicer.

As I understand the man page, blockcopy (without --shallow) creates a
new standalone copy of a disk, flattening its whole backing chain into
a single file when the disk consists of more than one.
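
For example (paths hypothetical):

  virsh blockcopy dom vda /backup/vda-full.img             # flattens the entire chain
  virsh blockcopy dom vda /backup/vda-top.qcow2 --shallow  # copies only the top image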

Unless --finish/--pivot is passed to blockcopy, or until
--abort/--pivot/--async is passed to blockjob, guest writes keep being
mirrored to both the original disk (as it was before blockcopy started)
and the new copy created by blockcopy.
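
The mirroring phase can be watched through the job status (output
format approximate, it varies between versions):

  virsh blockjob dom vda --info
  Block Copy: [100 %]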

Only --pivot switches the guest over to the new disk. So with --finish
or --abort, we get a backup of a running guest. Nice! Except that the
backup doesn't include the memory state.
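
Concretely, as I read the flag semantics:

  virsh blockjob dom vda --pivot   # switch the guest onto the new copy
  virsh blockjob dom vda --abort   # stop mirroring; the guest keeps the
                                   # original, the copy becomes the backup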

In order to include the memory state in the backup, I guess a
pause/resume is unavoidable:

  virsh dumpxml --security-info dom > dom.xml
  virsh undefine dom
  virsh blockcopy dom vda /path/to/backup-vda
  # poll until the block job reports 100 % completion
  until virsh blockjob dom vda | grep -q '100 %'; do sleep 5; done
  virsh suspend dom
  virsh save dom /path/to/memory-backup --running
  virsh blockjob dom vda --abort
  virsh resume dom
  virsh define dom.xml

I'd say that the man page misses the information that these commands
can be run against a running guest, even though the mirroring feature
might imply it.

I would also add a "sync" command just after the first command, as a
safety measure to ensure the XML reaches the disk.
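
Something like:

  virsh dumpxml --security-info dom > dom.xml && sync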

The main drawback I can see is that the hypervisor must have at least
as much free disk space as the disks to back up... or have the
/path/to/backup-* destinations on a remote mount point.

Now, I wonder: if I change my backup strategy and mount the remote host
storing the backups locally on the hypervisor (via NFS, iSCSI, sshfs,
etc.), should I expect write performance degradation? I mean, does the
running guest wait for the write to complete on both mirrored disks
(cache is set to none for the current disks)?


-- 
Nicolas Sebrecht



