[libvirt-users] virsh blockcommit fails regularly (was: virtual drive performance)

Peter Krempa pkrempa at redhat.com
Mon Aug 14 15:05:23 UTC 2017


On Mon, Aug 14, 2017 at 08:42:24 +0200, Dominik Psenner wrote:
> Hi,

Hi,

> 
> a small update on this. We have migrated the virtualized host to use the
> virtio drivers and now the drive performance is improved so that we can see
> a constant transfer rate. Before it used to be the same rate but regularly
> dropped to a few bytes/sec for a few seconds and then was fast again.
> 
> However we still observe that the following fails regularly:
> 
> $ virsh snapshot-create-as --domain domain --name backup --no-metadata
> --atomic --disk-only --diskspec hda,snapshot=external
> $ virsh blockcommit domain hda --active --pivot
> error: failed to pivot job for disk hda
> error: block copy still active: disk 'hda' not ready for pivot yet
> Could not merge changes for disk hda of domain. VM may be in invalid state.

Since this thread was renamed, please re-state the version of libvirt
you are using. I don't really want to dig through the old thread.

> Then running the following in the morning succeeds and successfully pivots
> the snapshot into the base image while the vm is live:
> 
> $ virsh blockjob domain hda --abort
> $ virsh blockcommit domain hda --active --pivot
> Successfully pivoted
> 
> We run the backup process once every day and it failed on the following
> days:
> 
> 2017-07-07
> 2017-07-20
> 2017-07-27
> 2017-08-12
> 2017-08-14
> 
> Looking at this, it happens roughly once a week, and from then on the guest
> writes into the snapshot backlog. That snapshot backlog file grows by about
> 8 GB every day and thus the issue always needs immediate attention.
> 
> Any ideas what could cause this issue? Is this a bug (race condition) of
> `virsh blockcommit` that sometimes fails because it is invoked at the wrong
> time?

So the 'virsh blockcommit domain hda --active --pivot' operation
consists of 3 parts:

1) virsh blockcommit domain hda --active
2) waiting until the block job finishes
3) virsh blockjob --pivot domain hda

The problem is that sometimes step 2) finishes too soon and then step 3)
fails. This should not happen any more, since there's code in virsh [1]
which waits for the completion event from libvirtd, which is fired only
when the job is actually ready to be pivoted.

This code has a lot of fallback paths for cases such as an old
libvirtd.

At any rate, pivoting manually later should help. Updating to a more
recent version will probably help as well.
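
As a rough sketch of doing the three steps above by hand, a backup script
could start the active commit, wait for it, and then retry the pivot a few
times instead of giving up on the first failure. The function name, the
VIRSH variable, the retry count and the delay are made up for illustration.
It also assumes that your virsh returns from 'blockcommit --active --wait'
once the job is ready to be pivoted, which may not hold for old versions.

```shell
#!/bin/sh
# Hypothetical helper, not part of libvirt. The name, defaults and the
# VIRSH override are assumptions made for this sketch.
VIRSH="${VIRSH:-virsh}"

# Usage: commit_and_pivot DOMAIN DISK [RETRIES] [DELAY_SECONDS]
commit_and_pivot() {
    domain=$1
    disk=$2
    retries=${3:-10}
    delay=${4:-5}

    # Steps 1) and 2): start the active commit and wait for the block job.
    "$VIRSH" blockcommit "$domain" "$disk" --active --wait || return 1

    # Step 3): pivot. If the job is not ready yet, wait and try again.
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        if "$VIRSH" blockjob "$domain" "$disk" --pivot; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done

    # Still not pivoted: the guest keeps writing into the overlay.
    return 1
}
```

With the names from your mail it would be called as
'commit_and_pivot domain hda'. If it returns non-zero the overlay is still
active, so the script should raise an alert rather than carry on.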

If you are already using a fairly recent version, it's possible that
there are still bugs, though.

Peter

[1]:

commit 7408403560f7d054da75acaab855a95c51a92e2b
Author: Peter Krempa <pkrempa at redhat.com>
Date:   Mon Jul 13 17:04:49 2015 +0200

    virsh: Refactor block job waiting in cmdBlockCommit
    
    Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to
    use the common code. This additionally fixes a bug when working with
    new qemus, where when doing an active commit with --pivot the pivoting
    would fail, since qemu reaches 100% completion but the job doesn't
    switch to synchronized phase right away.

$ git describe --contains 7408403560f7d054da75acaab855a95c51a92e2b
v1.2.18-rc1~33
