[libvirt-users] virsh blockcommit fails regularly (was: virtual drive performance)

Dominik Psenner dpsenner at gmail.com
Mon Aug 14 15:46:46 UTC 2017


Thanks Peter for your feedback. Interestingly the version of virsh is newer
than 1.2.18 and thus should contain the fix:

$ virsh --version
1.3.1

$ uname -a
Linux agsserver 4.4.0-91-generic #114-Ubuntu SMP Tue Aug 8 11:56:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

But we're still having the issue. Is there anything else that you can think
of? Feel free to ask me for more information. I'm willing to help
wherever I can because this bugs us quite regularly. We could probably
change our daily backup cronjob to retry blockcommit after a blockjob
abort, but it feels so hacky that I would do that only as a last resort.
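
For reference, the retry I have in mind would look roughly like the sketch
below. It only wraps the two commands from the quoted mail (blockcommit with
--active --pivot, and blockjob --abort) in a loop. The names
commit_with_retry, VIRSH and RETRY_DELAY are made up for this example, and
the three attempts and the 10 second delay are arbitrary assumptions, not
tuned values.

```shell
#!/bin/sh
# Sketch only: retry an active blockcommit after aborting the stuck block job.
# VIRSH, RETRY_DELAY and commit_with_retry are hypothetical names for this example.
VIRSH="${VIRSH:-virsh}"
RETRY_DELAY="${RETRY_DELAY:-10}"   # seconds between attempts (arbitrary)

# usage: commit_with_retry DOMAIN DISK [ATTEMPTS]
commit_with_retry() {
    domain="$1"
    disk="$2"
    attempts="${3:-3}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Same command the backup job already runs.
        if "$VIRSH" blockcommit "$domain" "$disk" --active --pivot; then
            return 0
        fi
        echo "blockcommit attempt $i of $attempts failed for $disk" >&2
        if [ "$i" -lt "$attempts" ]; then
            # Same manual recovery we do in the morning: abort, then retry.
            "$VIRSH" blockjob "$domain" "$disk" --abort || true
            sleep "$RETRY_DELAY"
        fi
        i=$((i + 1))
    done
    # All attempts failed; the job is left as is for manual inspection.
    return 1
}
```

The cronjob would then call `commit_with_retry domain hda` instead of the
plain blockcommit and treat a non-zero exit status as "needs attention".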

2017-08-14 17:05 GMT+02:00 Peter Krempa <pkrempa at redhat.com>:

> On Mon, Aug 14, 2017 at 08:42:24 +0200, Dominik Psenner wrote:
> > Hi,
>
> Hi,
>
> >
> > a small update on this. We have migrated the virtualized host to use the
> > virtio drivers and now the drive performance is improved so that we can
> > see a constant transfer rate. Before it used to be the same rate but
> > regularly dropped to a few bytes/sec for a few seconds and then was fast
> > again.
> >
> > However we still observe that the following fails regularly:
> >
> > $ virsh snapshot-create-as --domain domain --name backup --no-metadata
> > --atomic --disk-only --diskspec hda,snapshot=external
> > $ virsh blockcommit domain hda --active --pivot
> > error: failed to pivot job for disk hda
> > error: block copy still active: disk 'hda' not ready for pivot yet
> > Could not merge changes for disk hda of domain. VM may be in invalid state.
>
> since this thread was renamed, please re-state the version of libvirt
> you are using. I don't really want to dig through the old thread.
>
> > Then running the following in the morning succeeds and pivots the
> > snapshot into the base image while the VM is live:
> >
> > $ virsh blockjob domain hda --abort
> > $ virsh blockcommit domain hda --active --pivot
> > Successfully pivoted
> >
> > We run the backup process once a day and it failed on the following
> > days:
> >
> > 2017-07-07
> > 2017-07-20
> > 2017-07-27
> > 2017-08-12
> > 2017-08-14
> >
> > Looking at this, it happens roughly once a week, and from then on the
> > guest writes into the snapshot backlog. That snapshot backlog file grows
> > by about 8 GB every day and thus the issue always needs immediate attention.
> >
> > Any ideas what could cause this issue? Is this a bug (race condition) of
> > `virsh blockcommit` that sometimes fails because it is invoked at the
> > wrong time?
>
> So the 'virsh blockcommit domain hda --active --pivot' operation
> consists of 3 parts:
>
> 1) virsh blockcommit domain hda --active
> 2) waiting until the block job finishes
> 3) virsh blockjob --pivot domain hda
>
> The problem is that sometimes step 2) finishes too soon and then step 3)
> fails. This should not happen any more, since there's code in virsh [1]
> which waits for the completion event from libvirtd, which is fired only
> when the job is actually ready to be pivoted.
>
> This code has a lot of fallback options for the case where libvirtd is old.
>
> At any rate, manual pivoting later should help. Updating to a more recent
> version would probably help as well.
>
> In case you are using a fairly recent version, it's possible that there
> are still bugs though.
>
> Peter
>
> [1]:
>
> commit 7408403560f7d054da75acaab855a95c51a92e2b
> Author: Peter Krempa <pkrempa at redhat.com>
> Date:   Mon Jul 13 17:04:49 2015 +0200
>
>     virsh: Refactor block job waiting in cmdBlockCommit
>
>     Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to
>     use the common code. This additionally fixes a bug when working with
>     new qemus, where when doing an active commit with --pivot the pivoting
>     would fail, since qemu reaches 100% completion but the job doesn't
>     switch to synchronized phase right away.
>
> $ git describe --contains 7408403560f7d054da75acaab855a95c51a92e2b
> v1.2.18-rc1~33
>
>


-- 
Dominik Psenner