[libvirt] Querying block device job status and semantics of virDomainBlockJobInfo()
Kashyap Chamarthy
kchamart at redhat.com
Fri Sep 2 13:56:39 UTC 2016
On Thu, Sep 01, 2016 at 02:11:34PM -0500, Eric Blake wrote:
> On 09/01/2016 08:57 AM, Kashyap Chamarthy wrote:
> > So, I'm trying to understand how libvirt reports the "cur" and "end"
> > values. I've read the virDomainBlockJobInfo() struct, it wasn't crystal
> > clear. It states:
> >
> > /*
> > * The following fields provide an indication of block job progress. @cur
> > * indicates the current position and will be between 0 and @end. @end is
> > * the final cursor position for this operation and represents completion.
> > * To approximate progress, divide @cur by @end.
> > */
>
> Libvirt is (more or less) reporting numbers from qemu. What's more,
> @end need not be the same between calls; qemu is free to change the end
> value as it comes up with more work to do (or sees that less work
> remains than initially estimated), all that REALLY matters is that the
> ratio between the two numbers converges, and is < 1 while busy, and == 1
> when complete. Except that there are cases in qemu where a block job
> really has 0 work to do.
>
> Prior to qemu exposing the "ready":true/false flag, libvirt had to guess
> whether equal numbers really meant done, or if it merely meant nearly
> done. But now that we have "ready", I think the sanest course of action
> for libvirt is to fudge the numbers from qemu. After all, we've already
> documented (both in libvirt and in qemu) that @end is not fixed, so much
> as a moving target (it just doesn't move very much on operations where
> we have a good grasp on how much work remains from the start, like a
> deep copy; but is more prone to move on operations like live commit that
> are influenced by how active the guest is at writing data that we are
> attempting to commit at the same time).
[...]
> > [kashyap]
> >
> > So, if the job hasn't started yet, what should libvirt report?
> >
> > [mprivozn]
> >
> > That's the question. We can't change the [virDomainBlockJobInfo]
> > struct (otherwise we won't be ABI compatible), so we can't really
> > add a bolean there 'bool job_started'.
> >
> > Or we can introduce new "job type" which wouldn't really be a job
> > type, but we will fill status.job with it to say explicitly job
> > hasn't started yet
> >
> >
> > So, any other thoughts here, on how to proceed here?
>
> My preference would be:
>
> If qemu doesn't report anything (because the job is not started yet),
> then libvirt should report cur=0, end=1 (the job still has 100% to go).
>
> If qemu reports 0/0 and "done":false, then libvirt should report cur=0,
> end=1 (that is, we fudge the end to be larger, because the job is not
> done yet).
>
> If qemu reports 0/0 and "done":true (because the job was really a
> no-op), then libvirt should report cur=1, end=1 (the job is 100% complete).
>
> If qemu reports 0/0 and lacks "done" (older qemu), then libvirt just has
> to guess. I'm not sure which guess is most appropriate; maybe libvirt
> itself will have to set up a timer and report 0/1 the first time, and
> only report 1/1 after a minimum time has elapsed, to make sure qemu has
> had a chance to do something about the job. Or maybe we don't worry
> about it, and just have libvirt report 0/0 because we really don't know
> any better.
>
> If qemu reports a/b, where a < b and b > 0, use those numbers as is. We
> don't even have to check "done".
>
> If qemu reports a/a, where a > 0, then also check "done". If
> "done":false is present, report a-1/a (the job is not quite done); if
> "done" is absent or "done":true is present, report a/a (the job is done).
[...]
Excellent, thanks for the detailed response, Eric. That clarifies a
lot.
I've filed this bug, and captured the discussion from this thread.
https://bugzilla.redhat.com/show_bug.cgi?id=1372613 -- Improve live
block device job status reporting via virDomainBlockJobInfo()
* * *
Michal has pointed to these two fixes related to the above bug, in his
branch:
https://github.com/zippy2/libvirt/commit/47dcb46 --
qemuDomainGetBlockJobInfo: Move info translation into separate func
https://github.com/zippy2/libvirt/commit/25557dd --
virDomainGetBlockJobInfo: Fix corner case when qemu reports no info
I don't have a proper reproducer to test the corner case. But I just
tested the regular "deep copy" test case, with the above fixes
$ git describe
v2.1.0-231-g25557dd
$ sudo ./run tools/virsh start cvm1
$ sudo ./run tools/virsh dumpxml \
--inactive cvm1 > /var/tmp/cvm1.xml
$ sudo ./run tools/virsh undefine cvm1
Domain cvm1 has been undefined
$ sudo ./run tools/virsh blockcopy cvm1 \
vda /export/copy-cirrvm1.qcow2 --wait --verbose
Block Copy: [100 %]
Now in mirroring phase
$ sudo ./run tools/virsh blockjob cvm1 vda --raw
type=Block Copy
bandwidth=0
cur=41126400
end=41126400
$ sudo ./run tools/virsh blockjob cvm1 vda --abort
--
/kashyap
More information about the libvir-list
mailing list