[libvirt] Querying block device job status and semantics of virDomainBlockJobInfo()

Kashyap Chamarthy kchamart at redhat.com
Fri Sep 2 13:56:39 UTC 2016


On Thu, Sep 01, 2016 at 02:11:34PM -0500, Eric Blake wrote:
> On 09/01/2016 08:57 AM, Kashyap Chamarthy wrote:
> > So, I'm trying to understand how libvirt reports the "cur" and "end"
> > values.  I've read the comment on the virDomainBlockJobInfo struct, but
> > it wasn't crystal clear.  It states:
> > 
> > /*
> >  * The following fields provide an indication of block job progress.  @cur
> >  * indicates the current position and will be between 0 and @end.  @end is
> >  * the final cursor position for this operation and represents completion.
> >  * To approximate progress, divide @cur by @end.
> >  */
> 
> Libvirt is (more or less) reporting numbers from qemu.  What's more,
> @end need not be the same between calls; qemu is free to change the end
> value as it comes up with more work to do (or sees that less work
> remains than initially estimated); all that REALLY matters is that the
> ratio between the two numbers converges, and is < 1 while busy, and == 1
> when complete.  Except that there are cases in qemu where a block job
> really has 0 work to do.
> 
> Prior to qemu exposing the "ready":true/false flag, libvirt had to guess
> whether equal numbers really meant done, or if it merely meant nearly
> done.  But now that we have "ready", I think the sanest course of action
> for libvirt is to fudge the numbers from qemu.  After all, we've already
> documented (both in libvirt and in qemu) that @end is not fixed, so much
> as a moving target (it just doesn't move very much on operations where
> we have a good grasp on how much work remains from the start, like a
> deep copy; but is more prone to move on operations like live commit that
> are influenced by how active the guest is at writing data that we are
> attempting to commit at the same time).

[...]

> > [kashyap]
> > 
> >     So, if the job hasn't started yet, what should libvirt report? 
> > 
> > [mprivozn]
> > 
> >     That's the question. We can't change the [virDomainBlockJobInfo]
> >     struct (otherwise we won't be ABI compatible), so we can't really
> >     add a boolean there, e.g. 'bool job_started'.
> > 
> >     Or we can introduce a new "job type" which wouldn't really be a
> >     job type, but we would fill status.job with it to say explicitly
> >     that the job hasn't started yet.
> > 
> > 
> > So, any other thoughts on how to proceed here?
> 
> My preference would be:
> 
> If qemu doesn't report anything (because the job is not started yet),
> then libvirt should report cur=0, end=1 (the job still has 100% to go).
> 
> If qemu reports 0/0 and "ready":false, then libvirt should report cur=0,
> end=1 (that is, we fudge the end to be larger, because the job is not
> done yet).
> 
> If qemu reports 0/0 and "ready":true (because the job was really a
> no-op), then libvirt should report cur=1, end=1 (the job is 100% complete).
> 
> If qemu reports 0/0 and lacks "ready" (older qemu), then libvirt just has
> to guess.  I'm not sure which guess is most appropriate; maybe libvirt
> itself will have to set up a timer and report 0/1 the first time, and
> only report 1/1 after a minimum time has elapsed, to make sure qemu has
> had a chance to do something about the job.  Or maybe we don't worry
> about it, and just have libvirt report 0/0 because we really don't know
> any better.
> 
> If qemu reports a/b, where a < b and b > 0, use those numbers as is.  We
> don't even have to check "ready".
> 
> If qemu reports a/a, where a > 0, then also check "ready". If
> "ready":false is present, report a-1/a (the job is not quite done); if
> "ready" is absent or "ready":true is present, report a/a (the job is done).
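
To make the rules above easier to follow, here is a rough sketch of them
as a small Python function.  This is only an illustration of the proposal
in this thread, not what libvirt actually implements: the function name
fudge_progress(), the (a, b) tuple and the 'flag' argument (True/False for
the completion flag qemu reports, None when the flag is absent) are all
made up for this example.  For the "older qemu" case it simply picks the
"report 0/0" option rather than the timer idea.

```python
from typing import Optional, Tuple


def fudge_progress(report: Optional[Tuple[int, int]],
                   flag: Optional[bool] = None) -> Tuple[int, int]:
    """Return the (cur, end) pair to expose for a raw qemu report.

    report -- (a, b) as reported by qemu, or None if qemu reports nothing
    flag   -- qemu's completion flag, or None if qemu does not provide it
    """
    if report is None:
        # Job not started yet: 100% of the work is still to go.
        return (0, 1)

    a, b = report
    if a != b:
        # The a < b case: use the numbers as they are.
        return (a, b)

    if a == 0:
        if flag is None:
            # Older qemu: we cannot tell, so pass 0/0 through.
            return (0, 0)
        # 0/0 with a flag: either a finished no-op, or not done yet.
        return (1, 1) if flag else (0, 1)

    # a == b > 0: only an explicit "not finished" holds the job back.
    if flag is False:
        return (a - 1, a)
    return (a, a)
```

For example, fudge_progress((0, 0), False) gives (0, 1), while
fudge_progress((41126400, 41126400), True) gives the numbers unchanged.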

[...]

Excellent, thanks for the detailed response, Eric.  That clarifies a
lot.

I've filed this bug, and captured the discussion from this thread.

    https://bugzilla.redhat.com/show_bug.cgi?id=1372613 -- Improve live
    block device job status reporting via virDomainBlockJobInfo() 

	* * *

Michal has pointed to these two fixes related to the above bug, in his
branch:

    https://github.com/zippy2/libvirt/commit/47dcb46 --
    qemuDomainGetBlockJobInfo: Move info translation into separate func 
    
    https://github.com/zippy2/libvirt/commit/25557dd --
    virDomainGetBlockJobInfo: Fix corner case when qemu reports no info


I don't have a proper reproducer to test the corner case.  But I just
tested the regular "deep copy" test case with the above fixes:

    $ git describe
    v2.1.0-231-g25557dd

    $ sudo ./run tools/virsh start cvm1
    
    $ sudo ./run tools/virsh dumpxml \
        --inactive cvm1 > /var/tmp/cvm1.xml
    
    $ sudo ./run tools/virsh undefine cvm1
    Domain cvm1 has been undefined
    
    $ sudo ./run tools/virsh blockcopy cvm1 \
        vda /export/copy-cirrvm1.qcow2 --wait --verbose 
    Block Copy: [100 %]
    Now in mirroring phase
    
    $ sudo ./run tools/virsh blockjob cvm1 vda --raw
     type=Block Copy
     bandwidth=0
     cur=41126400
     end=41126400
    
    $ sudo ./run tools/virsh blockjob cvm1 vda --abort
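
For anyone who wants to turn the raw output above into a percentage, a
minimal sketch in Python follows.  It assumes the output is one key=value
pair per line, exactly as pasted above; the helper names
parse_blockjob_raw() and percent_done() are hypothetical, and the sample
text is just the output from this test run.

```python
def parse_blockjob_raw(text):
    """Split 'key=value' lines into a dict, ignoring anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def percent_done(fields):
    """Approximate progress as cur/end; None if end is 0 (unknown)."""
    cur, end = int(fields["cur"]), int(fields["end"])
    if end == 0:
        return None
    return 100.0 * cur / end


sample = """\
 type=Block Copy
 bandwidth=0
 cur=41126400
 end=41126400
"""

fields = parse_blockjob_raw(sample)
print(fields["type"], percent_done(fields))
```

Returning None for end=0 keeps the "we don't know" case separate from a
genuine 0% or 100%, instead of dividing by zero.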

-- 
/kashyap



