[libvirt] RFC: Exposing "ready" bool (of `query-block-jobs`) or QMP BLOCK_JOB_READY event

Kashyap Chamarthy kchamart at redhat.com
Thu Oct 6 12:31:39 UTC 2016


On Thu, Oct 06, 2016 at 01:34:59PM +0200, Peter Krempa wrote:
> On Thu, Oct 06, 2016 at 12:26:50 +0200, Kashyap Chamarthy wrote:

[...]

> > And, libvirt was fixed to use the above field in this commit (available
> > in libvirt v1.2.18 or above):
> > 
> >     http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=eae5924 -- qemu:
> >     Update state of block job to READY only if it actually is ready [1]
> > 
> > RFC
> > ---
> > 
> > Currently libvirt block APIs (& consequently higher-level applications
> > like Nova which use these APIs) rely on polling for job completion via
> 
> libvirt is _not_ polling the data. Libvirt relies on the events from
> qemu which are also exposed as libvirt events.

Err, you're right.  I know you mean the below enum:

    enum virConnectDomainEventBlockJobStatus {
    
    VIR_DOMAIN_BLOCK_JOB_COMPLETED  =   0
    VIR_DOMAIN_BLOCK_JOB_FAILED     =   1
    VIR_DOMAIN_BLOCK_JOB_CANCELED   =   2
    VIR_DOMAIN_BLOCK_JOB_READY      =   3
    VIR_DOMAIN_BLOCK_JOB_LAST       =   4
    }

Which are used internally in the event processing code in
qemuBlockJobEventProcess() [libvirt/src/qemu/qemu_blockjob.c] that is
used to update the disk mirroring state based on events from QEMU.
 
> > virDomainGetBlockJobInfo(), which uses QMP `query-block-jobs`, and
> > waits for QEMU to report "offset" == "len", which translates to
> > libvirt "cur" == "end".  Based on this, libvirt can take an action
> > (whether to gracefully abort, or pivot to the copy in case of a COPY
> > job).
> 
> Again false. Internally we honor the ready state

Oops, I do realize that libvirt honors seen the 'ready' boolean -- I
even mentioned the libvirt commit message (eae5924) from you in the
beginning, which handles it.

> and applications like
> virsh have code in place that waits for the async events and act after
> receiving them.

Yep, I remember Eric mentioning the other day that `virsh` is setup to
capture libvirt events.  So I briefly went through the
virshBlockJobWait() in tools/virsh-domain.c where you see that:

		[...]
        /* Fallback behaviour is only needed if one or both callbacks could not
         * be registered */
        if (data->cb_id < 0 || data->cb_id2 < 0) {
            /* If the block job vanishes, synthesize a COMPLETED event */
            if (result == 0) {
                ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
                break;
            }

            /* If the block job hits 100%, wait a little while for a possible
             * event from libvirt unless both callbacks could not be registered
             * in order to synthesize our own READY event */
            if (info.end == info.cur &&
                ((data->cb_id < 0 && data->cb_id2 < 0) || --retries == 0)) {
                ret = VIR_DOMAIN_BLOCK_JOB_READY;
                break;
            }
        }
		[...]

> > Since QEMU reports the "ready": true field (followed by a
> > BLOCK_JOB_READY QMP event).  It would be helpful if libvirt expose this
> > via an API, so upper layers could instead use that, rather than polling.
> 
> We expose the state of the copy job in the XML and forward the READY
> event from qemu to the users.

Yep, I see that in the QMP traffic:

-----
2016-10-04 15:54:52.333+0000: 30550: info : qemuMonitorIOProcess:423 :
QEMU_MONITOR_IO_PROCESS: mon=0x7fa43000ee60 buf={"timestamp":
{"seconds": 1475596492, "microseconds": 333414}, "event":
 "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk1", "len":
  1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}}
-----
 
> A running copy job exposes itself in the xml as:
> 
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/>
>       <backingStore/>
>       <mirror type='file' file='/tmp/ble.img' format='raw' job='copy'>
>         <format type='raw'/>
>         <source file='/tmp/ble.img'/>
>       </mirror>
>       [...]
>     </disk>
> 
> While the ready copy job is exposed as:
> 
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/>
>       <backingStore/>
>       <mirror type='file' file='/tmp/ble.img' format='raw' job='copy' ready='yes'>
>         <format type='raw'/>
>         <source file='/tmp/ble.img'/>
>       </mirror>
>       [...]
>     </disk>

Thanks for the XML snippets.

> Additionally we have anyncrhronous events that are emitted once qemu
> notifies us that the block job has reached sync state or finished.
> Libvirt uses the event to switch to the ready state.
> 
> The documentation suggests that block jobs should listen to the events
> and act accordingly only after receiving the event.
> 
> 
> > Problem scenario
> > ----------------
> > 
> > When virDomainBlockRebase() is invoked to start a copy job, then
> > aborting the said copy operation with virDomainBlockJobAbort() + flag
> > VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT can result in a potential race
> > condition (due to the way the virDomainGetBlockJobInfo() reports the job
> > status) where the pivot operation fails.
> 
> Again, this should trigger the block job event and that should be used
> as the marker to perform the pivot operation.
> 
> $ virsh event --all --loop
> event 'block-job-2' for domain hp: Block Copy for hda ready  <- after the sync phase is reached
> event 'block-job-2' for domain hp: Block Copy for hda completed <- after successfully pivoting
> 
> Applications should act only after receiving such event.

Okay, Eric also has suggested

> 
> > Race condition window
> > ~~~~~~~~~~~~~~~~~~~~~
> > 
> > libvirt finds cur==end AND sends a pivot request, all in the window
> > before QEMU would have sent "ready": true field [emitted as part of the
> 
> This is not true. Libvirt checks that the mirror is actually ready. It's
> done by the commit you've mentioned above.

Right, because I've independently tested this with libvirt and have seen
the correct behavior.  I meant to write: s/libvirt finds/Nova libvirt
driver finds/.

 
> > QMP `query-block-jobs` command's response, indicating that the job has
> > actually completed], however the pivot request fails because it requires
> > "ready": true.
> 
> The problem is that you are polling the block job info which correctly
> reports that all data was properly copied and you are inferring the
> block job state from that data.

Indeed.  We should listen to block job events.
 
> I'm against deliberately reporting false data in the block info
> structure.

Yes, you're not alone in that, Nova developer (Matt Booth, CCed) also
expressed a similar concern, albeit not as strongly as you.

> The application should register handlers for the block job events and
> act only if it receives such event. Additionally you may want to check
> that the state is correct in the XML. The current block job information
> structure can't be extended unfortunately.

Okay, we sort of came to the similar conclusion (in a discussion on the
upstream Nova IRC channel) to rely on events, rather than polling for
cur/end values.
 
> Please look at the virsh handling of the block jobs for inspiration
> since we have example code doing the right thing here.

Yes, looking already at them.  Thanks!


-- 
/kashyap




More information about the libvir-list mailing list