[libvirt] RFC: Exposing "ready" bool (of `query-block-jobs`) or QMP BLOCK_JOB_READY event
Kashyap Chamarthy
kchamart at redhat.com
Thu Oct 6 12:31:39 UTC 2016
On Thu, Oct 06, 2016 at 01:34:59PM +0200, Peter Krempa wrote:
> On Thu, Oct 06, 2016 at 12:26:50 +0200, Kashyap Chamarthy wrote:
[...]
> > And, libvirt was fixed to use the above field in this commit (available
> > in libvirt v1.2.18 or above):
> >
> > http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=eae5924 -- qemu:
> > Update state of block job to READY only if it actually is ready [1]
> >
> > RFC
> > ---
> >
> > Currently libvirt block APIs (& consequently higher-level applications
> > like Nova which use these APIs) rely on polling for job completion via
>
> libvirt is _not_ polling the data. Libvirt relies on the events from
> qemu which are also exposed as libvirt events.
Err, you're right. I know you mean the below enum:
enum virConnectDomainEventBlockJobStatus {
VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0
VIR_DOMAIN_BLOCK_JOB_FAILED = 1
VIR_DOMAIN_BLOCK_JOB_CANCELED = 2
VIR_DOMAIN_BLOCK_JOB_READY = 3
VIR_DOMAIN_BLOCK_JOB_LAST = 4
}
Which are used internally in the event processing code in
qemuBlockJobEventProcess() [libvirt/src/qemu/qemu_blockjob.c] that is
used to update the disk mirroring state based on events from QEMU.
> > virDomainGetBlockJobInfo(), which uses QMP `query-block-jobs`, and
> > waits for QEMU to report "offset" == "len", which translates to
> > libvirt "cur" == "end". Based on this, libvirt can take an action
> > (whether to gracefully abort, or pivot to the copy in case of a COPY
> > job).
>
> Again false. Internally we honor the ready state
Oops, I do realize that libvirt honors seen the 'ready' boolean -- I
even mentioned the libvirt commit message (eae5924) from you in the
beginning, which handles it.
> and applications like
> virsh have code in place that waits for the async events and act after
> receiving them.
Yep, I remember Eric mentioning the other day that `virsh` is setup to
capture libvirt events. So I briefly went through the
virshBlockJobWait() in tools/virsh-domain.c where you see that:
[...]
/* Fallback behaviour is only needed if one or both callbacks could not
* be registered */
if (data->cb_id < 0 || data->cb_id2 < 0) {
/* If the block job vanishes, synthesize a COMPLETED event */
if (result == 0) {
ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
break;
}
/* If the block job hits 100%, wait a little while for a possible
* event from libvirt unless both callbacks could not be registered
* in order to synthesize our own READY event */
if (info.end == info.cur &&
((data->cb_id < 0 && data->cb_id2 < 0) || --retries == 0)) {
ret = VIR_DOMAIN_BLOCK_JOB_READY;
break;
}
}
[...]
> > Since QEMU reports the "ready": true field (followed by a
> > BLOCK_JOB_READY QMP event). It would be helpful if libvirt expose this
> > via an API, so upper layers could instead use that, rather than polling.
>
> We expose the state of the copy job in the XML and forward the READY
> event from qemu to the users.
Yep, I see that in the QMP traffic:
-----
2016-10-04 15:54:52.333+0000: 30550: info : qemuMonitorIOProcess:423 :
QEMU_MONITOR_IO_PROCESS: mon=0x7fa43000ee60 buf={"timestamp":
{"seconds": 1475596492, "microseconds": 333414}, "event":
"BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk1", "len":
1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}}
-----
> A running copy job exposes itself in the xml as:
>
> <disk type='file' device='cdrom'>
> <driver name='qemu' type='raw'/>
> <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/>
> <backingStore/>
> <mirror type='file' file='/tmp/ble.img' format='raw' job='copy'>
> <format type='raw'/>
> <source file='/tmp/ble.img'/>
> </mirror>
> [...]
> </disk>
>
> While the ready copy job is exposed as:
>
> <disk type='file' device='cdrom'>
> <driver name='qemu' type='raw'/>
> <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/>
> <backingStore/>
> <mirror type='file' file='/tmp/ble.img' format='raw' job='copy' ready='yes'>
> <format type='raw'/>
> <source file='/tmp/ble.img'/>
> </mirror>
> [...]
> </disk>
Thanks for the XML snippets.
> Additionally we have anyncrhronous events that are emitted once qemu
> notifies us that the block job has reached sync state or finished.
> Libvirt uses the event to switch to the ready state.
>
> The documentation suggests that block jobs should listen to the events
> and act accordingly only after receiving the event.
>
>
> > Problem scenario
> > ----------------
> >
> > When virDomainBlockRebase() is invoked to start a copy job, then
> > aborting the said copy operation with virDomainBlockJobAbort() + flag
> > VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT can result in a potential race
> > condition (due to the way the virDomainGetBlockJobInfo() reports the job
> > status) where the pivot operation fails.
>
> Again, this should trigger the block job event and that should be used
> as the marker to perform the pivot operation.
>
> $ virsh event --all --loop
> event 'block-job-2' for domain hp: Block Copy for hda ready <- after the sync phase is reached
> event 'block-job-2' for domain hp: Block Copy for hda completed <- after successfully pivoting
>
> Applications should act only after receiving such event.
Okay, Eric also has suggested
>
> > Race condition window
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > libvirt finds cur==end AND sends a pivot request, all in the window
> > before QEMU would have sent "ready": true field [emitted as part of the
>
> This is not true. Libvirt checks that the mirror is actually ready. It's
> done by the commit you've mentioned above.
Right, because I've independently tested this with libvirt and have seen
the correct behavior. I meant to write: s/libvirt finds/Nova libvirt
driver finds/.
> > QMP `query-block-jobs` command's response, indicating that the job has
> > actually completed], however the pivot request fails because it requires
> > "ready": true.
>
> The problem is that you are polling the block job info which correctly
> reports that all data was properly copied and you are inferring the
> block job state from that data.
Indeed. We should listen to block job events.
> I'm against deliberately reporting false data in the block info
> structure.
Yes, you're not alone in that, Nova developer (Matt Booth, CCed) also
expressed a similar concern, albeit not as strongly as you.
> The application should register handlers for the block job events and
> act only if it receives such event. Additionally you may want to check
> that the state is correct in the XML. The current block job information
> structure can't be extended unfortunately.
Okay, we sort of came to the similar conclusion (in a discussion on the
upstream Nova IRC channel) to rely on events, rather than polling for
cur/end values.
> Please look at the virsh handling of the block jobs for inspiration
> since we have example code doing the right thing here.
Yes, looking already at them. Thanks!
--
/kashyap
More information about the libvir-list
mailing list