[libvirt] [PATCHv3 01/16] blockjob: add API for async virDomainBlockJobAbort

Mon Apr 9 22:53:05 UTC 2012

On 04/09/2012 02:45 PM, Laine Stump wrote:
> On 04/06/2012 02:29 PM, Eric Blake wrote:
>> From: Adam Litke <agl at us.ibm.com>
>>
>> Qemu has changed the semantics of the "block_job_cancel" API.  The original
>> qed implementation (pretty much only backported to RHEL 6.2 qemu) was
>> synchronous (ie. upon command completion, the operation was guaranteed to
>> be completely stopped).  With the new semantics going into qemu 1.1 for
>> qcow2, a "block_job_cancel" merely requests that the operation be cancelled
>> and an event is triggered once the cancellation request has been honored.
>>
>>
>> @@ -3617,6 +3626,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn,
>>  typedef enum {
>>      VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0,
>>      VIR_DOMAIN_BLOCK_JOB_FAILED = 1,
>> +    VIR_DOMAIN_BLOCK_JOB_CANCELED = 2,
> 
> 
> You know, this is one case where I think the Queen's English has it
> right - something just seems *wrong* about spelling CANCELED with one "L"...
> 

I think the consensus was to add aliases so that either spelling would
work, but that will have to be another patch for another day.

>>   *
>> + * By default, this function performs a synchronous operation and the caller
>> + * may assume that the operation has completed when 0 is returned.  However,
>> + * BlockJob operations may take a long time to complete, and during this time
>> + * further domain interactions may be unresponsive.  To avoid this problem,
>> + * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable
>> + * asynchronous behavior.  Either way, when the job has been cancelled, a
>> + * BlockJob event will be emitted, with status VIR_DOMAIN_BLOCK_JOB_CANCELLED.

But it would help if I can spell it consistently within a single patch :)

Also, I'm going to update this to mention that the ASYNC flag is not
recognized if qemu is too old, and that the event is not guaranteed
(along with the polling loop that the user can rely on if they don't
want to trust the event).

>> @@ -785,11 +789,19 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da
>>      if (STREQ(type_str, "stream"))
>>          type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
>>
>> -    if (offset != 0 && offset == len)
>> -        status = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
>> +    switch (event) {
>> +        case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
>> +            /* Make sure the whole device has been processed */
>> +            if (offset != len)
>> +                event = VIR_DOMAIN_BLOCK_JOB_FAILED;
>> +            break;
> 
> 
> Okay, so if you're talking to an "old" qemu, it will return a COMPLETED
> event with offset < length to indicate failure, which is handled here...

Actually, the old qemu would never return an event on cancellation
(except in the inherent race where you request a cancel, but the job
completes normally before qemu can act on your request).  Rather, both
the old qemu and new qemu return the 'BLOCK_JOB_COMPLETE' event on
completion, and you must then look at offset/len to decide if it was a
successful completion or a failure.  It is only the new qemu that adds
the new 'BLOCK_JOB_CANCELLED' event (this time, we have the qemu
spelling, and it is UK spelling).

> 
> 
>> +        case VIR_DOMAIN_BLOCK_JOB_FAILED:
> 
> 
> and if it's a new qemu, it will return FAILED explicitly, which is
> handled here. What happens when it's an old libvirt and new qemu? (not
> that there's anything libvirt could do about it, I'm just curious)

Actually, new qemu only emits two types of events (COMPLETE and
CANCELLED), and this helper function converts those two types into three
user-visible types (COMPLETED, FAILED, CANCELED).  I think what I should
do is mark the unreachable case statement accordingly.

> 
> 
>> +        case VIR_DOMAIN_BLOCK_JOB_CANCELED:
>> +            break;
>> +    }
>>
>>  out:
>> -    qemuMonitorEmitBlockJob(mon, device, type, status);
>> +    qemuMonitorEmitBlockJob(mon, device, type, event);
>>  }
>>
>>  static void
>> @@ -832,6 +844,22 @@ qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon,
>>      qemuMonitorEmitPMSuspend(mon);
>>  }
>>
>> +static void
>> +qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon,
>> +                                       virJSONValuePtr data)
>> +{
>> +    qemuMonitorJSONHandleBlockJobImpl(mon, data,
>> +                                      VIR_DOMAIN_BLOCK_JOB_COMPLETED);
>> +}
>> +
>> +static void
>> +qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon,
>> +                                       virJSONValuePtr data)
>> +{
>> +    qemuMonitorJSONHandleBlockJobImpl(mon, data,
>> +                                      VIR_DOMAIN_BLOCK_JOB_CANCELED);
>> +}

Here's our two event handlers that both call into the helper function
that translates the 2 qemu events into 3 libvirt events.

>> @@ -7613,19 +7621,23 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd)
>>      virDomainBlockJobInfo info;
>>      const char *type;
>>      int ret;
>> +    bool abortMode = vshCommandOptBool(cmd, "abort");
>> +    bool infoMode = vshCommandOptBool(cmd, "info");
>> +    bool bandwidth = vshCommandOptBool(cmd, "bandwidth");
>>
>> -    if (vshCommandOptBool (cmd, "abort")) {
>> -        mode = VSH_CMD_BLOCK_JOB_ABORT;
>> -    } else if (vshCommandOptBool (cmd, "info")) {
>> -        mode = VSH_CMD_BLOCK_JOB_INFO;
>> -    } else if (vshCommandOptBool (cmd, "bandwidth")) {
>> -        mode = VSH_CMD_BLOCK_JOB_SPEED;
>> -    } else {
>> +    if (abortMode + infoMode + bandwidth > 1) {
>>          vshError(ctl, "%s",
>>                   _("One of --abort, --info, or --bandwidth is required"));
> 
> 
> You're changing the test here to require 0 or 1 of the options, rather
> than exactly one, effectively making --info the default. That makes
> sense, but I think the error message should be a bit different, maybe
> "no more than one of --abort, --info, or --bandwidth is allowed (default
> is --info)".

In fact, in a later patch, I changed it even more, to make --async (or
--pivot) in isolation imply --abort.  The important part to remember
here is that this error message only occurs if more than one mode is
requested at once.

>> -=item B<blockjob> I<domain> I<path> [I<--abort>] [I<--info>] [I<bandwidth>]
>> +=item B<blockjob> I<domain> I<path> { I<--abort> [I<--async>] |
>> +[I<--info>] | I<bandwidth> }
> 
> 
> A bit awkward to parse, but it does eventually make sense :-)

I think I can rebase part of that refactoring earlier, so in v4, I think
I will just list things as:

    bool abortMode = (vshCommandOptBool(cmd, "abort") ||
                      vshCommandOptBool(cmd, "async"));

if (abortMode + infoMode + bandwidth > 1) {
    vshError(ctl, "%s",
             _("conflict between --abort, --info, and --bandwidth modes"));

B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async] | [I<--info>] |
[I<bandwidth>] }

Manage active block operations.  There are three modes: I<--info>,
I<bandwidth>, and I<--abort>; I<--info> is default except that
I<--async> implies I<--abort>.

> 
> ACK with the change to the error message.

I've got enough tweaks that it will be worth looking at again in v4.

-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 620 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20120409/8719c9c1/attachment-0001.sig>