[libvirt] [PATCH] qemu_hotplug: try harder to eject media

Thu Jun 25 18:21:30 UTC 2015

On 06/25/2015 01:52 PM, Pavel Hrdina wrote:
> On Thu, Jun 25, 2015 at 12:06:49PM -0400, John Ferlan wrote:
>>
>>
>> On 06/23/2015 07:13 AM, Pavel Hrdina wrote:
>>> Some guests lock the tray and QEMU eject command will simply fail to
>>> eject the media.  But the guest OS can handle this attempt to eject the
>>> media and can unlock the tray and open it. In this case, we should try
>>> again to actually eject the media.
>>>
>>> If the first attempt fails we will wait for tray to open and try again
>>> to eject the media, but if the second attempt fails, returns with error.
>>>
>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1147471
>>>
>>> Signed-off-by: Pavel Hrdina <phrdina at redhat.com>
>>> ---
>>>  src/qemu/qemu_hotplug.c | 33 ++++++++++++++++++++++++++++-----
>>>  1 file changed, 28 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
>>> index 0628964..0cebaad 100644
>>> --- a/src/qemu/qemu_hotplug.c
>>> +++ b/src/qemu/qemu_hotplug.c
>>> @@ -201,13 +201,17 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
>>>      }
>>>  
>>>      if (ret < 0)
>>> -        goto error;
>>> +        VIR_DEBUG("tray is probably locked, wait for the guest to unlock "
>>> +                  "the tray and open it");
>>
>> There are more reasons than waiting for the unlock that would cause a
>> negative value to be returned, so this seems "dangerous"..
> 
> Yes, I know that.  That's why the code will wait for the event DEVICE_TRAY_MOVED
> and only if that event where emitted from QEMU, then we will tray again.
> Otherwise the original error from the first attempt will be used.
> 

I didn't see DEVICE_TRAY_MOVED in this patch...  The wait loop I see
below was VIR_DOMAIN_DISK_TRAY_OPEN.

>>
>> !mon
>> if json
>>    !cmd
>>    qemuMonitorJSONCommand failure
>>    qemuMonitorJSONCheckError failure
>> else
>>    virAsprintf Failure
>>    qemuMonitorHMPCommand failure
>>    c_strcasestr failure
>>
>>
>> Is there no way to determine that the failure was due to a locked tray
>> from the error message?  The bz notes a DEVICE_TRAY_MOVED event, but I
>> don't see that here.
> 
> Yes, there is a way, that we can parse the error message and wait only, if it
> contains the phrase "is locked".  For that QEMU event we have a handler
> registered and this handler updates the disk->tray_status.
> 
> IOW, checking the disk->tray_status is the same as waiting for DEVICE_TRAY_MOVED
> event.
> 

Right and I didn't see that so it wasn't clear to me how this patch
fixed the problem....

>>
>> I would think that if this is "handle-able" from qemu/json only, then
>> the qemuMonitorJSONEjectMedia would be the best place to at least check
>> for this path of retry logic.
>>
>> You'll have the json error returned (per the link in bz comment 3):
>>
>> {"error": {"class": "GenericError", "desc": "Device 'cd' is locked"}}
>>
>> That should be parse-able by this code and either handled back in the
>> caller or in this code.
> 
> Also HMP returns the same error, so it's definitely parse-able for both
> monitors.  At first I didn't want to go through parsing the error message, but
> it makes sense to retry only in case, that the error message contains "is
> locked" message.
> 
>>
>> Since qemuMonitorJSONEjectMedia doesn't list return values, perhaps use
>> "-1" to mean total failure, 0 to mean pseudo-success as it does today,
>> and 1 to indicate tray was locked and that the caller can/should retry.
>>
> 
> 
> 
>> FWIW: By pseudo-success... I see that if 0 is currently returned, the
>> while loop will look for the "tray_status == VIR_DOMAIN_DISK_TRAY_OPEN"
>> and fail if not eventually seen
>>
>> Consider the following...
>>
>>     bool retry_lock_tray = false;
>>
>>     do {
>>         qemuDomainObjEnterMonitor(driver, vm);
>>         ret = qemuMonitorEjectMedia(priv->mon, driveAlias, force);
>>         if (qemuDomainObjExitMonitor(driver, vm) < 0) {
>>             ret = -1;
>>             goto cleanup;
>>         }
>>         /* We already tried, but got the locked error again */
>>         if (ret == 1 && retry_lock_tray) {
>>             some sort of failure since you retried
>>             ret = -1;
>>             goto error;
>>         }
>>         /* We were locked, but haven't retried */
>>         if (ret == 1) {
>>             Wait some period of time for the DEVICE_TRAY_MOVED event
>>             if event found
>>                 retry_lock_tray = true;
>>         }
>>     } while (retry_lock_tray);
>>
>>     if (ret < 0)
>>         goto error;
>>
>>     if (ret == 1) {
>>         ERROR tray locked by guest, never got DEVICE_TRAY_MOVED
>>         ret = -1;
>>         got error;
>>     }
>>
> 
> I'll post a v2.
> 
>>
>>
>>>  
>>>      virObjectRef(vm);
>>>      /* we don't want to report errors from media tray_open polling */
>>>      while (retries) {
>>> -        if (disk->tray_status == VIR_DOMAIN_DISK_TRAY_OPEN)
>>> +        if (disk->tray_status == VIR_DOMAIN_DISK_TRAY_OPEN) {
>>> +            ret = 0;
>>> +            virResetLastError();
>>>              break;
>>> +        }
>>>  
>>>          retries--;
>>>          virObjectUnlock(vm);
>>
>> If it's locked how would this loop ever succeed?  That is how would the
>> tray open "magically" if it's locked and we haven't sent another eject
>> request?
> 
> I don't understand this question??  The vm object is locked before we enter this
> function and this loop only unlock the vm object for the time it is waiting for
> the event to occur.
> 

So given the patch that I read, I saw nothing that polled for
DEVICE_TRAY_MOVED, thus how would tray_status become TRAY_OPEN? I think
the answer is - it won't under this scenario, hence the code beyond it
which does the eject again under the assumption that the reason we had a
-1 return was because it was locked.  That second eject doesn't go
through this wait for TRAY_OPEN event either, so hence my suggestion for
a do...while loop.

Or are you indicating that TRAY_OPEN setting occurs for a TRAY_MOVED.
It just wasn't clear from what I read in the patch

John
>>
>>> @@ -218,10 +222,29 @@ qemuDomainChangeEjectableMedia(virQEMUDriverPtr driver,
>>>      virObjectUnref(vm);
>>>  
>>>      if (retries <= 0) {
>>> -        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>>> -                       _("Unable to eject media"));
>>> -        ret = -1;
>>> +        /* Report a new error only if the first attempt don't fail and we don't
>>> +         * receive VIR_DOMAIN_DISK_TRAY_OPEN event, otherwise report the error
>>> +         * from first attempt. */
>>> +        if (ret == 0) {
>>> +            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>>> +                           _("Unable to eject media"));
>>> +            ret = -1;
>>> +        }
>>>          goto error;
>>> +    } else {
>>> +        /* QEMU will send eject request to guest, but if the tray is locked,
>>> +         * always returns with error.  However the guest can handle the eject
>>> +         * request, unlock the tray and open it.  In case this happens, we
>>> +         * should try to eject the media once more. */
>>> +        qemuDomainObjEnterMonitor(driver, vm);
>>> +        ret = qemuMonitorEjectMedia(priv->mon, driveAlias, force);
>>> +        if (qemuDomainObjExitMonitor(driver, vm) < 0) {
>>> +            ret = -1;
>>> +            goto cleanup;
>>> +        }
>>> +
>>> +        if (ret < 0)
>>> +            goto error;
>>>      }
>>>  
>>>      if (!virStorageSourceIsEmpty(newsrc)) {
>>>