[libvirt] [PATCH] qemu: hotplug: fail when net device detach times out

Tue Mar 13 18:20:51 UTC 2018

On Tue, Mar 13, 2018 at 06:13:46PM +0000, Jonathan Davies wrote:
> On 13/03/18 14:23, Daniel P. Berrangé wrote:
> > On Tue, Mar 13, 2018 at 02:07:08PM +0000, Jonathan Davies wrote:
> > > qemuDomainDetachNetDevice should not treat a timeout in a call to
> > > qemuDomainWaitForDeviceRemoval as success. Instead, it should treat it
> > > as failure -- this is the intention behind having a timeout.
> > 
> > Actually this is intentional current behaviour. See the API docs for
> > virDomainDetachDeviceFlags:
> > 
> >   * Beware that depending on the hypervisor and device type, detaching a device
> >   * from a running domain may be asynchronous. That is, calling
> >   * virDomainDetachDeviceFlags may just request device removal while the device
> >   * is actually removed later (in cooperation with a guest OS). Previously,
> >   * this fact was ignored and the device could have been removed from domain
> >   * configuration before it was actually removed by the hypervisor causing
> >   * various failures on subsequent operations. To check whether the device was
> >   * successfully removed, either recheck domain configuration using
> >   * virDomainGetXMLDesc() or add a handler for the VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
> >   * event. In case the device is already gone when virDomainDetachDeviceFlags
> >   * returns, the event is delivered before this API call ends. To help existing
> >   * clients work better in most cases, this API will try to transform an
> >   * asynchronous device removal that finishes shortly after the request into
> >   * a synchronous removal. In other words, this API may wait a bit for the
> >   * removal to complete in case it was not synchronous.
> 
> Thanks all the helpful replies.
> 
> Just to check I've understood: The design is that the call is synchronous
> but may become asynchronous if it times out. The purpose of the timeout is
> just to limit the window in which it might be synchronous.

I'd say the opposite - the API is designed to be asynchronous, but you
might get lucky and have it complete before it returns.

> But there's no way for the client to know whether it was synchronous or
> asynchronous since the return value is the same in both cases.
> 
> Therefore a well-written client would always need to check whether the
> operation completed successfully or not.
> 
> In which case, why bother with the synchronous mode and the server-side
> timeout? Wouldn't it be simpler for it to be always asynchronous, allowing
> the client to wait for as long as it likes before giving up?

Yes Well written applications should always wait and check for actual
removal. Adding the server side wait was just a slightly gross
hack to make it slightly more reliable with badly written apps that
blindly assume completion

> Please correct me if I've misunderstood!

You're right, and yes this is an unpleasnt API design. If we could go
back and do it again we'd do things differently, but we're fairly stuck
with it now.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|