[libvirt] [PATCH 5/5] qemu_hotplug: Fix a rare race condition when detaching a device twice

Peter Krempa pkrempa at redhat.com
Wed Mar 13 12:35:54 UTC 2019


On Wed, Mar 13, 2019 at 11:35:50 +0100, Michal Privoznik wrote:
> On 3/12/19 7:57 PM, Peter Krempa wrote:
> > On Tue, Mar 12, 2019 at 16:41:15 +0100, Peter Krempa wrote:
> > > On Tue, Mar 12, 2019 at 16:13:20 +0100, Michal Privoznik wrote:

[...]

> > > Additionally it would be better if qemuMonitorJSONDelDevice actually
> > > returns the error string or object without reporting it so that we can
> > > decide here not to report it at all rather than resetting it.
> > 
> > After some more thinking I think that:
> > 
> > 1) The new helper should encapsulate also the call to enter the monitor
> >      - that way you can avoid the super-hacky way that re-locks the
> >        monitor.
> >      - you can then intergrate setting of the alias into private data
> >      - resetting of it after exit of the monitor
> >      - checking that the event was seen
> >    This should work nicely as AFAIK all code paths using 'device_del'
> >    should only ever call this one API and do everything else.
> > 
> > 2) Refactor the monitor APIs for device_del so that they return the
> >     error message reported by the monitor as success and return the copy
> >     of the error message. This will probably also require a different
> >     monitor error checking function
> > 
> >     That way you can get the error message and report it if you decide
> >     so.
> 
> Alternatively, I can introduce an argument to qemuMonitorDelDevice() which
> would supress error reporting for this specific case (we still want to
> report errors from QEMU_CHECK_MONITOR(), qemuMonitorJSONMakeCommand(), ...),
> return say -2 so that my code knows device_del failed because the device is
> already missing.

Well, in this case you could check for the "DeviceNotFound" error class.
The thing is you still want to report the original error from qemu in
case when the event was not received on failure, so you'll have to pass
it up anyways to do the decision where 'priv' is available after you
exit the monitor.

Note that even when a GenericError or any other error was received you
still need to process the device deletion if it was hijacked by the
current thread.

> > If any code path requires more than one monitor call, the 1) will also
> > ensure that the alias which we are waiting for is set only during
> > controlled amount of time and thus we are sure that the event thread
> > handles any DEVICE_DELETED events otherwise.
> > 
> 
> I started writing it this way. But problem is the
> qemuDomainDetachExtensionDevice() which is called in some rollback

You certainly can do a separate wrapper for this. The extension device
will never be handled through the local processing, thus it's alias
should not be set in priv->unplug.alias, so it does not need this
handling.

We need to do this only for those where we do want to wait for the
detach event for local processing.

> codes. For instance in qemuDomainAttachDiskGeneric:
> 
>     qemuDomainObjEnterMonitor(driver, vm);
> 
>     if (qemuHotplugDiskSourceAttach(priv->mon, diskdata) < 0)
>         goto exit_monitor;
> 
>     if (qemuDomainAttachExtensionDevice(priv->mon, &disk->info) < 0)
>         goto exit_monitor;
> 
>     if (qemuMonitorAddDevice(priv->mon, devstr) < 0) {
>         ignore_value(qemuDomainDetachExtensionDevice(priv->mon,
> &disk->info));
>         goto exit_monitor;
>     }
> 
>     if (qemuDomainObjExitMonitor(driver, vm) < 0) {
> 
> But I guess I can do what I'm doing now - if some argument is NULL then
> I can assume caller has done EnterMonitor() and not call it again.
> 
> Michal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20190313/06bce8ec/attachment-0001.sig>


More information about the libvir-list mailing list