[libvirt] [PATCH 5/5] qemu_hotplug: Fix a rare race condition when detaching a device twice
Michal Privoznik
mprivozn at redhat.com
Wed Mar 13 10:35:50 UTC 2019
On 3/12/19 7:57 PM, Peter Krempa wrote:
> On Tue, Mar 12, 2019 at 16:41:15 +0100, Peter Krempa wrote:
>> On Tue, Mar 12, 2019 at 16:13:20 +0100, Michal Privoznik wrote:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1623389
>>>
>>> If a device is detached twice from the same domain the following
>>> race condition may happen:
>>>
>>> 1) The first DetachDevice() call will issue "device_del" on qemu
>>> monitor, but since the DEVICE_DELETED event did not arrive in
>>> time, the API ends claiming "Device detach request sent
>>> successfully".
>>>
>>> 2) The second DetachDevice() therefore still find the device in
>>> the domain and thus proceeds to detaching it again. It calls
>>> EnterMonitor() and qemuMonitorSend() trying to issue "device_del"
>>> command again. This gets both domain lock and monitor lock
>>> released.
>>>
>>> 3) At this point, qemu sends us the DEVICE_DELETED event which is
>>> going to be handled by the event loop which ends up calling
>>> qemuDomainSignalDeviceRemoval() to determine who is going to
>>> remove the device from domain definition. Whether it is the
>>> caller that marked the device for removal or whether it is going
>>> to be the event processing thread.
>>>
>>> 4) Because the device was marked for removal,
>>> qemuDomainSignalDeviceRemoval() returns true, which means the
>>> event is to be processed by the thread that has marked the device
>>> for removal (and is currently still trying to issue "device_del"
>>> command)
>>>
>>> 5) The thread finally issues the "device_del" command, which
>>> fails (obviously) and therefore it calls
>>> qemuDomainResetDeviceRemoval() to reset the device marking and
>>> quits immediately after, NOT removing any device from the domain
>>> definition.
>>>
>>> At this point, the device is still present in the domain
>>> definition but doesn't exist in qemu anymore. Worse, there is no
>>> way to remove it from the domain definition.
>>>
>>> Solution is to note down that we've seen the event and if the
>>> second "device_del" fails, not take it as a failure but carry on
>>> with the usual execution.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>> ---
>>> src/qemu/qemu_domain.h | 1 +
>>> src/qemu/qemu_hotplug.c | 83 +++++++++++++++++++++++++++++------------
>>> 2 files changed, 60 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
>>> index 9f468e5661..fb361515ba 100644
>>> --- a/src/qemu/qemu_domain.h
>>> +++ b/src/qemu/qemu_domain.h
>>> @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr;
>>> struct _qemuDomainUnpluggingDevice {
>>> const char *alias;
>>> qemuDomainUnpluggingDeviceStatus status;
>>> + bool eventSeen; /* True if DEVICE_DELETED event arrived. */
>>> };
>>>
>>>
>>> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
>>> index 574477e916..93c0e14adf 100644
>>> --- a/src/qemu/qemu_hotplug.c
>>> +++ b/src/qemu/qemu_hotplug.c
>>> @@ -70,22 +70,47 @@ unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5;
>>> /**
>>> * qemuDomainDeleteDevice:
>>> * @mon: qemu monitor
>>> + * @vm: domain object
>>> * @alias: device to remove
>>> *
>>> * A simple wrapper around qemuMonitorDelDevice().
>>> - * @mon must be locked upon entry.
>>> + * @mon must be locked upon entry, @vm shan't.
>>
>> Please use either the more contemporary short form or the long form.
>>
>>> *
>>> * Returns: 0 on success,
>>> * -1 otherwise.
>>> */
>>> static inline int
>>> qemuDomainDeleteDevice(qemuMonitorPtr mon,
>>> + virDomainObjPtr vm,
>>> const char *alias)
>>> {
>>> - if (qemuMonitorDelDevice(mon, alias) < 0)
>>> - return -1;
>>> + qemuDomainObjPrivatePtr priv;
>>> + int ret = 0;
>>>
>>> - return 0;
>>> + if (qemuMonitorDelDevice(mon, alias) < 0) {
>>> + if (vm) {
>>> + /* It is safe to lock and unlock both @mon and @vm
>>> + * here because:
>>> + * a) qemuDomainObjEnterMonitor() ensures @mon is
>>> + * ref()'d
>>> + * b) The API that is calling us ensures that @vm is
>>> + * ref()'d
>>
>> Don't break the two lines.
>>
>>> + */
>>> + virObjectUnlock(mon);
>>> + virObjectLock(vm);
>>> + priv = vm->privateData;
>>> + if (priv->unplug.eventSeen)
>>
>> You can't do this without also clearing priv->unplug.alias. If the event
>> was not seen at this point you need to make sure that it will
>> unconditionally be handled via the separate event handler thread prior
>> to giving up the lock on VM. By not clearing it, you still have another
>> chance at the event handler thinking the unplug will be handled here
>> without actually doing so.
>>
>>> + virResetLastError();
>>
>> Additionally it would be better if qemuMonitorJSONDelDevice actually
>> returns the error string or object without reporting it so that we can
>> decide here not to report it at all rather than resetting it.
>
> After some more thinking I think that:
>
> 1) The new helper should encapsulate also the call to enter the monitor
> - that way you can avoid the super-hacky way that re-locks the
> monitor.
> - you can then intergrate setting of the alias into private data
> - resetting of it after exit of the monitor
> - checking that the event was seen
> This should work nicely as AFAIK all code paths using 'device_del'
> should only ever call this one API and do everything else.
>
> 2) Refactor the monitor APIs for device_del so that they return the
> error message reported by the monitor as success and return the copy
> of the error message. This will probably also require a different
> monitor error checking function
>
> That way you can get the error message and report it if you decide
> so.
Alternatively, I can introduce an argument to qemuMonitorDelDevice()
which would supress error reporting for this specific case (we still
want to report errors from QEMU_CHECK_MONITOR(),
qemuMonitorJSONMakeCommand(), ...), return say -2 so that my code knows
device_del failed because the device is already missing.
>
>
> If any code path requires more than one monitor call, the 1) will also
> ensure that the alias which we are waiting for is set only during
> controlled amount of time and thus we are sure that the event thread
> handles any DEVICE_DELETED events otherwise.
>
I started writing it this way. But problem is the
qemuDomainDetachExtensionDevice() which is called in some rollback
codes. For instance in qemuDomainAttachDiskGeneric:
qemuDomainObjEnterMonitor(driver, vm);
if (qemuHotplugDiskSourceAttach(priv->mon, diskdata) < 0)
goto exit_monitor;
if (qemuDomainAttachExtensionDevice(priv->mon, &disk->info) < 0)
goto exit_monitor;
if (qemuMonitorAddDevice(priv->mon, devstr) < 0) {
ignore_value(qemuDomainDetachExtensionDevice(priv->mon,
&disk->info));
goto exit_monitor;
}
if (qemuDomainObjExitMonitor(driver, vm) < 0) {
But I guess I can do what I'm doing now - if some argument is NULL then
I can assume caller has done EnterMonitor() and not call it again.
Michal
More information about the libvir-list
mailing list