[libvirt] [PATCH v2 13/14] qemu_hotplug: delay sending DEVICE_REMOVED event until after *all* teardown

Peter Krempa pkrempa at redhat.com
Tue Mar 26 12:15:33 UTC 2019


On Mon, Mar 25, 2019 at 13:24:35 -0400, Laine Stump wrote:
> The VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED event is sent after qemu has
> responded to a device_del command with a DEVICE_DELETED event. Before
> queuing the event, *some* of the final teardown of the device's
> trappings in libvirt is done, but not *all* of it. As a result, an
> application may receive and process the DEVICE_REMOVED event before
> libvirt has really finished with it.
> 
> Usually this doesn't cause a problem, but it can - in the case of the
> bug report referenced below, vdsm is assigning a PCI device to a guest
> with managed='no', using livirt's virNodeDeviceDetachFlags() and
> virNodeDeviceReAttach() APIs. Immediately after receiving a
> DEVICE_REMOVED event from libvirt signalling that the device had been
> successfully unplugged, vdsm would cal virNodeDeviceReAttach() to
> unbind the device from vfio-pci and rebind it to the host driverm but
> because the event was received before libvirt had completely finished
> processing the removal, that device was still on the "activeDevs"
> list, and so virNodeDeviceReAttach() failed.
> 
> Experimentation with additional debug logs proved that libvirt would
> always end up dispatching the DEVICE_REMOVED event before it had
> removed the device from activeDevs (with a *much* greater difference
> with managed='yes', since in that case the re-binding of the device
> occurred after queuing the device).
> 
> Although the case of hostdev devices is the most extreme (since there
> is so much involved in tearing down the device), *all* device types
> suffer from the same problem - the DEVICE_REMOVED event is queued very
> early in the qemuDomainRemove*Device() function for all of them,
> resulting in a possibility of any application receiving the event
> before libvirt has really finished with the device.
> 
> The solution is to save the device's alias (which is the only piece of
> info from the device object that is needed for the event) at the
> beginning of processing the device removal, and then queue the event
> as a final act before returning. Since all of the
> qemuDomainRemove*Device() functions (except
> qemuDomainRemoveChrDevice()) are now called exclusively from
> qemuDomainRemoveDevice() (which selects which of the subordinates to
> call in a switch statement based on the type of device), the shortest
> route to a solution is to doing the saving of alias, and later
> queueing of the event, in the higher level qemuDomainRemoveDevice(),
> and just completely remove the event-related code from all the
> subordinate functions.
> 
> The single exception to this, as mentioned before, is
> qemuDomainRemoveChrDevice(), which is still called from somewhere
> other than qemuDomainRemoveDevice() (and has a separate arg used to
> trigger different behavior when the chr device has targetType ==
> GUESTFWD), so it must keep its original behavior intact, and must be
> treated differently by qemuDomainRemoveDevice() (similar to the way
> that qemuDomainDetachDeviceLive() treats chr and lease devices
> differently from all the others).
> 
> Resolves: https://bugzilla.redhat.com/1658198
> 
> Signed-off-by: Laine Stump <laine at laine.org>
> ---
> 
> Change from V1:
> 
> * reworded some comments based on review
> 
> * don't error out if the device has no DeviceInfo, instead just don't
>   sent the DEVICE_REMOVED event.
> 
> * Set DeviceInfoPtr to NULL after we've retrieved the alias from it.
> 
>  src/qemu/qemu_hotplug.c | 145 +++++++++++++++++++---------------------
>  1 file changed, 69 insertions(+), 76 deletions(-)
> 
[...]

> @@ -4948,11 +4927,20 @@ qemuDomainRemoveChrDevice(virQEMUDriverPtr driver,
>      if (qemuDomainNamespaceTeardownChardev(vm, chr) < 0)
>          VIR_WARN("Unable to remove chr device from /dev");
>  
> +    qemuDomainReleaseDeviceAddress(vm, &chr->info, NULL);
> +    qemuDomainChrRemove(vm->def, chr);
> +
> +    /* NB: all other qemuDomainRemove*Device() functions omit the
> +     * sending of the DEVICE_REMOVED event, because they are *only*
> +     * called from qemuDomainRemoveDevice(), and that function sends
> +     * the DEVICE_REMOVED event for them, this function is different -
> +     * it is called from other places than just
> +     * qemuDomainRemoveDevice(), so it must send the DEVICE_REMOVED
> +     * event itself.

I think this should only say that this function needs to report the
event itself. Also mention that it has to happen after all the backend
stuff is detached.

/* The caller does not emit the event. Note that the event should be
 * reported only after all backend stuff is gone
 */

> +     */
>      event = virDomainEventDeviceRemovedNewFromObj(vm, chr->info.alias);
>      virObjectEventStateQueue(driver->domainEventState, event);
>  
> -    qemuDomainReleaseDeviceAddress(vm, &chr->info, NULL);
> -    qemuDomainChrRemove(vm->def, chr);
>      virDomainChrDefFree(chr);
>      ret = 0;

[...]

> @@ -5277,50 +5237,79 @@ qemuDomainRemoveDevice(virQEMUDriverPtr driver,
>                         virDomainObjPtr vm,
>                         virDomainDeviceDefPtr dev)
>  {
> -    int ret = -1;
> +    virDomainDeviceInfoPtr info;
> +    virObjectEventPtr event;
> +    VIR_AUTOFREE(char *)alias = NULL;

Missing space before 'alias'

> +
> +    /*
> +     * save the alias to use when sending a DEVICE_REMOVED event after
> +     * all other teardown is complete
> +     */
> +    if ((info = virDomainDeviceGetInfo(dev))
> +        && VIR_STRDUP(alias, info->alias) < 0) {

&& is usually at the end of the previous line.

> +        return -1;
> +    }
> +    info = NULL; /* to prevent accidental use later */

// this is bridge [1]

> +
>      switch ((virDomainDeviceType)dev->type) {

ACK with the above addressed

[1] https://imgur.com/gallery/dZe8k
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20190326/377804b7/attachment-0001.sig>


More information about the libvir-list mailing list