[libvirt] [PATCH v2] qemu: Do not reattach PCI device used by other domain when shutdown

Osier Yang jyang at redhat.com
Mon Oct 17 01:40:24 UTC 2011


于 2011年10月15日 02:53, Eric Blake 写道:
> On 10/12/2011 10:05 PM, Osier Yang wrote:
>> When failing on starting a domain, it tries to reattach all the PCI
>> devices defined in the domain conf, regardless of whether the devices
>> are still used by other domain. This will cause the devices to be 
>> deleted
>> from the list qemu_driver->activePciHostdevs, thus the devices will be
>> thought as usable even if it's not true. And following commands
>> nodedev-{reattach,reset} will be successful.
>>
>> How to reproduce:
>>    1) Define two domains with same PCI device defined in the confs.
>>    2) # virsh start domain1
>>    3) # virsh start domain2
>>    4) # virsh nodedev-reattach $pci_device
>
> This time around, I actually spent time reproducing the bug scenario 
> on hardware rather than just analyzing by inspection (it took me a 
> while to figure out why the same machine that used to let me do pci 
> passthrough was no longer working, until I realized that my hardware 
> is old enough to be insecure, and I've upgraded software since my last 
> passthrough test, so the CVE fix from 
> https://bugzilla.redhat.com/show_bug.cgi?id=715555 has to be 
> intentionally bypassed for me to get passthrough again).  With that 
> testing, I proved that this patch prevents that problem.  But, as 
> written, your patch caused the dreaded "An error occurred, but the 
> cause is unknown".
>
>>
>> You will see the device will be reattached to host successfully.
>> As pciDeviceReattach just check if the device is still used by
>> other domain via checking if the device is in list 
>> driver->activePciHostdevs,
>> however, the device is deleted from the list by step 2).
>>
>> This patch is to prohibit the bug by:
>>    1) Prohibit a domain starting or device attachment right at
>>       preparation period (qemuPrepareHostdevPCIDevices) if the
>>       device is in list driver->activePciHostdevs, which means
>>       it's used by other domain.
>>
>>    2) Introduces a new field for struct _pciDevice, (const char 
>> *used_by),
>>       it will be set as the domain name at preparation period,
>>       (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting
>>       the device from driver->activePciHostdevs if it's still used by
>>       other domain when stopping the domain process.
>>
>
>> @@ -111,7 +112,7 @@ int qemuPrepareHostdevPCIDevices(struct 
>> qemud_driver *driver,
>>       if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
>>           return -1;
>>
>> -    /* We have to use 4 loops here. *All* devices must
>> +    /* We have to use 6 loops here. *All* devices must
>>        * be detached before we reset any of them, because
>>        * in some cases you have to reset the whole PCI,
>>        * which impacts all devices on it. Also, all devices
>
> I added further loop labels.
>
>> +        /* The device is in use by other active domain if
>> +         * the dev is in list driver->activePciHostdevs.
>> +         */
>> +        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
>> +            pciDeviceListFind(driver->activePciHostdevs, dev))
>> +            goto cleanup;
>> +    }
>
> Here's where the bad message comes in - you jump to cleanup without 
> ever emitting an error message.  Not to mention now that we have the 
> new used_by field, the message could actually be informative :)  With 
> my changes below, I get the much nicer:
>
> Error starting domain: Requested operation is not valid: PCI device 
> 0000:0a:0a.0 is in use by domain dom

Oh, right, thanks for the fixing, :)

>
>> @@ -277,6 +298,18 @@ void qemuDomainReAttachHostdevDevices(struct 
>> qemud_driver *driver,
>>
>>       for (i = 0; i<  pciDeviceListCount(pcidevs); i++) {
>>           pciDevice *dev = pciDeviceListGet(pcidevs, i);
>> +        pciDevice *activeDev = NULL;
>> +        const char *used_by = NULL;
>> +
>> +        /* Never delete the dev from list driver->activePciHostdevs
>> +         *  if it's used by other domain.
>> +         */
>> +        activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
>> +        if (activeDev&&
>> +            (used_by = pciDeviceGetUsedBy(activeDev))&&
>> +            STRNEQ(name, used_by))
>> +            continue;
>
>>
>> used_by is kept as it might be NULL.
>>
>
> In that case, use STRNEQ_NULLABLE.  But you are right - across 
> libvirtd restarts, we lose track of which domain owns the device - so 
> I did indeed reproduce a case of NULL.  Perhaps food for a later patch 
> (when reloading domains, cycle through all hostdevs in use by active 
> domains to repopulate the used_by fields).  But not a show-stopper to 
> getting this bug fix in now.

Yes, I didn't realize this, we need further patch.

>
> Well, I did find one other problem with a libvirtd restart - as soon 
> as used_by is NULL, then stopping the domain that was actually using 
> the device no longer frees that device out of the driver->active list 
> (since we now only free if used_by matches) - so maybe it's important 
> to fix libvirtd reload repopulating used_by sooner rather than later.
>
>> @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain,
>>       dev->bus      = bus;
>>       dev->slot     = slot;
>>       dev->function = function;
>> +    dev->used_by  = NULL;
>
> Pointless, since VIR_ALLOC 0-initializes.
>
>>
>>       if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x",
>>                    dev->domain, dev->bus, dev->slot,
>> @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev)
>>       VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
>>       pciCloseConfig(dev);
>>       VIR_FREE(dev->path);
>> +    dev->used_by = NULL;
>>       VIR_FREE(dev);
>
> Pointless, since dev is freed in the next statement.
>
> ACK with this squashed in, so I went ahead and pushed.
>
> diff --git i/src/libvirt_private.syms w/src/libvirt_private.syms
> index 4673332..1e03e33 100644
> --- i/src/libvirt_private.syms
> +++ w/src/libvirt_private.syms
> @@ -881,6 +881,7 @@ virNWFilterHashTableRemoveEntry;
>  pciDettachDevice;
>  pciDeviceFileIterate;
>  pciDeviceGetManaged;
> +pciDeviceGetName;
>  pciDeviceGetUsedBy;
>  pciDeviceIsAssignable;
>  pciDeviceIsVirtualFunction;
> diff --git i/src/qemu/qemu_hostdev.c w/src/qemu/qemu_hostdev.c
> index f354ea8..c82c512 100644
> --- i/src/qemu/qemu_hostdev.c
> +++ w/src/qemu/qemu_hostdev.c
> @@ -1,7 +1,7 @@
>  /*
>   * qemu_hostdev.c: QEMU hostdev management
>   *
> - * Copyright (C) 2006-2007, 2009-2010 Red Hat, Inc.
> + * Copyright (C) 2006-2007, 2009-2011 Red Hat, Inc.
>   * Copyright (C) 2006 Daniel P. Berrange
>   *
>   * This library is free software; you can redistribute it and/or
> @@ -119,21 +119,40 @@ int qemuPrepareHostdevPCIDevices(struct 
> qemud_driver *driver,
>       * must be reset before being marked as active.
>       */
>
> -    /* XXX validate that non-managed device isn't in use, eg
> +    /* Loop 1: validate that non-managed device isn't in use, eg
>       * by checking that device is either un-bound, or bound
>       * to pci-stub.ko
>       */
>
>      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
>          pciDevice *dev = pciDeviceListGet(pcidevs, i);
> +        pciDevice *other;
> +
> +        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) {
> +            qemuReportError(VIR_ERR_OPERATION_INVALID,
> +                            _("PCI device %s is not assignable"),
> +                            pciDeviceGetName(dev));
> +            goto cleanup;
> +        }
>          /* The device is in use by other active domain if
>           * the dev is in list driver->activePciHostdevs.
>           */
> -        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
> -            pciDeviceListFind(driver->activePciHostdevs, dev))
> +        if ((other = pciDeviceListFind(driver->activePciHostdevs, 
> dev))) {
> +            const char *other_name = pciDeviceGetUsedBy(other);
> +
> +            if (other_name)
> +                qemuReportError(VIR_ERR_OPERATION_INVALID,
> +                                _("PCI device %s is in use by domain 
> %s"),
> +                                pciDeviceGetName(dev), other_name);
> +            else
> +                qemuReportError(VIR_ERR_OPERATION_INVALID,
> +                                _("PCI device %s is already in use"),
> +                                pciDeviceGetName(dev));
>              goto cleanup;
> +        }
>      }
>
> +    /* Loop 2: detach managed devices */
>      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
>          pciDevice *dev = pciDeviceListGet(pcidevs, i);
>          if (pciDeviceGetManaged(dev) &&
> @@ -141,15 +160,15 @@ int qemuPrepareHostdevPCIDevices(struct 
> qemud_driver *driver,
>              goto reattachdevs;
>      }
>
> -    /* Now that all the PCI hostdevs have be dettached, we can safely
> -     * reset them */
> +    /* Loop 3: Now that all the PCI hostdevs have been detached, we
> +     * can safely reset them */
>      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
>          pciDevice *dev = pciDeviceListGet(pcidevs, i);
>          if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
>              goto reattachdevs;
>      }
>
> -    /* Now mark all the devices as active */
> +    /* Loop 4: Now mark all the devices as active */
>      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
>          pciDevice *dev = pciDeviceListGet(pcidevs, i);
>          if (pciDeviceListAdd(driver->activePciHostdevs, dev) < 0) {
> @@ -158,8 +177,8 @@ int qemuPrepareHostdevPCIDevices(struct 
> qemud_driver *driver,
>          }
>      }
>
> -    /* Now set the used_by_domain of the device in 
> driver->activePciHostdevs
> -     * as domain name.
> +    /* Loop 5: Now set the used_by_domain of the device in
> +     * driver->activePciHostdevs as domain name.
>       */
>      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
>          pciDevice *dev, *activeDev;
> @@ -170,7 +189,7 @@ int qemuPrepareHostdevPCIDevices(struct 
> qemud_driver *driver,
>          pciDeviceSetUsedBy(activeDev, name);
>      }
>
> -    /* Now steal all the devices from pcidevs */
> +    /* Loop 6: Now steal all the devices from pcidevs */
>      while (pciDeviceListCount(pcidevs) > 0) {
>          pciDevice *dev = pciDeviceListGet(pcidevs, 0);
>          pciDeviceListSteal(pcidevs, dev);
> @@ -299,15 +318,13 @@ void qemuDomainReAttachHostdevDevices(struct 
> qemud_driver *driver,
>      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
>          pciDevice *dev = pciDeviceListGet(pcidevs, i);
>          pciDevice *activeDev = NULL;
> -        const char *used_by = NULL;
>
>          /* Never delete the dev from list driver->activePciHostdevs
> -         *  if it's used by other domain.
> +         * if it's used by other domain.
>           */
>          activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
>          if (activeDev &&
> -            (used_by = pciDeviceGetUsedBy(activeDev)) &&
> -            STRNEQ(name, used_by))
> +            STRNEQ_NULLABLE(name, pciDeviceGetUsedBy(activeDev)))
>              continue;
>
>          pciDeviceListDel(driver->activePciHostdevs, dev);
> diff --git i/src/util/pci.c w/src/util/pci.c
> index 888784a..2bbb90c 100644
> --- i/src/util/pci.c
> +++ w/src/util/pci.c
> @@ -1313,7 +1313,6 @@ pciGetDevice(unsigned domain,
>      dev->bus      = bus;
>      dev->slot     = slot;
>      dev->function = function;
> -    dev->used_by  = NULL;
>
>      if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x",
>                   dev->domain, dev->bus, dev->slot,
> @@ -1376,10 +1375,15 @@ pciFreeDevice(pciDevice *dev)
>      VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
>      pciCloseConfig(dev);
>      VIR_FREE(dev->path);
> -    dev->used_by = NULL;
>      VIR_FREE(dev);
>  }
>
> +const char *
> +pciDeviceGetName(pciDevice *dev)
> +{
> +    return dev->name;
> +}
> +
>  void pciDeviceSetManaged(pciDevice *dev, unsigned managed)
>  {
>      dev->managed = !!managed;
> diff --git i/src/util/pci.h w/src/util/pci.h
> index 640c6af..ab29c0b 100644
> --- i/src/util/pci.h
> +++ w/src/util/pci.h
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2009, 2011 Red Hat, Inc.
>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public
> @@ -39,6 +39,7 @@ pciDevice *pciGetDevice      (unsigned       domain,
>                                unsigned       slot,
>                                unsigned       function);
>  void       pciFreeDevice     (pciDevice     *dev);
> +const char *pciDeviceGetName (pciDevice     *dev);
>  int        pciDettachDevice  (pciDevice     *dev, pciDeviceList 
> *activeDevs);
>  int        pciReAttachDevice (pciDevice     *dev, pciDeviceList 
> *activeDevs);
>  int        pciResetDevice    (pciDevice     *dev,
>
>




More information about the libvir-list mailing list