[libvirt] [PATCH v2] qemu: Do not reattach PCI device used by other domain when shutdown
Osier Yang
jyang at redhat.com
Mon Oct 17 01:40:24 UTC 2011
于 2011年10月15日 02:53, Eric Blake 写道:
> On 10/12/2011 10:05 PM, Osier Yang wrote:
>> When failing on starting a domain, it tries to reattach all the PCI
>> devices defined in the domain conf, regardless of whether the devices
>> are still used by other domain. This will cause the devices to be
>> deleted
>> from the list qemu_driver->activePciHostdevs, thus the devices will be
>> thought as usable even if it's not true. And following commands
>> nodedev-{reattach,reset} will be successful.
>>
>> How to reproduce:
>> 1) Define two domains with same PCI device defined in the confs.
>> 2) # virsh start domain1
>> 3) # virsh start domain2
>> 4) # virsh nodedev-reattach $pci_device
>
> This time around, I actually spent time reproducing the bug scenario
> on hardware rather than just analyzing by inspection (it took me a
> while to figure out why the same machine that used to let me do pci
> passthrough was no longer working, until I realized that my hardware
> is old enough to be insecure, and I've upgraded software since my last
> passthrough test, so the CVE fix from
> https://bugzilla.redhat.com/show_bug.cgi?id=715555 has to be
> intentionally bypassed for me to get passthrough again). With that
> testing, I proved that this patch prevents that problem. But, as
> written, your patch caused the dreaded "An error occurred, but the
> cause is unknown".
>
>>
>> You will see the device will be reattached to host successfully.
>> As pciDeviceReattach just check if the device is still used by
>> other domain via checking if the device is in list
>> driver->activePciHostdevs,
>> however, the device is deleted from the list by step 2).
>>
>> This patch is to prohibit the bug by:
>> 1) Prohibit a domain starting or device attachment right at
>> preparation period (qemuPrepareHostdevPCIDevices) if the
>> device is in list driver->activePciHostdevs, which means
>> it's used by other domain.
>>
>> 2) Introduces a new field for struct _pciDevice, (const char
>> *used_by),
>> it will be set as the domain name at preparation period,
>> (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting
>> the device from driver->activePciHostdevs if it's still used by
>> other domain when stopping the domain process.
>>
>
>> @@ -111,7 +112,7 @@ int qemuPrepareHostdevPCIDevices(struct
>> qemud_driver *driver,
>> if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
>> return -1;
>>
>> - /* We have to use 4 loops here. *All* devices must
>> + /* We have to use 6 loops here. *All* devices must
>> * be detached before we reset any of them, because
>> * in some cases you have to reset the whole PCI,
>> * which impacts all devices on it. Also, all devices
>
> I added further loop labels.
>
>> + /* The device is in use by other active domain if
>> + * the dev is in list driver->activePciHostdevs.
>> + */
>> + if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
>> + pciDeviceListFind(driver->activePciHostdevs, dev))
>> + goto cleanup;
>> + }
>
> Here's where the bad message comes in - you jump to cleanup without
> ever emitting an error message. Not to mention now that we have the
> new used_by field, the message could actually be informative :) With
> my changes below, I get the much nicer:
>
> Error starting domain: Requested operation is not valid: PCI device
> 0000:0a:0a.0 is in use by domain dom
Oh, right, thanks for the fixing, :)
>
>> @@ -277,6 +298,18 @@ void qemuDomainReAttachHostdevDevices(struct
>> qemud_driver *driver,
>>
>> for (i = 0; i< pciDeviceListCount(pcidevs); i++) {
>> pciDevice *dev = pciDeviceListGet(pcidevs, i);
>> + pciDevice *activeDev = NULL;
>> + const char *used_by = NULL;
>> +
>> + /* Never delete the dev from list driver->activePciHostdevs
>> + * if it's used by other domain.
>> + */
>> + activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
>> + if (activeDev&&
>> + (used_by = pciDeviceGetUsedBy(activeDev))&&
>> + STRNEQ(name, used_by))
>> + continue;
>
>>
>> used_by is kept as it might be NULL.
>>
>
> In that case, use STRNEQ_NULLABLE. But you are right - across
> libvirtd restarts, we lose track of which domain owns the device - so
> I did indeed reproduce a case of NULL. Perhaps food for a later patch
> (when reloading domains, cycle through all hostdevs in use by active
> domains to repopulate the used_by fields). But not a show-stopper to
> getting this bug fix in now.
Yes, I didn't realize this, we need further patch.
>
> Well, I did find one other problem with a libvirtd restart - as soon
> as used_by is NULL, then stopping the domain that was actually using
> the device no longer frees that device out of the driver->active list
> (since we now only free if used_by matches) - so maybe it's important
> to fix libvirtd reload repopulating used_by sooner rather than later.
>
>> @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain,
>> dev->bus = bus;
>> dev->slot = slot;
>> dev->function = function;
>> + dev->used_by = NULL;
>
> Pointless, since VIR_ALLOC 0-initializes.
>
>>
>> if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x",
>> dev->domain, dev->bus, dev->slot,
>> @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev)
>> VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
>> pciCloseConfig(dev);
>> VIR_FREE(dev->path);
>> + dev->used_by = NULL;
>> VIR_FREE(dev);
>
> Pointless, since dev is freed in the next statement.
>
> ACK with this squashed in, so I went ahead and pushed.
>
> diff --git i/src/libvirt_private.syms w/src/libvirt_private.syms
> index 4673332..1e03e33 100644
> --- i/src/libvirt_private.syms
> +++ w/src/libvirt_private.syms
> @@ -881,6 +881,7 @@ virNWFilterHashTableRemoveEntry;
> pciDettachDevice;
> pciDeviceFileIterate;
> pciDeviceGetManaged;
> +pciDeviceGetName;
> pciDeviceGetUsedBy;
> pciDeviceIsAssignable;
> pciDeviceIsVirtualFunction;
> diff --git i/src/qemu/qemu_hostdev.c w/src/qemu/qemu_hostdev.c
> index f354ea8..c82c512 100644
> --- i/src/qemu/qemu_hostdev.c
> +++ w/src/qemu/qemu_hostdev.c
> @@ -1,7 +1,7 @@
> /*
> * qemu_hostdev.c: QEMU hostdev management
> *
> - * Copyright (C) 2006-2007, 2009-2010 Red Hat, Inc.
> + * Copyright (C) 2006-2007, 2009-2011 Red Hat, Inc.
> * Copyright (C) 2006 Daniel P. Berrange
> *
> * This library is free software; you can redistribute it and/or
> @@ -119,21 +119,40 @@ int qemuPrepareHostdevPCIDevices(struct
> qemud_driver *driver,
> * must be reset before being marked as active.
> */
>
> - /* XXX validate that non-managed device isn't in use, eg
> + /* Loop 1: validate that non-managed device isn't in use, eg
> * by checking that device is either un-bound, or bound
> * to pci-stub.ko
> */
>
> for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
> pciDevice *dev = pciDeviceListGet(pcidevs, i);
> + pciDevice *other;
> +
> + if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) {
> + qemuReportError(VIR_ERR_OPERATION_INVALID,
> + _("PCI device %s is not assignable"),
> + pciDeviceGetName(dev));
> + goto cleanup;
> + }
> /* The device is in use by other active domain if
> * the dev is in list driver->activePciHostdevs.
> */
> - if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
> - pciDeviceListFind(driver->activePciHostdevs, dev))
> + if ((other = pciDeviceListFind(driver->activePciHostdevs,
> dev))) {
> + const char *other_name = pciDeviceGetUsedBy(other);
> +
> + if (other_name)
> + qemuReportError(VIR_ERR_OPERATION_INVALID,
> + _("PCI device %s is in use by domain
> %s"),
> + pciDeviceGetName(dev), other_name);
> + else
> + qemuReportError(VIR_ERR_OPERATION_INVALID,
> + _("PCI device %s is already in use"),
> + pciDeviceGetName(dev));
> goto cleanup;
> + }
> }
>
> + /* Loop 2: detach managed devices */
> for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
> pciDevice *dev = pciDeviceListGet(pcidevs, i);
> if (pciDeviceGetManaged(dev) &&
> @@ -141,15 +160,15 @@ int qemuPrepareHostdevPCIDevices(struct
> qemud_driver *driver,
> goto reattachdevs;
> }
>
> - /* Now that all the PCI hostdevs have be dettached, we can safely
> - * reset them */
> + /* Loop 3: Now that all the PCI hostdevs have been detached, we
> + * can safely reset them */
> for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
> pciDevice *dev = pciDeviceListGet(pcidevs, i);
> if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
> goto reattachdevs;
> }
>
> - /* Now mark all the devices as active */
> + /* Loop 4: Now mark all the devices as active */
> for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
> pciDevice *dev = pciDeviceListGet(pcidevs, i);
> if (pciDeviceListAdd(driver->activePciHostdevs, dev) < 0) {
> @@ -158,8 +177,8 @@ int qemuPrepareHostdevPCIDevices(struct
> qemud_driver *driver,
> }
> }
>
> - /* Now set the used_by_domain of the device in
> driver->activePciHostdevs
> - * as domain name.
> + /* Loop 5: Now set the used_by_domain of the device in
> + * driver->activePciHostdevs as domain name.
> */
> for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
> pciDevice *dev, *activeDev;
> @@ -170,7 +189,7 @@ int qemuPrepareHostdevPCIDevices(struct
> qemud_driver *driver,
> pciDeviceSetUsedBy(activeDev, name);
> }
>
> - /* Now steal all the devices from pcidevs */
> + /* Loop 6: Now steal all the devices from pcidevs */
> while (pciDeviceListCount(pcidevs) > 0) {
> pciDevice *dev = pciDeviceListGet(pcidevs, 0);
> pciDeviceListSteal(pcidevs, dev);
> @@ -299,15 +318,13 @@ void qemuDomainReAttachHostdevDevices(struct
> qemud_driver *driver,
> for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
> pciDevice *dev = pciDeviceListGet(pcidevs, i);
> pciDevice *activeDev = NULL;
> - const char *used_by = NULL;
>
> /* Never delete the dev from list driver->activePciHostdevs
> - * if it's used by other domain.
> + * if it's used by other domain.
> */
> activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
> if (activeDev &&
> - (used_by = pciDeviceGetUsedBy(activeDev)) &&
> - STRNEQ(name, used_by))
> + STRNEQ_NULLABLE(name, pciDeviceGetUsedBy(activeDev)))
> continue;
>
> pciDeviceListDel(driver->activePciHostdevs, dev);
> diff --git i/src/util/pci.c w/src/util/pci.c
> index 888784a..2bbb90c 100644
> --- i/src/util/pci.c
> +++ w/src/util/pci.c
> @@ -1313,7 +1313,6 @@ pciGetDevice(unsigned domain,
> dev->bus = bus;
> dev->slot = slot;
> dev->function = function;
> - dev->used_by = NULL;
>
> if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x",
> dev->domain, dev->bus, dev->slot,
> @@ -1376,10 +1375,15 @@ pciFreeDevice(pciDevice *dev)
> VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
> pciCloseConfig(dev);
> VIR_FREE(dev->path);
> - dev->used_by = NULL;
> VIR_FREE(dev);
> }
>
> +const char *
> +pciDeviceGetName(pciDevice *dev)
> +{
> + return dev->name;
> +}
> +
> void pciDeviceSetManaged(pciDevice *dev, unsigned managed)
> {
> dev->managed = !!managed;
> diff --git i/src/util/pci.h w/src/util/pci.h
> index 640c6af..ab29c0b 100644
> --- i/src/util/pci.h
> +++ w/src/util/pci.h
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (C) 2009 Red Hat, Inc.
> + * Copyright (C) 2009, 2011 Red Hat, Inc.
> *
> * This library is free software; you can redistribute it and/or
> * modify it under the terms of the GNU Lesser General Public
> @@ -39,6 +39,7 @@ pciDevice *pciGetDevice (unsigned domain,
> unsigned slot,
> unsigned function);
> void pciFreeDevice (pciDevice *dev);
> +const char *pciDeviceGetName (pciDevice *dev);
> int pciDettachDevice (pciDevice *dev, pciDeviceList
> *activeDevs);
> int pciReAttachDevice (pciDevice *dev, pciDeviceList
> *activeDevs);
> int pciResetDevice (pciDevice *dev,
>
>
More information about the libvir-list
mailing list