[libvirt] [PATCH v2] qemu: Do not reattach PCI device used by other domain when shutdown

Eric Blake eblake at redhat.com
Fri Oct 14 18:53:54 UTC 2011


On 10/12/2011 10:05 PM, Osier Yang wrote:
> When failing on starting a domain, it tries to reattach all the PCI
> devices defined in the domain conf, regardless of whether the devices
> are still used by other domain. This will cause the devices to be deleted
> from the list qemu_driver->activePciHostdevs, thus the devices will be
> thought as usable even if it's not true. And following commands
> nodedev-{reattach,reset} will be successful.
>
> How to reproduce:
>    1) Define two domains with same PCI device defined in the confs.
>    2) # virsh start domain1
>    3) # virsh start domain2
>    4) # virsh nodedev-reattach $pci_device

This time around, I actually spent time reproducing the bug scenario on 
hardware rather than just analyzing by inspection (it took me a while to 
figure out why the same machine that used to let me do pci passthrough 
was no longer working, until I realized that my hardware is old enough 
to be insecure, and I've upgraded software since my last passthrough 
test, so the CVE fix from 
https://bugzilla.redhat.com/show_bug.cgi?id=715555 has to be 
intentionally bypassed for me to get passthrough again).  With that 
testing, I proved that this patch prevents that problem.  But, as 
written, your patch caused the dreaded "An error occurred, but the cause 
is unknown".

>
> You will see the device will be reattached to host successfully.
> As pciDeviceReattach just check if the device is still used by
> other domain via checking if the device is in list driver->activePciHostdevs,
> however, the device is deleted from the list by step 2).
>
> This patch is to prohibit the bug by:
>    1) Prohibit a domain starting or device attachment right at
>       preparation period (qemuPrepareHostdevPCIDevices) if the
>       device is in list driver->activePciHostdevs, which means
>       it's used by other domain.
>
>    2) Introduces a new field for struct _pciDevice, (const char *used_by),
>       it will be set as the domain name at preparation period,
>       (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting
>       the device from driver->activePciHostdevs if it's still used by
>       other domain when stopping the domain process.
>

> @@ -111,7 +112,7 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver *driver,
>       if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
>           return -1;
>
> -    /* We have to use 4 loops here. *All* devices must
> +    /* We have to use 6 loops here. *All* devices must
>        * be detached before we reset any of them, because
>        * in some cases you have to reset the whole PCI,
>        * which impacts all devices on it. Also, all devices

I added further loop labels.

> +        /* The device is in use by other active domain if
> +         * the dev is in list driver->activePciHostdevs.
> +         */
> +        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
> +            pciDeviceListFind(driver->activePciHostdevs, dev))
> +            goto cleanup;
> +    }

Here's where the bad message comes in - you jump to cleanup without ever 
emitting an error message.  Not to mention now that we have the new 
used_by field, the message could actually be informative :)  With my 
changes below, I get the much nicer:

Error starting domain: Requested operation is not valid: PCI device 
0000:0a:0a.0 is in use by domain dom

> @@ -277,6 +298,18 @@ void qemuDomainReAttachHostdevDevices(struct qemud_driver *driver,
>
>       for (i = 0; i<  pciDeviceListCount(pcidevs); i++) {
>           pciDevice *dev = pciDeviceListGet(pcidevs, i);
> +        pciDevice *activeDev = NULL;
> +        const char *used_by = NULL;
> +
> +        /* Never delete the dev from list driver->activePciHostdevs
> +         *  if it's used by other domain.
> +         */
> +        activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
> +        if (activeDev&&
> +            (used_by = pciDeviceGetUsedBy(activeDev))&&
> +            STRNEQ(name, used_by))
> +            continue;

>
> used_by is kept as it might be NULL.
>

In that case, use STRNEQ_NULLABLE.  But you are right - across libvirtd 
restarts, we lose track of which domain owns the device - so I did 
indeed reproduce a case of NULL.  Perhaps food for a later patch (when 
reloading domains, cycle through all hostdevs in use by active domains 
to repopulate the used_by fields).  But not a show-stopper to getting 
this bug fix in now.

Well, I did find one other problem with a libvirtd restart - as soon as 
used_by is NULL, then stopping the domain that was actually using the 
device no longer frees that device out of the driver->active list (since 
we now only free if used_by matches) - so maybe it's important to fix 
libvirtd reload repopulating used_by sooner rather than later.

> @@ -1312,6 +1313,7 @@ pciGetDevice(unsigned domain,
>       dev->bus      = bus;
>       dev->slot     = slot;
>       dev->function = function;
> +    dev->used_by  = NULL;

Pointless, since VIR_ALLOC 0-initializes.

>
>       if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x",
>                    dev->domain, dev->bus, dev->slot,
> @@ -1374,6 +1376,7 @@ pciFreeDevice(pciDevice *dev)
>       VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
>       pciCloseConfig(dev);
>       VIR_FREE(dev->path);
> +    dev->used_by = NULL;
>       VIR_FREE(dev);

Pointless, since dev is freed in the next statement.

ACK with this squashed in, so I went ahead and pushed.

diff --git i/src/libvirt_private.syms w/src/libvirt_private.syms
index 4673332..1e03e33 100644
--- i/src/libvirt_private.syms
+++ w/src/libvirt_private.syms
@@ -881,6 +881,7 @@ virNWFilterHashTableRemoveEntry;
  pciDettachDevice;
  pciDeviceFileIterate;
  pciDeviceGetManaged;
+pciDeviceGetName;
  pciDeviceGetUsedBy;
  pciDeviceIsAssignable;
  pciDeviceIsVirtualFunction;
diff --git i/src/qemu/qemu_hostdev.c w/src/qemu/qemu_hostdev.c
index f354ea8..c82c512 100644
--- i/src/qemu/qemu_hostdev.c
+++ w/src/qemu/qemu_hostdev.c
@@ -1,7 +1,7 @@
  /*
   * qemu_hostdev.c: QEMU hostdev management
   *
- * Copyright (C) 2006-2007, 2009-2010 Red Hat, Inc.
+ * Copyright (C) 2006-2007, 2009-2011 Red Hat, Inc.
   * Copyright (C) 2006 Daniel P. Berrange
   *
   * This library is free software; you can redistribute it and/or
@@ -119,21 +119,40 @@ int qemuPrepareHostdevPCIDevices(struct 
qemud_driver *driver,
       * must be reset before being marked as active.
       */

-    /* XXX validate that non-managed device isn't in use, eg
+    /* Loop 1: validate that non-managed device isn't in use, eg
       * by checking that device is either un-bound, or bound
       * to pci-stub.ko
       */

      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev = pciDeviceListGet(pcidevs, i);
+        pciDevice *other;
+
+        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            _("PCI device %s is not assignable"),
+                            pciDeviceGetName(dev));
+            goto cleanup;
+        }
          /* The device is in use by other active domain if
           * the dev is in list driver->activePciHostdevs.
           */
-        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
-            pciDeviceListFind(driver->activePciHostdevs, dev))
+        if ((other = pciDeviceListFind(driver->activePciHostdevs, dev))) {
+            const char *other_name = pciDeviceGetUsedBy(other);
+
+            if (other_name)
+                qemuReportError(VIR_ERR_OPERATION_INVALID,
+                                _("PCI device %s is in use by domain %s"),
+                                pciDeviceGetName(dev), other_name);
+            else
+                qemuReportError(VIR_ERR_OPERATION_INVALID,
+                                _("PCI device %s is already in use"),
+                                pciDeviceGetName(dev));
              goto cleanup;
+        }
      }

+    /* Loop 2: detach managed devices */
      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev = pciDeviceListGet(pcidevs, i);
          if (pciDeviceGetManaged(dev) &&
@@ -141,15 +160,15 @@ int qemuPrepareHostdevPCIDevices(struct 
qemud_driver *driver,
              goto reattachdevs;
      }

-    /* Now that all the PCI hostdevs have be dettached, we can safely
-     * reset them */
+    /* Loop 3: Now that all the PCI hostdevs have been detached, we
+     * can safely reset them */
      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev = pciDeviceListGet(pcidevs, i);
          if (pciResetDevice(dev, driver->activePciHostdevs, pcidevs) < 0)
              goto reattachdevs;
      }

-    /* Now mark all the devices as active */
+    /* Loop 4: Now mark all the devices as active */
      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev = pciDeviceListGet(pcidevs, i);
          if (pciDeviceListAdd(driver->activePciHostdevs, dev) < 0) {
@@ -158,8 +177,8 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver 
*driver,
          }
      }

-    /* Now set the used_by_domain of the device in 
driver->activePciHostdevs
-     * as domain name.
+    /* Loop 5: Now set the used_by_domain of the device in
+     * driver->activePciHostdevs as domain name.
       */
      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev, *activeDev;
@@ -170,7 +189,7 @@ int qemuPrepareHostdevPCIDevices(struct qemud_driver 
*driver,
          pciDeviceSetUsedBy(activeDev, name);
      }

-    /* Now steal all the devices from pcidevs */
+    /* Loop 6: Now steal all the devices from pcidevs */
      while (pciDeviceListCount(pcidevs) > 0) {
          pciDevice *dev = pciDeviceListGet(pcidevs, 0);
          pciDeviceListSteal(pcidevs, dev);
@@ -299,15 +318,13 @@ void qemuDomainReAttachHostdevDevices(struct 
qemud_driver *driver,
      for (i = 0; i < pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev = pciDeviceListGet(pcidevs, i);
          pciDevice *activeDev = NULL;
-        const char *used_by = NULL;

          /* Never delete the dev from list driver->activePciHostdevs
-         *  if it's used by other domain.
+         * if it's used by other domain.
           */
          activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
          if (activeDev &&
-            (used_by = pciDeviceGetUsedBy(activeDev)) &&
-            STRNEQ(name, used_by))
+            STRNEQ_NULLABLE(name, pciDeviceGetUsedBy(activeDev)))
              continue;

          pciDeviceListDel(driver->activePciHostdevs, dev);
diff --git i/src/util/pci.c w/src/util/pci.c
index 888784a..2bbb90c 100644
--- i/src/util/pci.c
+++ w/src/util/pci.c
@@ -1313,7 +1313,6 @@ pciGetDevice(unsigned domain,
      dev->bus      = bus;
      dev->slot     = slot;
      dev->function = function;
-    dev->used_by  = NULL;

      if (snprintf(dev->name, sizeof(dev->name), "%.4x:%.2x:%.2x.%.1x",
                   dev->domain, dev->bus, dev->slot,
@@ -1376,10 +1375,15 @@ pciFreeDevice(pciDevice *dev)
      VIR_DEBUG("%s %s: freeing", dev->id, dev->name);
      pciCloseConfig(dev);
      VIR_FREE(dev->path);
-    dev->used_by = NULL;
      VIR_FREE(dev);
  }

+const char *
+pciDeviceGetName(pciDevice *dev)
+{
+    return dev->name;
+}
+
  void pciDeviceSetManaged(pciDevice *dev, unsigned managed)
  {
      dev->managed = !!managed;
diff --git i/src/util/pci.h w/src/util/pci.h
index 640c6af..ab29c0b 100644
--- i/src/util/pci.h
+++ w/src/util/pci.h
@@ -1,5 +1,5 @@
  /*
- * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2009, 2011 Red Hat, Inc.
   *
   * This library is free software; you can redistribute it and/or
   * modify it under the terms of the GNU Lesser General Public
@@ -39,6 +39,7 @@ pciDevice *pciGetDevice      (unsigned       domain,
                                unsigned       slot,
                                unsigned       function);
  void       pciFreeDevice     (pciDevice     *dev);
+const char *pciDeviceGetName (pciDevice     *dev);
  int        pciDettachDevice  (pciDevice     *dev, pciDeviceList 
*activeDevs);
  int        pciReAttachDevice (pciDevice     *dev, pciDeviceList 
*activeDevs);
  int        pciResetDevice    (pciDevice     *dev,


-- 
Eric Blake   eblake at redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




More information about the libvir-list mailing list