[PATCH v2] util: basic support for VFIO variant drivers

Wed May 31 14:18:17 UTC 2023

Hey Laine,

On 23/08/2022 15:11, Laine Stump wrote:
> ping.
> 
> I have a different version of this patch where I do read the modules.alias file
> rather than just checking the name of the driver, but that also requires "double
> mocking" open() in the unit test, which wasn't working properly, and I'd rather
> not spend the time figuring it out if it's not going to be needed. (Alex prefers
> that version because it is more correct than just checking the name, and he's
> concerned that the new sysfs-based API may take longer than we're thinking to
> get into downstream distros, but the version in this patch does satisfy both
> Jason and Daniel's suggested implementations). Anyway, I can post the other
> patch if anyone is interested.
> 
[sorry for the thread necromancy]

I was wondering if you're planning on respinning this work, or rather the
modalias approach alternative you mention? Or perhaps we are waiting for the
cdev sysfs? Though, there's still a significant amount of kernel versions to
cover that won't have the sysfs entry sadly :(

> On 8/11/22 1:00 AM, Laine Stump wrote:
>> Before a PCI device can be assigned to a guest with VFIO, that device
>> must be bound to the vfio-pci driver rather than to the device's
>> normal driver. The vfio-pci driver provides APIs that permit QEMU to
>> perform all the necessary operations to make the device accessible to
>> the guest.
>>
>> There has been kernel work recently to support vendor/device-specific
>> VFIO variant drivers that provide the basic vfio-pci driver functionality
>> while adding support for device-specific operations (for example these
>> device-specific drivers are planned to support live migration of
>> certain devices). All that will be needed to make this functionality
>> available will be to bind the new vendor-specific driver to the device
>> (rather than the generic vfio-pci driver, which will continue to work
>> just without the extra functionality).
>>
>> But until now libvirt has required that all PCI devices being assigned
>> to a guest with VFIO specifically have the "vfio-pci" driver bound to
>> the device. So even if the user manually binds a shiny new
>> vendor-specific vfio variant driver to the device (and puts
>> "managed='no'" in the config to prevent libvirt from changing the
>> binding), libvirt will just fail during startup of the guest (or
>> during hotplug) because the driver bound to the device isn't exactly
>> "vfio-pci".
>>
>> This patch loosens that restriction a bit - rather than requiring that
>> the device be bound to "vfio-pci", it also checks if the drivername
>> contains the string "vfio" at all, and in this case allows the
>> operation to continue. If the driver is in fact a VFIO variant, then
>> the assignment will succeed, but if it is not a VFIO variant then QEMU
>> will fail (and report the error back to libvirt).
>>
>> In the near future (possibly by kernel 6.0) there will be a
>> formal method of identifying a VFIO variant driver by looking in
>> sysfs; in the meantime the inexact, but simple, method in this patch
>> will allow users of the few existing VFIO variant drivers (and
>> developers of new VFIO variant drivers) to use their new drivers
>> without needing to remove libvirt from their setup - they can simply
>> pre-bind the device to the new driver, then use "managed='no'" in
>> their libvirt config.
>>
>> NB: this patch does *not* handle automatically determining the proper
>> vendor-specific driver and binding to it in the case of
>> "managed='yes'". This will be implemented later when there is a widely
>> available driver / device combo we can use for testing.
>>
>> Signed-off-by: Laine Stump <laine at redhat.com>
>> ---
>> V1 here: https://listman.redhat.com/archives/libvir-list/2022-August/233327.html
>>
>> Change in V2:
>>
>> V1 used the output of modinfo to look for "vfio_pci" as an alias for a
>> driver to see if it was a VFIO variant driver.
>>
>> As a result of discussion of V1, V2 is much simpler - it just assumes
>> that any driver with "vfio" in the name is a VFIO variant. This is
>> okay because 1) QEMU will still catch it and libvirt will properly log
>> the error if the driver isn't actually a VFIO variant, and 2) it's a
>> temporary situation, just to enable use of VFIO variant drivers with
>> libvirt until a standard method of detecting this is added to sysfs
>> (which, according to the discussion of V1, is coming in the near
>> future).
>>
>> (NB: I did implement checking of /lib/modules/`uname -r`/modules.alias
>> as suggested by Erik, but it turned out that this caused the unit
>> tests to call uname(3) and open the modules.alias file on the test
>> host - for a proper unit test I would have also needed to mock these
>> two functions, and it seemed like too much complexity for a temporary
>> workaround. I've implemented Jason's suggestion here (accept any
>> driver with "vfio" in the name), which is similar to danpb's
>> suggestion (accept specifically the two drivers that are already in
>> the upstream kernel), but will also allow for new drivers that may be
>> under development.)
>>
>>   src/hypervisor/virhostdev.c | 26 ++++---------
>>   src/util/virpci.c           | 76 ++++++++++++++++++++++++++++++++++---
>>   src/util/virpci.h           |  3 ++
>>   3 files changed, 82 insertions(+), 23 deletions(-)
>>
>> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
>> index c0ce867596..15b35fa75e 100644
>> --- a/src/hypervisor/virhostdev.c
>> +++ b/src/hypervisor/virhostdev.c
>> @@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
>>                                      mgr->inactivePCIHostdevs) < 0)
>>                   goto reattachdevs;
>>           } else {
>> -            g_autofree char *driverPath = NULL;
>> -            g_autofree char *driverName = NULL;
>> -            int stub;
>> +            g_autofree char *drvName = NULL;
>> +            virPCIStubDriver drvType;
>>                 /* Unmanaged devices should already have been marked as
>>                * inactive: if that's the case, we can simply move on */
>> @@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
>>                *       information about active / inactive device across
>>                *       daemon restarts has been implemented */
>>   -            if (virPCIDeviceGetDriverPathAndName(pci,
>> -                                                 &driverPath, &driverName) < 0)
>> +            if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
>>                   goto reattachdevs;
>>   -            stub = virPCIStubDriverTypeFromString(driverName);
>> -
>> -            if (stub > VIR_PCI_STUB_DRIVER_NONE &&
>> -                stub < VIR_PCI_STUB_DRIVER_LAST) {
>> +            if (drvType > VIR_PCI_STUB_DRIVER_NONE) {
>>                     /* The device is bound to a known stub driver: store this
>>                    * information and add a copy to the inactive list */
>> -                virPCIDeviceSetStubDriver(pci, stub);
>> +                virPCIDeviceSetStubDriver(pci, drvType);
>>                     VIR_DEBUG("Adding PCI device %s to inactive list",
>>                             virPCIDeviceGetName(pci));
>> @@ -2292,18 +2287,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager
>> *hostdev_mgr,
>>       /* Let's check if all PCI devices are NVMe disks. */
>>       for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) {
>>           virPCIDevice *pci = virPCIDeviceListGet(pciDevices, i);
>> -        g_autofree char *drvPath = NULL;
>>           g_autofree char *drvName = NULL;
>> -        int stub = VIR_PCI_STUB_DRIVER_NONE;
>> +        virPCIStubDriver drvType;
>>   -        if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0)
>> +        if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
>>               goto cleanup;
>>   -        if (drvName)
>> -            stub = virPCIStubDriverTypeFromString(drvName);
>> -
>> -        if (stub == VIR_PCI_STUB_DRIVER_VFIO ||
>> -            STREQ_NULLABLE(drvName, "nvme"))
>> +        if (drvType == VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvName,
>> "nvme"))
>>               continue;
>>             VIR_WARN("Suspicious NVMe disk assignment. PCI device "
>> diff --git a/src/util/virpci.c b/src/util/virpci.c
>> index 7800966963..51ccf4d9fd 100644
>> --- a/src/util/virpci.c
>> +++ b/src/util/virpci.c
>> @@ -277,6 +277,71 @@ virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char
>> **path, char **name)
>>   }
>>     +/**
>> + * virPCIDeviceGetDriverNameAndType:
>> + * @dev: virPCIDevice object to examine
>> + * @drvName: returns name of driver bound to this device (if any)
>> + * @drvType: returns type of driver if it is a known stub driver type
>> + *
>> + * Find the name of the driver bound to @dev (if any) and the type of
>> + * the driver if it is a known/recognized "stub" driver (based on the
>> + * driver name).
>> + *
>> + * There are vfio "variant" drivers that provide all the basic
>> + * functionality of the standard vfio-pci driver as well as additional
>> + * stuff. There is a plan to add info to sysfs that will allow easily
>> + * determining if a driver is a vfio variant driver, but that sysfs
>> + * entry isn't yet available. In the meantime as a workaround so that
>> + * the few existing vfio variant drivers can be used with libvirt, and
>> + * so that driver developers can test their new vfio variant drivers
>> + * without needing to bypass libvirt, we also check if the driver name
>> + * contains the string "vfio"; if it does, then we consider this drier
>> + * as type VFIO. This can lead to false positives, but that isn't a
>> + * horrible thing, because the problem will still be caught by QEMU as
>> + * soon as libvirt makes the request to attach the device.
>> + *
>> + * Return 0 on success, -1 on failure. If -1 is returned, then an error
>> + * message has been logged.
>> + */
>> +int
>> +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
>> +                                 char **drvName,
>> +                                 virPCIStubDriver *drvType)
>> +{
>> +    g_autofree char *drvPath = NULL;
>> +    int tmpType;
>> +
>> +    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
>> +        return -1;
>> +
>> +    if (!*drvName) {
>> +        *drvType = VIR_PCI_STUB_DRIVER_NONE;
>> +        return 0;
>> +    }
>> +
>> +    tmpType = virPCIStubDriverTypeFromString(*drvName);
>> +
>> +    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
>> +        *drvType = tmpType;
>> +        return 0; /* exact match of a known driver name (or no name) */
>> +    }
>> +
>> +    /* Check if the drivername contains "vfio" and count as a VFIO
>> +     * driver if so - see above for explanation.
>> +     */
>> +
>> +    if (strstr(*drvName, "vfio")) {
>> +        VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName);
>> +        *drvType = VIR_PCI_STUB_DRIVER_VFIO;
>> +    } else {
>> +        VIR_DEBUG("Driver %s is NOT a vfio_pci driver", *drvName);
>> +        *drvType = VIR_PCI_STUB_DRIVER_NONE;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +
>>   static int
>>   virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal)
>>   {
>> @@ -1004,8 +1069,8 @@ virPCIDeviceReset(virPCIDevice *dev,
>>                     virPCIDeviceList *activeDevs,
>>                     virPCIDeviceList *inactiveDevs)
>>   {
>> -    g_autofree char *drvPath = NULL;
>>       g_autofree char *drvName = NULL;
>> +    virPCIStubDriver drvType;
>>       int ret = -1;
>>       int fd = -1;
>>       int hdrType = -1;
>> @@ -1032,15 +1097,16 @@ virPCIDeviceReset(virPCIDevice *dev,
>>        * reset it whenever appropriate, so doing it ourselves would just
>>        * be redundant.
>>        */
>> -    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0)
>> +    if (virPCIDeviceGetDriverNameAndType(dev, &drvName, &drvType) < 0)
>>           goto cleanup;
>>   -    if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) {
>> -        VIR_DEBUG("Device %s is bound to vfio-pci - skip reset",
>> -                  dev->name);
>> +    if (drvType == VIR_PCI_STUB_DRIVER_VFIO) {
>> +
>> +        VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvName);
>>           ret = 0;
>>           goto cleanup;
>>       }
>> +
>>       VIR_DEBUG("Resetting device %s", dev->name);
>>         if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0)
>> diff --git a/src/util/virpci.h b/src/util/virpci.h
>> index 4d9193f24e..0532b90f90 100644
>> --- a/src/util/virpci.h
>> +++ b/src/util/virpci.h
>> @@ -280,6 +280,9 @@ int virPCIDeviceRebind(virPCIDevice *dev);
>>   int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev,
>>                                        char **path,
>>                                        char **name);
>> +int virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
>> +                                     char **drvName,
>> +                                     virPCIStubDriver *drvType);
>>     int virPCIDeviceIsPCIExpress(virPCIDevice *dev);
>>   int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev);
>