[libvirt] [PATCH 5/5] qemu: Prefer VFIO for PCI device passthrough

Laine Stump laine at laine.org
Mon Sep 23 09:05:05 UTC 2013


On 09/23/2013 02:42 AM, Peter Krempa wrote:
> On 09/20/13 13:56, Daniel P. Berrange wrote:
>> On Fri, Sep 20, 2013 at 11:07:00AM +0200, Peter Krempa wrote:
>>> Prefer using VFIO (if available) to the legacy KVM device passthrough.
>>>
>>> With this patch a PCI passthrough device without the driver configured
>>> will be started with VFIO if it's available on the host. If not legacy
>>> KVM passthrough is checked and error is reported if it's not available.
>>> ---
>>>  docs/formatdomain.html.in                          |  9 ++++----
>>>  src/conf/domain_conf.h                             |  2 +-
>>>  src/qemu/qemu_command.c                            |  2 +-
>>>  src/qemu/qemu_hotplug.c                            | 26 +++++++++++++++++++++-
>>>  src/qemu/qemu_process.c                            | 18 ++++++++++++++-
>>>  .../qemuxml2argv-hostdev-pci-address-device.args   |  2 +-
>>>  .../qemuxml2argvdata/qemuxml2argv-net-hostdev.args |  2 +-
>>>  tests/qemuxml2argvdata/qemuxml2argv-pci-rom.args   |  4 ++--
>>>  8 files changed, 52 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>>> index 3689399..6f3f7cf 100644
>>> --- a/docs/formatdomain.html.in
>>> +++ b/docs/formatdomain.html.in
>>> @@ -2755,11 +2755,10 @@
>>>          backend, which is compatible with UEFI SecureBoot) or "kvm"
>>>          (for the legacy device assignment handled directly by the KVM
>>>          kernel module)<span class="since">Since 1.0.5 (QEMU and KVM
>>> -        only, requires kernel 3.6 or newer)</span>. Currently, "kvm"
>>> -        is the default used by libvirt when not explicitly provided,
>>> -        but since the two are functionally equivalent, this default
>>> -        could be changed in the future with no impact to domains that
>>> -        don't specify anything.
>>> +        only, requires kernel 3.6 or newer)</span>. The default, when
>>> +        the driver name is not explicitly specified, is to check wether
>>> +        VFIO is available and use it if it's the case. If VFIO is not
>>> +        available, the legacy "kvm" assignment is attempted.
>>>        </dd>
>>>        <dt><code>readonly</code></dt>
>>>        <dd>Indicates that the device is readonly, only supported by SCSI host
>>> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
>>> index 9414ebf..43d8746 100644
>>> --- a/src/conf/domain_conf.h
>>> +++ b/src/conf/domain_conf.h
>>> @@ -399,7 +399,7 @@ enum virDomainHostdevSubsysType {
>>>
>>>  /* the backend driver used for PCI hostdev devices */
>>>  typedef enum {
>>> -    VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT, /* currently kvm, could change */
>>> +    VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT, /* detect automaticaly, prefer VFIO */
>>>      VIR_DOMAIN_HOSTDEV_PCI_BACKEND_KVM,    /* force legacy kvm style */
>>>      VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO,   /* force vfio */
>>>
>>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
>>> index 1411533..0d7c02e 100644
>>> --- a/src/qemu/qemu_command.c
>>> +++ b/src/qemu/qemu_command.c
>>> @@ -5395,13 +5395,13 @@ qemuBuildPCIHostdevDevStr(virDomainDefPtr def,
>>>
>>>      switch ((virDomainHostdevSubsysPciBackendType)
>>>              dev->source.subsys.u.pci.backend) {
>>> -    case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT:
>>>      case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_KVM:
>>>          virBufferAddLit(&buf, "pci-assign");
>>>          if (configfd && *configfd)
>>>              virBufferAsprintf(&buf, ",configfd=%s", configfd);
>>>          break;
>>>
>>> +    case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_DEFAULT:
>>>      case VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO:
>>>          virBufferAddLit(&buf, "vfio-pci");
>>>          break;
>> The code calling this function should have translated 'DEFAULT' into
>> either 'KVM' or 'VFIO', so if we see 'DEFAULT' here shouldn't we
>> raise an error.

I thought what had been discussed in IRC was to modify the qemu
capabilities in the caller according to stricter checks for properly
operating vfio/legacy passthrough. Then at a lower level, we would still
always prefer vfio when available, but that behavior could be changed by
the higher level function (e.g. qemuProcessStart()) modifying the qemu
capabilities to clear the vfio capability bit.

So the value in the config object wouldn't be changed, just the flags in
the capabilities. But I guess I must have misunderstood because,
thinking about it, the capabilities object should be immutable (right?).

So were you suggesting that we change the config value instead? But
normally we don't want to change the config object either though, since
that affects what gets saved when we save or migrate a domain. (Of
course, you can't save or migrate a domain with a passed-through device
anyway, so maybe that's not a problem :-) Am I just pointlessly arguing
with myself here?

At any rate, I don't see either of those things happening in this patch
series - as far as I can see, the result of the series is that if
someone specifies no driver in the config and vfio isn't available, then
an error will be thrown even if legacy kvm device assignment *is*
available. Instead, we should be instructing the commandline generator
to use legacy assigment "pci-assign" instead of vfio in this case.

> If we change this to produce error in case DEFAULT is passed, then all
> the tests that are checking correctness of the generated commandline
> with passthrough devices will fail or need to be hardcoded with driver
> names as the default setting function can't be called in the testsuite.
>
> Do you think it's better to change the tests so that "DEFAULT" is
> avoided and error out if default is passed to the command line generator?


In the long run, the user shouldn't care at all which driver is used for
passthrough, as they both produce the same results. "default" (i.e.
complete absence of <driver name='xx'/>) should remain a valid option in
the config, and that when that is the case it should mean "prefer vfio
if available, fallback to legacy if necessary, or fail if neither is
available.


>>> --- a/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.args
>>> +++ b/tests/qemuxml2argvdata/qemuxml2argv-net-hostdev.args
>>> @@ -3,5 +3,5 @@ LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test QEMU_AUDIO_DRV=none \
>>>  -M pc -m 214 -smp 1 -nographic -nodefconfig -nodefaults -monitor \
>>>  unix:/tmp/test-monitor,server,nowait -no-acpi -boot c \
>>>  -usb -hda /dev/HostVG/QEMUGuest1 \
>>> --device pci-assign,host=03:07.1,id=hostdev0,bus=pci.0,addr=0x3 \
>>> +-device vfio-pci,host=03:07.1,id=hostdev0,bus=pci.0,addr=0x3 \
>>>  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
>>> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.args b/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.args
>>> index c850613..615047a 100644
>>> --- a/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.args
>>> +++ b/tests/qemuxml2argvdata/qemuxml2argv-pci-rom.args
>>> @@ -7,7 +7,7 @@ unix:/tmp/test-monitor,server,nowait -no-acpi -boot c -usb -hda \
>>>  -net user,vlan=0,name=hostnet0 \
>>>  -device virtio-net-pci,vlan=1,id=net1,mac=52:54:00:24:a5:9e,bus=pci.0,addr=0x4,\
>>>  romfile=/etc/fake/bootrom.bin -net user,vlan=1,name=hostnet1 \
>>> --device pci-assign,host=06:12.5,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 \
>>> --device pci-assign,host=06:12.6,id=hostdev1,bus=pci.0,addr=0x6,rombar=1,\
>>> +-device vfio-pci,host=06:12.5,id=hostdev0,bus=pci.0,addr=0x5,rombar=0 \
>>> +-device vfio-pci,host=06:12.6,id=hostdev1,bus=pci.0,addr=0x6,rombar=1,\
>>>  romfile=/etc/fake/bootrom.bin \
>>>  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
>> Why is the XML here changing ? Shouldn't the test suite be stable in
>> what it generates regardless of what the host happens to support ?
> The difference comes from the change of the default to VFIO in the
> commandline generator function.

Right. Although this does indicate a change in the commandline that is
generated for identical XML, that change is okay; it just means that for
the case where both vfio and legacy assignment are available (in the
cases of tests, purely based on the capability bits, which are provided
as an arg to the toplevel test macro), we've decided to now prefer vfio
over legacy.




More information about the libvir-list mailing list