[vfio-users] Passthrough for non-DMA-masters on x86

Mon Apr 20 21:08:19 UTC 2020

On Fri, 17 Apr 2020 09:34:49 -0700
Micah Morton <mortonm at chromium.org> wrote:

> Hi Alex,
> 
> I've been looking at device passthrough for platform devices on x86
> that are not behind an IOMMU by virtue of not being DMA masters. I
> think on some level this is an explicit non-goal of VFIO
> (https://www.spinics.net/lists/linux-renesas-soc/msg26153.html ,
> https://blog.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_IOMMU.txt)?

Mostly that's correct.  We do have a no-iommu mode, which was added to
avoid introducing MSI/MSI-X support to uio_pci_generic.  No-iommu mode
implements the device interface, including interrupts, but the user is
on their own for any other kind of DMA.  It also taints the kernel
since we're giving a user access to a device without protection of an
IOMMU.
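
A minimal sketch of checking from userspace whether no-iommu mode is
switched on, assuming the usual sysfs layout for module parameters.
The helper name noiommu_enabled and its sysfs_root argument are made
up for illustration.  If the kernel was built without no-iommu support
the parameter file should simply be absent, which the sketch treats as
"disabled".

```python
import os

def noiommu_enabled(sysfs_root="/sys"):
    """Return True if the vfio module reports no-iommu mode as enabled."""
    path = os.path.join(sysfs_root, "module", "vfio", "parameters",
                        "enable_unsafe_noiommu_mode")
    try:
        with open(path) as f:
            # Boolean module parameters read back as "Y" or "N".
            return f.read().strip() == "Y"
    except OSError:
        # vfio not loaded, or the parameter does not exist on this kernel.
        return False

if __name__ == "__main__":
    print("no-iommu mode enabled:", noiommu_enabled())
```

With the mode enabled, groups for devices that have no IOMMU should
appear as /dev/vfio/noiommu-N rather than /dev/vfio/N.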

> From my understanding VFIO is mostly about IOMMU management. I have a
> few questions however:
> 
> 1) Are interrupt forwarding, IOMMU mgmt, and PCI config space
> virtualization the main 3 things that VFIO does (plus some hacks to
> get GPUs working in guests)? Would you add any other aspects of VFIO
> that I'm missing?

The entire device is accessed through vfio, including all memory and
I/O ranges.  There are also interfaces for device resets.
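
To make "memory and I/O ranges" concrete: each range is described by a
struct vfio_region_info that userspace fetches with the
VFIO_DEVICE_GET_REGION_INFO ioctl, and resets go through
VFIO_DEVICE_RESET.  Below is a rough sketch of packing and decoding
that structure, assuming the base layout from <linux/vfio.h> with no
capability chain.  The names region_query, parse_region and RegionInfo
are illustrative only.

```python
import struct
from collections import namedtuple

# Base layout of struct vfio_region_info (assumed, no capability chain):
# u32 argsz, u32 flags, u32 index, u32 cap_offset, u64 size, u64 offset.
REGION_INFO = struct.Struct("=IIIIQQ")

VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1
VFIO_REGION_INFO_FLAG_MMAP = 1 << 2

RegionInfo = namedtuple(
    "RegionInfo", "index size offset readable writable mappable")

def region_query(index):
    """Build the buffer passed to VFIO_DEVICE_GET_REGION_INFO."""
    return bytearray(REGION_INFO.pack(REGION_INFO.size, 0, index, 0, 0, 0))

def parse_region(buf):
    """Decode the buffer once the kernel has filled it in."""
    _argsz, flags, index, _cap, size, offset = REGION_INFO.unpack_from(buf)
    return RegionInfo(
        index=index,
        size=size,
        offset=offset,  # offset on the device fd for read/write/mmap
        readable=bool(flags & VFIO_REGION_INFO_FLAG_READ),
        writable=bool(flags & VFIO_REGION_INFO_FLAG_WRITE),
        mappable=bool(flags & VFIO_REGION_INFO_FLAG_MMAP),
    )
```

A region with size 0 is unimplemented, and one without the MMAP flag
can only be reached with read/write on the device fd.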

> 2) If you can forward interrupts to a guest without VFIO (say with
> something like this patch:
> https://www.spinics.net/lists/kvm/msg207949.html), then it should be
> pretty simple to configure the VMM to make the MMIO regions of the
> platform device available to the guest. Is VFIO in the loop at all for
> actually giving the guest access to the MMIOs or is that just done by
> mappings in the VMM?

Yes, vfio is in the loop.  A file descriptor is used to access the
device.  Each memory or I/O region of the device is mapped through the
VMM via offsets on that fd.
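
As a sketch of what "offsets on that fd" means in practice: given the
size and offset reported for a region, a VMM either mmap()s that
window of the device fd or falls back to pread/pwrite at the same
offset.  The helpers map_region and read_region are hypothetical names,
and the fd here is assumed to be a VFIO device fd obtained elsewhere.

```python
import mmap
import os

def map_region(fd, size, offset, writable=True):
    """mmap one region of the device fd at the offset VFIO reported."""
    prot = mmap.PROT_READ
    if writable:
        prot |= mmap.PROT_WRITE
    # The offset has to be a multiple of mmap.ALLOCATIONGRANULARITY.
    return mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=prot,
                     offset=offset)

def read_region(fd, size, offset):
    """Slow path for regions that do not support mmap."""
    return os.pread(fd, size, offset)
```

The VMM would then install the mapping as guest memory at whatever
guest-physical address it advertises for the device.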

> *I don't think I care about VFIO virtualizing PCI BARs for the guest
> since I would be telling the guest about the platform devices through
> ACPI -- so the guest wouldn't be looking to the PCI config space for
> that info anyway. I guess one thing to worry about here would be any
> dependencies the assigned platform device has on any other platform
> devices in the system that don't get assigned to the guest.

You're aware of vfio-platform, right?  Is vfio-platform with
enable_unsafe_noiommu_mode=1 on the vfio module what you're trying to
do?  Of course if you have a non-DMA device, you could also create a
host driver that wraps it via mdev.  You could even make the device
expose a vfio-pci rather than vfio-platform API and invent a fake
config space for it so you don't need to mess with ACPI (assuming
there's a driver in the guest that could bind to a PCI version of the
device).
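
For reference, a rough sketch of the userspace side of the no-iommu
path, assuming the x86 ioctl encoding (where _IO() reduces to
type << 8 | nr) and the request numbers from <linux/vfio.h> as I
remember them.  The function name open_noiommu_device is hypothetical,
the device name is assumed to be the one shown under
/sys/bus/platform/devices, and opening a noiommu group is expected to
need CAP_SYS_RAWIO.  Untested against real hardware.

```python
import fcntl
import os
import struct

VFIO_TYPE = ord(";")
VFIO_BASE = 100

def vfio_io(nr):
    """Equivalent of _IO(VFIO_TYPE, VFIO_BASE + nr) on x86."""
    return (VFIO_TYPE << 8) | (VFIO_BASE + nr)

VFIO_GET_API_VERSION = vfio_io(0)
VFIO_SET_IOMMU = vfio_io(2)
VFIO_GROUP_SET_CONTAINER = vfio_io(4)
VFIO_GROUP_GET_DEVICE_FD = vfio_io(6)
VFIO_NOIOMMU_IOMMU = 8  # IOMMU "type" used for no-iommu containers

def open_noiommu_device(group, name):
    """Return (container_fd, group_fd, device_fd) for a no-iommu device."""
    container = os.open("/dev/vfio/vfio", os.O_RDWR)
    group_fd = os.open("/dev/vfio/noiommu-%d" % group, os.O_RDWR)
    # Attach the group to the container; the argument is a pointer to an int.
    fcntl.ioctl(group_fd, VFIO_GROUP_SET_CONTAINER,
                struct.pack("i", container))
    # Select the no-iommu backend; the argument is a plain integer.
    fcntl.ioctl(container, VFIO_SET_IOMMU, VFIO_NOIOMMU_IOMMU)
    # A mutable buffer makes fcntl.ioctl return the ioctl's result (the fd).
    device_fd = fcntl.ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD,
                            bytearray(name.encode() + b"\0"))
    return container, group_fd, device_fd
```

All three descriptors need to stay open for as long as the device is
in use.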

> 3) Are PCI devices always DMA masters, or at least are they always put
> in an IOMMU group? Have you seen cases of PCI devices that were not
> assignable to a guest through vfio-pci because they weren't in an
> IOMMU group and/or weren't DMA masters?

PCI devices that are not DMA masters are not a set that gets any special
handling.  AFAIK, there's really no way to define a PCI device as non-DMA.
Perhaps the bus-master bit could be hard-coded to zero, but I think
that would be ad hoc, not really defined by the spec.  Whether a PCI
device is placed into an IOMMU group depends on the topology: if it's
downstream of an IOMMU, then it's placed into an IOMMU group,
regardless of DMA capabilities.  A system could be constructed where
only a subset of devices are downstream of an IOMMU, but I've never
seen such a configuration.  Thanks,

Alex
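
Whether a given device ended up in an IOMMU group can be checked
through sysfs, where grouped devices carry an iommu_group symlink.
A small sketch, with the helper name iommu_group and the sysfs_root
argument being illustrative:

```python
import os

def iommu_group(device, bus="pci", sysfs_root="/sys"):
    """Return the device's IOMMU group number, or None if it has none."""
    link = os.path.join(sysfs_root, "bus", bus, "devices", device,
                        "iommu_group")
    try:
        # The link points at .../kernel/iommu_groups/<number>.
        return int(os.path.basename(os.readlink(link)))
    except (OSError, ValueError):
        # No such device, or the device is not in any IOMMU group.
        return None
```

For example, iommu_group("0000:00:1f.0") for a PCI function, or
iommu_group(name, bus="platform") for a platform device.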