[vfio-users] Passthrough for non-DMA-masters on x86

Tue Apr 21 17:49:21 UTC 2020

On Mon, Apr 20, 2020 at 2:08 PM Alex Williamson
<alex.williamson at redhat.com> wrote:
>
> On Fri, 17 Apr 2020 09:34:49 -0700
> Micah Morton <mortonm at chromium.org> wrote:
>
> > Hi Alex,
> >
> > I've been looking at device passthrough for platform devices on x86
> > that are not behind an IOMMU by virtue of not being DMA masters. I
> > think on some level this is an explicit non-goal of VFIO
> > (https://www.spinics.net/lists/linux-renesas-soc/msg26153.html ,
> > https://blog.linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_IOMMU.txt)?
>
> Mostly that's correct.  We do have a no-iommu mode, which was added to
> avoid introducing MSI/X support to uio_pci_generic.  No-iommu mode
> implements the device interface, including interrupts, but the user is
> on their own for any other kind of DMA.  It also taints the kernel
> since we're giving a user access to a device without protection of an
> IOMMU.

Ah ok, that makes sense. I was looking at uio but was concerned the
interrupt forwarding logic might not be as complete as vfio's, since a
quick search for "irqfd" or "eventfd" in the uio code base didn't turn
up anything. I haven't dug deep enough to understand how interrupt
routing in uio works.
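
For comparison, here is a rough sketch of how the vfio side of
eventfd-based interrupt signaling looks from userspace. It assumes a
device fd has already been obtained through the usual container/group
setup (not shown); the function name vfio_irq_to_eventfd is made up for
illustration. A VMM would typically go on to hand the returned eventfd
to KVM (via KVM_IRQFD) so the interrupt reaches the guest without a
round trip through userspace, but that step is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: ask vfio to signal a fresh eventfd whenever
 * sub-interrupt 0 of interrupt index `irq_index` fires on the device.
 * Returns the eventfd on success, -1 on any failure. */
int vfio_irq_to_eventfd(int device_fd, uint32_t irq_index)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0)
        return -1;

    /* struct vfio_irq_set is a fixed header followed by a
     * variable-length payload; here the payload is one eventfd. */
    size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
    struct vfio_irq_set *set = calloc(1, argsz);
    if (set == NULL) {
        close(efd);
        return -1;
    }

    set->argsz = (uint32_t)argsz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = irq_index;
    set->start = 0;
    set->count = 1;
    int32_t fd32 = efd;
    memcpy(set->data, &fd32, sizeof(fd32));

    int ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
    free(set);
    if (ret < 0) {
        close(efd);
        return -1;
    }
    return efd;
}
```

With an invalid device fd the ioctl fails and the helper cleans up the
eventfd before returning -1, so callers only ever own an fd that vfio
has accepted.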

Seems like "no-iommu" mode is safe as long as you can guarantee that
all DMA masters on your system are behind an IOMMU. This seems to be
the case on Intel/AMD SoCs that I've looked at, and even ARM SoCs are
moving in this direction. Maybe there is a future where "no-iommu"
mode is safe if there is some way to indicate to vfio that all DMA
masters are behind an IOMMU?

>
> > From my understanding VFIO is mostly about IOMMU management. I have a
> > few questions however:
> >
> > 1) Are interrupt forwarding, IOMMU mgmt, and PCI config space
> > virtualization the main 3 things that VFIO does (plus some hacks to
> > get GPUs working in guests)? Would you add any other aspects of VFIO
> > that I'm missing?
>
> The entire device is accessed through vfio, including all memory and
> I/O ranges.  There are also interfaces for device resets.
>
> > 2) If you can forward interrupts to a guest without VFIO (say with
> > something like this patch:
> > https://www.spinics.net/lists/kvm/msg207949.html), then it should be
> > pretty simple to configure the VMM to make the MMIO regions of the
> > platform device available to the guest. Is VFIO in the loop at all for
> > actually giving the guest access to the MMIOs or is that just done by
> > mappings in the VMM?
>
> Yes, vfio is in the loop.  A file descriptor is used to access the
> device.  Each memory or I/O region of the device is mapped through the
> VMM via offsets on that fd.

Thanks for the explanation, I was overlooking this part of the design.
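
To make the "offsets on that fd" part concrete, a minimal sketch of how
userspace might look up one region and map it. It assumes device_fd came
from VFIO_GROUP_GET_DEVICE_FD after the container/group setup (omitted
here); query_region and map_region are illustrative names, not part of
any vfio API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <linux/vfio.h>

/* Ask vfio where region `index` lives within the device fd.
 * Returns 0 on success, -1 on failure (errno set by ioctl). */
int query_region(int device_fd, uint32_t index,
                 struct vfio_region_info *info)
{
    assert(info != NULL);
    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info);
    info->index = index;
    return ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info);
}

/* Map a region at the offset the kernel reported, but only if the
 * kernel flagged it as mmap-capable. Returns NULL otherwise; such
 * regions have to be accessed with pread/pwrite at info->offset. */
void *map_region(int device_fd, const struct vfio_region_info *info)
{
    assert(info != NULL);
    if (!(info->flags & VFIO_REGION_INFO_FLAG_MMAP) || info->size == 0)
        return NULL;

    int prot = 0;
    if (info->flags & VFIO_REGION_INFO_FLAG_READ)
        prot |= PROT_READ;
    if (info->flags & VFIO_REGION_INFO_FLAG_WRITE)
        prot |= PROT_WRITE;

    void *p = mmap(NULL, info->size, prot, MAP_SHARED,
                   device_fd, (off_t)info->offset);
    return p == MAP_FAILED ? NULL : p;
}
```

A VMM would then register the returned mapping as guest memory at
whatever guest-physical address it advertises for the device.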

>
> > *I don't think I care about VFIO virtualizing PCI BARs for the guest
> > since I would be telling the guest about the platform devices through
> > ACPI -- so the guest wouldn't be looking to the PCI config space for
> > that info anyway. I guess one thing to worry about here would be any
> > dependencies the assigned platform device has on any other platform
> > devices in the system that don't get assigned to the guest.
>
> You're aware of vfio-platform, right?  Is vfio-platform with
> enable_unsafe_noiommu_mode=1 on the vfio module what you're trying to
> do?  Of course if you have a non-DMA device, you could also create a
> host driver that wraps it via mdev.  You could even make the device
> expose a vfio-pci rather than vfio-platform API and invent a fake
> config space for it so you don't need to mess with ACPI (assuming
> there's a driver in the guest that could bind to a PCI version of the
> device).

AFAICT vfio-platform is strictly for platform devices on ARM SoCs that
are DMA capable and behind an IOMMU? So far I've been talking about
non-DMA-capable devices on x86 (but I am just as interested in the ARM
side). Is "no-iommu" mode on x86 designed for platform devices, PCI
devices, or both? I'll have to look into mdev; I'm not familiar with
it.
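
As a small aside on the enable_unsafe_noiommu_mode parameter mentioned
above: on kernels built with no-iommu support it should be readable at
/sys/module/vfio/parameters/enable_unsafe_noiommu_mode, where boolean
module parameters read back as "Y" or "N". The sketch below is only an
illustration of checking it; the function name is invented, and the
path is passed in rather than hard-coded so the check can be tried
against any file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a boolean module parameter file.
 * Returns 1 if it starts with 'Y', 'y' or '1', 0 for anything else,
 * and -1 if the file cannot be opened or is empty. */
int noiommu_enabled_at(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int c = fgetc(f);
    fclose(f);
    if (c == EOF)
        return -1;
    return (c == 'Y' || c == 'y' || c == '1') ? 1 : 0;
}
```

A return of -1 on the real sysfs path most likely means the vfio module
is not loaded or was built without no-iommu support.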

>
> > 3) Are PCI devices always DMA masters, or at least are they always put
> > in an IOMMU group? Have you seen cases of PCI devices that were not
> > assignable to a guest through vfio-pci because they weren't in an
> > IOMMU group and/or weren't DMA masters?
>
> Non-DMA master PCI devices is not a set that has any special handling.
> AFAIK, there's really no way to define a PCI device as non-DMA.
> Perhaps the bus-master bit could be hard-coded to zero, but I think
> that would be ad-hoc, not really defined by the spec.  Whether a PCI
> device is placed into an IOMMU group depends on the topology: if it's
> downstream of an IOMMU, then it's placed into an IOMMU group,
> regardless of DMA capabilities.  A system could be constructed where
> only a subset of devices are downstream of an IOMMU, but I've never
> seen such a configuration.  Thanks,

Sounds good, thanks!

>
> Alex
>