[vfio-users] P2P DMA between endpoint devices inside a VM

Alex Williamson alex.williamson at redhat.com
Wed Sep 23 22:43:56 UTC 2020


On Wed, 23 Sep 2020 15:32:15 -0700
Maran Wilson <maran.wilson at gmail.com> wrote:

> On Wed, Sep 23, 2020 at 2:19 PM Alex Williamson <alex.williamson at redhat.com>
> wrote:
> 
> > On Wed, 23 Sep 2020 13:08:10 -0700
> > Maran Wilson <maran.wilson at gmail.com> wrote:
> >  
> > > Just wanted to wrap up this thread by confirming what Alex said is true
> > > (in case anyone else is interested in this topic in the future). After
> > > enabling IOMMU tracing on the host I was able to confirm that IOMMU
> > > mappings were, in fact, being created properly to map the gPA to hPA of
> > > both devices' BAR resources.
> > >
> > > It turns out that our hardware device provides a backdoor way of reading
> > > PCI config space via BAR mapped register space.  The driver inside the VM
> > > was using that and thereby reading back the hPA of the BAR (and using
> > > that to program the DMA controller). This sort of breaks the whole
> > > pass-through model, so I'll have to sort that out on the driver/device
> > > side to close that loophole somehow so that the driver inside the VM is
> > > forced to use standard Linux APIs to read PCI config space. That way
> > > KVM/Qemu can properly intercept the access and return the gPA values.
> >
> > Thanks for the follow-up!  It sounds like another option for you might
> > be to virtualize those backdoor accesses like we do for GPUs that have
> > similar features.  That could allow existing drivers to work
> > unmodified.  If you're interested, take a look at hw/vfio/pci-quirks.c
> > in QEMU.  We have generic support for both a VFIOConfigWindowQuirk,
> > where access to config space is through separate data and offset
> > registers, and a VFIOConfigMirrorQuirk, where a range of MMIO space
> > maps to config space.  We just need to know the parameters to apply to
> > your device.  The only downside to the virtualization is that we trap
> > MMIO accesses at page size granularity, so MMIO accesses within that
> > shared page would fault into QEMU to do a read or write rather than
> > make use of the direct access provided through an mmap.  Thanks,
> >
> > Alex
> >  
> 
> Oh yeah. That's exactly what we have going on with our hardware.  So if I'm
> understanding properly, we would just have to run with a patched version of
> QEMU on the host, and the rest of the SW stack (kernel, vfio-pci driver,
> etc.) can run as-is, right?

Yup, QEMU virtualizes the backdoor to expose the emulated config space
rather than the bare metal config space; everything else just works.
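
For reference, wiring that up would mean adding a new device-specific
probe function in hw/vfio/pci-quirks.c, roughly along the lines of the
sketch below (modeled on the existing mirror quirks in that file -- the
function name, vendor/device IDs, BAR number, and mirror offset are
placeholders for whatever your device actually uses):

    static void vfio_probe_acme_bar2_quirk(VFIOPCIDevice *vdev, int nr)
    {
        VFIOQuirk *quirk;
        VFIOConfigMirrorQuirk *mirror;

        /* Placeholder IDs -- substitute your vendor/device and BAR */
        if (!vfio_pci_is(vdev, 0x1234, 0x5678) || nr != 2) {
            return;
        }

        quirk = vfio_quirk_alloc(1);
        mirror = quirk->data = g_malloc0(sizeof(*mirror));
        mirror->mem = quirk->mem;
        mirror->vdev = vdev;
        mirror->offset = 0x0; /* BAR offset of the config space mirror */
        mirror->bar = nr;

        /*
         * Trap accesses to the mirror range; the generic ops redirect
         * reads and writes to the emulated config space.
         */
        memory_region_init_io(mirror->mem, OBJECT(vdev),
                              &vfio_generic_mirror_quirk, mirror,
                              "vfio-acme-bar2-mirror-quirk",
                              PCI_CONFIG_SPACE_SIZE);
        memory_region_add_subregion_overlap(vdev->bars[nr].region.mem,
                                            mirror->offset, mirror->mem, 1);

        QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
    }

The probe would then be called from vfio_bar_quirk_setup() alongside the
existing per-device probes.  The window variant (separate offset/data
registers) is set up the same way using VFIOConfigWindowQuirk and the
generic window ops.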
Thanks,

Alex



