[vfio-users] Avoiding VFIO NOIOMMU taint in safe situations

Thu Apr 4 18:38:13 UTC 2019

On 4/4/19 11:24 AM, Alex Williamson wrote:
> On Wed, 3 Apr 2019 23:31:22 -0500
> Shawn Anastasio <shawn at anastas.io> wrote:
> 
>> On 4/3/19 10:23 PM, Alex Williamson wrote:
>>> On Wed, 3 Apr 2019 22:01:14 -0500
>>> Shawn Anastasio <shawn at anastas.io> wrote:
>>>    
>>>> Hello all,
>>>>
>>>> I'm currently writing an application that makes use of Qemu's ivshmem
>>>> shared memory mechanism, which exposes shared memory regions from the
>>>> host via PCI-E BARs. MSI-X interrupts that are tied to host eventfds are
>>>> also exposed.
>>>>
>>>> Since ivshmem doesn't have an in-tree kernel driver, I have been using
>>>> VFIO's NOIOMMU mode to interface with the device. This works wonderfully
>>>> for both BAR mapping and MSI-X interrupts. Unfortunately though, binding
>>>> the ivshmem device to vfio_pci to use it in this way results in a kernel
>>>> taint. I understand that this is because without an IOMMU, VFIO/Linux
>>>> has no way of preventing devices from performing malicious access to
>>>> other system memory. In the case of ivshmem though, the device does not
>>>> have any DMA capabilities.
>>>
>>> The MSI-X interrupt is a DMA.
>> I hadn't realized this. That means then without an IOMMU, an
>> MSI-X capable device is capable of reading/writing arbitrary
>> memory?
> 
> Writing at least, this is why even with an IOMMU there's an opt-in if
> that IOMMU lacks interrupt remapping support.

Understood. That makes sense.
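
For anyone reading this in the archive, here is a minimal sketch of why
an MSI-X interrupt counts as a DMA write. It is a toy model, not kernel
or QEMU code. The struct mirrors the 16-byte MSI-X table entry layout;
sim_mem, SIM_MEM_SIZE and sim_msix_signal() are hypothetical names made
up for illustration. The "device" raises a vector by writing the
programmed data to the programmed address, so whoever controls the
table entry chooses where the write lands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One MSI-X table entry: four 32-bit fields, 16 bytes in total. */
struct msix_entry {
    uint32_t addr_lo;      /* message address, low 32 bits */
    uint32_t addr_hi;      /* message address, high 32 bits */
    uint32_t data;         /* message data written to that address */
    uint32_t vector_ctrl;  /* bit 0 is the per-vector mask bit */
};

#define MSIX_VECTOR_MASKED 0x1u

/* Hypothetical stand-in for system memory (assumption for this sketch). */
#define SIM_MEM_SIZE 4096u
static uint8_t sim_mem[SIM_MEM_SIZE];

/*
 * Hypothetical model of a device signaling one vector: a 32-bit memory
 * write of e->data to the address held in the table entry.
 * Returns 0 on success, -1 if the vector is masked or the address falls
 * outside the simulated memory.
 */
static int sim_msix_signal(const struct msix_entry *e)
{
    uint64_t addr;

    assert(e != NULL);
    addr = ((uint64_t)e->addr_hi << 32) | e->addr_lo;

    if (e->vector_ctrl & MSIX_VECTOR_MASKED)
        return -1;
    if (addr > SIM_MEM_SIZE - sizeof(e->data))
        return -1;

    /* Nothing here checks that addr is an interrupt doorbell. */
    memcpy(&sim_mem[addr], &e->data, sizeof(e->data));
    return 0;
}
```

In this model nothing distinguishes a legitimate interrupt address from
any other location. On real hardware that check is the job of an IOMMU
with interrupt remapping, which is the piece missing in NOIOMMU mode.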

>>>> This has created a situation in which the
>>>> safest possible way to access the device (a kernel driver would be
>>>> inherently less safe, UIO can't access the MSI-X functionality of the
>>>> device) results in a kernel taint, when other, less safe methods don't.
>>>
>>> MSI-X support in UIO was rejected because MSI-X is a DMA and UIO does
>>> not support devices that do DMA.  Vfio-noiommu was a compromise to
>>> allow using the vfio API, but recognizing that it's inherently unsafe.
>>>    
>>>> In light of this, I propose a change to the VFIO framework that would
>>>> allow use cases such as this without a kernel taint. One solution I see
>>>> is only tainting when PCI devices with DMA capabilities are bound to
>>>> VFIO. It is my understanding that a device's DMA capability can be
>>>> determined by checking the Bus Mastering flag in the device's PCI
>>>> configuration space, so something like this should be feasible.
>>>
>>> The bus master bit is not a capability for probing; enabling bus master
>>> allows a device to perform DMA, including signaling via MSI
>>> interrupts.  No bus master, no MSI.
>>>    
>>>> Perhaps an additional NOIOMMU mode could be introduced which only allows
>>>> devices which meet these criteria, too (VFIO_NOIOMMU_NODMA_IOMMU?).
>>>> Along with a separate Kconfig option, this would allow users to enable
>>>> this safe usage at kernel build time, while still preventing the
>>>> possibility of an unsafe DMA capable device from being used.
>>>>
>>>> I'm curious to hear feedback on this. If this is something that can be
>>>> merged, I'd be more than happy to write a patch.
>>>
>>> Add a vIOMMU to your VM configuration (ie. intel-iommu) and use proper
>>> vfio in the guest.  Thanks,
>> I had looked into this, but my application also targets ppc64, and a
>> cross-platform solution is therefore necessary.
>>
>> Strangely enough when booting a VM on ppc64, the kernel /does/ report
>> an IOMMU, but there's only 1 group that contains all devices, so it
>> doesn't seem usable.
> 
> Yes, AIUI ppc64 PAPR machines always have an IOMMU and there is a
> SPAPR IOMMU model in vfio.  Maybe work with QEMU ppc64 developers to
> figure out how the ivshmem device can be in its own group.  This
> probably requires configuring the VM with another PCI host bridge and
> attaching the ivshmem device under it.

Interesting. I'll contact the QEMU developers about this.
Thanks for the pointer.

>> I guess it all boils down to this - does this usage of VFIO-NOIOMMU
>> with an MSI-X device constitute a security risk? If so, it seems
>> I'll have no choice but to write a kernel driver for a cross-platform
>> solution.
> 
> There is no property we can detect about a PCI device to determine that
> it doesn't support DMA.  All PCI devices have DMA available to them.
> Clearly we can't simply enforce that bus master is never enabled
> because that breaks your use case of needing MSI interrupts and
> presumes devices actually honor that bit and don't have more nefarious
> ways of enabling it.  So if we have no way to know the device
> capabilities or the intention of the user, or exploitability of the
> user, I don't see how we can create a policy that singles out this use
> case as trusted.  Thanks,

I see. Thank you for the explanation.
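
To illustrate the point for the archive, here is a rough sketch of what
the bus master bit does and does not say. It works on a plain 256-byte
copy of a device's PCI configuration space, so it makes no claim about
how vfio itself reads these registers. The register offsets and bit
values follow the standard PCI configuration header; the helper names
cfg_read16(), bus_master_enabled() and has_capability() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SIZE        256
#define PCI_COMMAND         0x04    /* 16-bit command register */
#define PCI_COMMAND_MASTER  0x0004  /* bit 2: bus master enable */
#define PCI_STATUS          0x06    /* 16-bit status register */
#define PCI_STATUS_CAP_LIST 0x0010  /* bit 4: capability list present */
#define PCI_CAPABILITY_LIST 0x34    /* offset of first capability pointer */
#define PCI_CAP_ID_MSIX     0x11    /* MSI-X capability ID */

/* Read a little-endian 16-bit register from a config space copy. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    assert(off + 1 < PCI_CFG_SIZE);
    return (uint16_t)(cfg[off] | ((uint16_t)cfg[off + 1] << 8));
}

/* Reports the current enable state only, not what the device can do. */
static bool bus_master_enabled(const uint8_t *cfg)
{
    return (cfg_read16(cfg, PCI_COMMAND) & PCI_COMMAND_MASTER) != 0;
}

/* Walk the standard capability list looking for cap_id. */
static bool has_capability(const uint8_t *cfg, uint8_t cap_id)
{
    uint8_t pos;
    int ttl;

    if (!(cfg_read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
        return false;

    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    /* ttl bounds the walk in case the list is malformed or loops. */
    for (ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
        if (cfg[pos] == cap_id)
            return true;
        pos = cfg[pos + 1] & 0xFC;
    }
    return false;
}
```

A device can advertise MSI-X while bus_master_enabled() returns false,
and the bit flips as soon as a driver sets it. So the command register
shows what is switched on at the moment, which is different from a
static "this device cannot DMA" property.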

Thanks,
Shawn