[PATCH RFC v2 00/13] IOMMUFD Generic interface

Steven Sistare steven.sistare at oracle.com
Tue Oct 11 20:30:58 UTC 2022


On 10/11/2022 8:30 AM, Jason Gunthorpe wrote:
> On Mon, Oct 10, 2022 at 04:54:50PM -0400, Steven Sistare wrote:
>>> Do we have a solution to this?
>>>
>>> If not I would like to make a patch removing VFIO_DMA_UNMAP_FLAG_VADDR
>>>
>>> Aside from the approach to use the FD, another idea is to just use
>>> fork.
>>>
>>> qemu would do something like
>>>
>>>  .. stop all container ioctl activity ..
>>>  fork()
>>>     ioctl(CHANGE_MM) // switch all maps to this mm
>>>     .. signal parent.. 
>>>     .. wait parent..
>>>     exit(0)
>>>  .. wait child ..
>>>  exec()
>>>  ioctl(CHANGE_MM) // switch all maps to this mm
>>>  ..signal child..
>>>  waitpid(childpid)
>>>
>>> This way the kernel is never left without a page provider for the
>>> maps, the dummy mm_struct belonging to the fork will serve that role
>>> for the gap.
>>>
>>> And the above is only required if we have mdevs, so we could imagine
>>> userspace optimizing it away for, eg vfio-pci only cases.
>>>
>>> It is not as efficient as using a FD backing, but this is super easy
>>> to implement in the kernel.
>>
>> I propose to avoid deadlock for mediated devices as follows.  Currently, an
>> mdev calling vfio_pin_pages blocks in vfio_wait while VFIO_DMA_UNMAP_FLAG_VADDR
>> is asserted.
>>
>>   * In vfio_wait, I will maintain a list of waiters, each list element
>>     consisting of (task, mdev, close_flag=false).
>>
>>   * When the vfio device descriptor is closed, vfio_device_fops_release
>>     will notify the vfio_iommu driver, which will find the mdev on the waiters
>>     list, set elem->close_flag=true, and call wake_up_process for the task.
> 
> This alone is not sufficient, the mdev driver can continue to
> establish new mappings until it's close_device function
> returns. Killing only existing mappings is racy.
> 
> I think you are focusing on the one issue I pointed at, as I said, I'm
> sure there are more ways than just close to abuse this functionality
> to deadlock the kernel.
> 
> I continue to prefer we remove it completely and do something more
> robust. I suggested two options.

It's not racy.  New pin requests also land in vfio_wait if any vaddr's have
been invalidated in any vfio_dma in the iommu.  See
  vfio_iommu_type1_pin_pages()
    if (iommu->vaddr_invalid_count)
      vfio_find_dma_valid()
        vfio_wait()

However, I will investigate saving a reference to the file object in the vfio_dma
(for mappings backed by a file) and using that to translate IOVA's.  I think that
will be easier to use than fork/CHANGE_MM/exec, and may even be easier to use
than VFIO_DMA_UNMAP_FLAG_VADDR.  To be continued.

- Steve



More information about the libvir-list mailing list