[PATCH RFC v2 00/13] IOMMUFD Generic interface
Steven Sistare
steven.sistare at oracle.com
Wed Sep 21 19:30:55 UTC 2022
On 9/21/2022 2:44 PM, Jason Gunthorpe wrote:
> On Wed, Sep 21, 2022 at 12:06:49PM -0600, Alex Williamson wrote:
>
>>> I still think the compat gaps are small. I've realized that
>>> VFIO_DMA_UNMAP_FLAG_VADDR has no implementation in qemu, and since it
>>> can deadlock the kernel I propose we purge it completely.
>>
>> Steve won't be happy to hear that, QEMU support exists but isn't yet
>> merged.
"unhappy" barely scratches the surface of my feelings!
Live update is a great feature that solves a real problem, and lots of
people have spent lots of time providing thorough feedback on the qemu
patches I have submitted. We *will* cross the finish line. In the
mean time, I maintain a patched qemu for use in my company, and I have
heard from others who do the same.
> If Steve wants to keep it then someone needs to fix the deadlock in
> the vfio implementation before any userspace starts to appear.
The only VFIO_DMA_UNMAP_FLAG_VADDR issue I am aware of is broken pinned accounting
across exec, which can result in mm->locked_vm becoming negative. I have several
fixes, but none result in limits being reached at exactly the same time as before --
the same general issue being discussed for iommufd. I am still thinking about it.
I am not aware of a deadlock problem. Please elaborate or point me to an
email thread.
> I can fix the deadlock in iommufd in a terrible expensive way, but
> would rather we design a better interface if nobody is using it yet. I
> advocate for passing the memfd to the kernel and use that as the page
> provider, not a mm_struct.
memfd support alone is not sufficient. Live update also supports guest ram
backed by named shared memory.
- Steve
>> The issue is where we account these pinned pages, where accounting is
>> necessary such that a user cannot lock an arbitrary number of pages
>> into RAM to generate a DoS attack.
>
> It is worth pointing out that preventing a DOS attack doesn't actually
> work because a *task* limit is trivially bypassed by just spawning
> more tasks. So, as a security feature, this is already very
> questionable.
>
> What we've done here is make the security feature work to actually
> prevent DOS attacks, which then gives you this problem:
>
>> This obviously has implications. AFAICT, any management tool that
>> doesn't instantiate assigned device VMs under separate users are
>> essentially untenable.
>
> Because now that the security feature works properly it detects the
> DOS created by spawning multiple tasks :(
>
> Somehow I was under the impression there was not user sharing in the
> common cases, but I guess I don't know that for sure.
>
>>> So, I still like 2 because it yields the smallest next step before we
>>> can bring all the parallel work onto the list, and it makes testing
>>> and converting non-qemu stuff easier even going forward.
>>
>> If a vfio compatible interface isn't transparently compatible, then I
>> have a hard time understanding its value. Please correct my above
>> description and implications, but I suspect these are not just
>> theoretical ABI compat issues. Thanks,
>
> Because it is just fine for everything that doesn't use the ulimit
> feature, which is still a lot of use cases!
>
> Remember, at this point we are not replacing /dev/vfio/vfio, this is
> just providing the general compat in a form that has to be opted
> into. I think if you open the /dev/iommu device node then you should
> get secured accounting.
>
> If /dev/vfio/vfio is provided by iommufd it may well have to trigger a
> different ulimit tracking - if that is the only sticking point it
> seems minor and should be addressed in some later series that adds
> /dev/vfio/vfio support to iommufd..
>
> Jason
More information about the libvir-list
mailing list