[libvirt] [PATCH] qemu: process: Fix automatic setting of locked memory limit for VFIO

Thu Nov 5 00:16:53 UTC 2015

On Wed, 2015-11-04 at 16:54 +0100, Peter Krempa wrote:
> On Wed, Nov 04, 2015 at 08:43:34 -0700, Alex Williamson wrote:
> > On Wed, 2015-11-04 at 16:14 +0100, Peter Krempa wrote:
> > > For VFIO passthrough the guest memory, including the device memory, needs to
> > > be resident in memory. Previously we used a magic constant of 1GiB that we
> > > added to the current memory size of the VM to accommodate all this. It is
> > > possible though that the device that is passed through exposes more than
> > > that. To avoid guessing wrong again in the future set the memory lock
> > > limit to unlimited at the point when VFIO will be used.
> > > 
> > > This problem is similar to the issue where we tried to infer the hard
> > > limit for a VM according to its configuration. This proved to be really
> > > problematic so we moved this burden to the user.
> > > 
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1273480
> > > 
> > > Additionally this patch also fixes a similar bug, where the mlock limit
> > > was not increased prior to a memory hotplug operation.
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1273491
> > > ---
> > 
> > IOW, tracking how much memory a VM should be able to lock is too hard,
> > let's circumvent the security that the kernel is trying to add here and
> > let assigned device VMs again lock as much memory as they want.  This
> > may solve some bugs, but it does so by ignoring the security limits
> > we're trying to impose.  Thanks,
> 
> Well, the default here is 64KiB, which obviously is not enough in this
> case and we are trying to set something reasonable so that it will not
> break. The other option would be to force the users to specify <hard_limit>
> so that they are held responsible for the value.
> 
> So the actual question here is: Is there a 100% working
> algorithm/formula that will allow us to calculate the locked memory
> size at any point/configuration? I'll happily do something like that.
> 
> We tried to do a similar thing to automatically infer the hard limit
> size. There were many bugs resulting from this since we weren't able to
> accurately guess qemu's memory usage.
> 
> Additionally if users wish to impose a limit on this they still might
> want to use the <hard_limit> setting.

What's wrong with the current algorithm?  The 1G fudge factor certainly
isn't ideal, but the 2nd bug you reference above is clearly a result of
more memory being added to the VM while the locked memory limit is
not adjusted to account for it.  That's just an implementation
oversight.  I'm not sure what's going on in the first bug, but why does
using hard_limit to override the locked limit to something smaller than
we think it should be set to automatically solve the problem?  Is it not
getting set as we expect on power?  Do we simply need to set the limit
using max memory rather than current memory?  It seems like there's a
whole lot of things we could do that are better than allowing the VM
unlimited locked memory.  Thanks,

Alex
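
A rough sketch of the alternative discussed above: keep a finite locked
memory limit, but derive it from max memory rather than current memory, keep
the 1GiB fudge factor, and let a user-supplied hard_limit override the
automatic value. This is only an illustration under stated assumptions, not
libvirt's actual code; the function name, parameter names, and the choice of
KiB units are all hypothetical.

```python
GIB_IN_KIB = 1024 * 1024  # 1 GiB expressed in KiB


def locked_memory_limit_kib(current_kib, max_kib=None, hard_limit_kib=None,
                            uses_vfio=True, fudge_kib=GIB_IN_KIB):
    """Return a locked-memory limit in KiB, or None to keep the default.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    if not uses_vfio:
        # Without VFIO there is no need to raise the limit at all.
        return None
    if hard_limit_kib is not None:
        # A user-specified hard_limit overrides the automatic value.
        return hard_limit_kib
    # Size the limit from max memory when it is known, so that memory
    # hotplug up to that maximum does not outgrow the limit.
    base = max(current_kib, max_kib or 0)
    # Keep the fudge factor for device memory on top of guest memory.
    return base + fudge_kib


if __name__ == "__main__":
    # 4 GiB guest that may be hotplugged up to 8 GiB.
    print(locked_memory_limit_kib(4 * GIB_IN_KIB, max_kib=8 * GIB_IN_KIB))
```

The fudge factor is still a guess, so this does not answer the question of
how much device memory a passed-through device exposes; it only shows that
the hotplug case can be covered without making the limit unlimited.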