[PATCH 1/2] qemu_domain: Increase memlock limit for NVMe disks

Michal Privoznik mprivozn at redhat.com
Fri Apr 14 10:02:26 UTC 2023


When starting QEMU, or when hotplugging a PCI device, QEMU might
lock some memory. How much? Well, that's an undecidable problem: a
Turing machine that halts does so in a finite number of steps,
and thus can move its tape only so many times. Now, does a given
TM halt? QED.

But despite that, we try to guess. And it more or less works,
until there's a counterexample. This time, it's a guest with
both a <hostdev/> and an NVMe <disk/>. I've started a simple
guest with 4 GiB of memory:

  # virsh dominfo fedora
  Max memory:     4194304 KiB
  Used memory:    4194304 KiB

And here are the amounts of memory that QEMU tried to lock,
obtained via:

  grep VmLck /proc/$(pgrep qemu-kvm)/status

  1) with just one <hostdev/>
     VmLck:   4194308 kB

  2) with just one NVMe <disk/>
     VmLck:   4328544 kB

  3) with one <hostdev/> and one NVMe <disk/>
     VmLck:   8522852 kB
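
Note that case 3) is almost exactly the sum of cases 1) and 2):

  4194308 kB + 4328544 kB = 8522852 kB

i.e. each device class ends up pinning roughly the full guest
memory on its own.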

Now, what's surprising is case 2), where the locked memory
exceeds the VM memory. It almost resembles VDPA. Therefore, treat
it as such.
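
For illustration only, here is a minimal, self-contained sketch of
the accounting this patch aims for, with each NVMe disk counted
the same way as a VDPA device. This is not the actual libvirt
code; the helper name and the 1 GiB headroom constant are just
illustrative:

  #include <stdbool.h>

  /* Sketch only: each VDPA device and each NVMe disk pins the full
   * guest RAM; VFIO hostdevs pin it once, or once per device when
   * a vIOMMU is present. */
  static unsigned long long
  sketchMemLockLimitKiB(unsigned long long guestMemKiB,
                        int nvfio, int nnvme, int nvdpa,
                        bool forceVFIO, bool hasIOMMU)
  {
      int factor;

      if (!(forceVFIO || nvfio || nnvme || nvdpa))
          return 0;

      factor = nvdpa + nnvme;

      if (nvfio || forceVFIO) {
          if (nvfio && hasIOMMU)
              factor += nvfio;  /* one full pin per VFIO device */
          else
              factor += 1;      /* one full pin shared by all */
      }

      /* plus some headroom for IO space; 1 GiB is illustrative */
      return factor * guestMemKiB + 1024 * 1024;
  }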

Unfortunately, I don't have a box with two or more spare NVMe
disks, so I can't tell for sure. But setting the limit too tight
means QEMU refuses to start.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2014030
Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
---
 src/qemu/qemu_domain.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 63b13b6875..41db98880c 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -9532,7 +9532,7 @@ getPPC64MemLockLimitBytes(virDomainDef *def,
 
 
 static int
-qemuDomainGetNumVFIODevices(const virDomainDef *def)
+qemuDomainGetNumVFIOHostdevs(const virDomainDef *def)
 {
     size_t i;
     int n = 0;
@@ -9542,10 +9542,22 @@ qemuDomainGetNumVFIODevices(const virDomainDef *def)
             virHostdevIsMdevDevice(def->hostdevs[i]))
             n++;
     }
+
+    return n;
+}
+
+
+static int
+qemuDomainGetNumNVMeDisks(const virDomainDef *def)
+{
+    size_t i;
+    int n = 0;
+
     for (i = 0; i < def->ndisks; i++) {
         if (virStorageSourceChainHasNVMe(def->disks[i]->src))
             n++;
     }
+
     return n;
 }
 
@@ -9585,6 +9597,7 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
 {
     unsigned long long memKB = 0;
     int nvfio;
+    int nnvme;
     int nvdpa;
 
     /* prefer the hard limit */
@@ -9604,7 +9617,8 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
     if (ARCH_IS_PPC64(def->os.arch) && def->virtType == VIR_DOMAIN_VIRT_KVM)
         return getPPC64MemLockLimitBytes(def, forceVFIO);
 
-    nvfio = qemuDomainGetNumVFIODevices(def);
+    nvfio = qemuDomainGetNumVFIOHostdevs(def);
+    nnvme = qemuDomainGetNumNVMeDisks(def);
     nvdpa = qemuDomainGetNumVDPANetDevices(def);
     /* For device passthrough using VFIO the guest memory and MMIO memory
      * regions need to be locked persistent in order to allow DMA.
@@ -9624,16 +9638,17 @@ qemuDomainGetMemLockLimitBytes(virDomainDef *def,
      *
      * Note that this may not be valid for all platforms.
      */
-    if (forceVFIO || nvfio || nvdpa) {
+    if (forceVFIO || nvfio || nnvme || nvdpa) {
         /* At present, the full memory needs to be locked for each VFIO / VDPA
-         * device. For VFIO devices, this only applies when there is a vIOMMU
-         * present. Yes, this may result in a memory limit that is greater than
-         * the host physical memory, which is not ideal. The long-term solution
-         * is a new userspace iommu interface (iommufd) which should eliminate
-         * this duplicate memory accounting. But for now this is the only way
-         * to enable configurations with e.g. multiple vdpa devices.
+         * / NVMe device. For VFIO devices, this only applies when there is a
+         * vIOMMU present. Yes, this may result in a memory limit that is
+         * greater than the host physical memory, which is not ideal. The
+         * long-term solution is a new userspace iommu interface (iommufd)
+         * which should eliminate this duplicate memory accounting. But for now
+         * this is the only way to enable configurations with e.g. multiple
+         * VDPA/NVMe devices.
          */
-        int factor = nvdpa;
+        int factor = nvdpa + nnvme;
 
         if (nvfio || forceVFIO) {
             if (nvfio && def->iommu)
-- 
2.39.2


