[PATCH 2/4] qemu: Start emulator thread with more generous cpuset.mems

Martin Kletzander mkletzan at redhat.com
Tue May 23 12:19:02 UTC 2023


On Tue, May 23, 2023 at 12:06:19PM +0200, Michal Privoznik wrote:
>Consider a domain with two guest NUMA nodes and the following
><numatune/> setting :
>
>  <numatune>
>    <memory mode="strict" nodeset="0"/>
>    <memnode cellid="0" mode="strict" nodeset="1"/>
>  </numatune>
>
>What this means is the emulator thread is pinned onto host NUMA
>node #0 (by setting corresponding cpuset.mems to "0"), and two
>memory-backend-* objects are created:
>
>  -object '{"qom-type":"memory-backend-ram","id":"ram-node0", .., "host-nodes":[1],"policy":"bind"}' \
>  -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
>  -object '{"qom-type":"memory-backend-ram","id":"ram-node1", .., "host-nodes":[0],"policy":"bind"}' \
>  -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
>
>Note, the emulator thread is pinned well before QEMU is even
>exec()-ed.
>
>Now, the way memory allocation works in QEMU is: the emulator
>thread calls mmap() followed by mbind() (which is sane, that's
>how everybody should do it). BUT, because the thread is already
>restricted by CGroups to just NUMA node #0, calling:
>
>  mbind(host-nodes:[1]); /* made up syntax (TM) */
>
>fails. This is expected though. Kernel was instructed to place
>the memory at NUMA node "0" and yet, process is trying to place
>it elsewhere.
>
>We used to solve this by not restricting emulator thread at all
>initially, and only after it's done initializing (i.e. we got the
>QMP greeting) we placed it onto desired nodes. But this had its
>own problems (e.g. QEMU might have locked pieces of its memory
>which were then unable to migrate onto different NUMA nodes).
>
>Therefore, in v5.1.0-rc1~282 we've changed this and set CGropus

*cgroups

>upfront (even before exec()-ing QEMU). And this used to work, but
>something has changed (I can't really put my finger anywhere).
>

I'm glad that's the case

>Therefore, for the initialization start the thread with union of
>all configured host NUMA nodes ("0-1" in our example) and fix the
>placement only after QEMU is started.
>
>NB, the memory hotplug suffers the same problem, but that might
>be fixed later.
>
>Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2138150
>Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>---
> src/qemu/qemu_domain.c  | 26 +++++++++++++++++++++++++
> src/qemu/qemu_domain.h  |  5 +++++
> src/qemu/qemu_process.c | 42 ++++++++++++++++++++++++++++++-----------
> 3 files changed, 62 insertions(+), 11 deletions(-)
>
>diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>index 35d03515c7..d97a0ce24e 100644
>--- a/src/qemu/qemu_domain.c
>+++ b/src/qemu/qemu_domain.c
>@@ -12674,3 +12674,29 @@ qemuDomainEvaluateCPUMask(const virDomainDef *def,
>
>     return NULL;
> }
>+
>+
>+void
>+qemuDomainNumatuneMaybeFormatNodesetUnion(virDomainObj *vm,
>+                                          virBitmap **nodeset,
>+                                          char **nodesetStr)
>+{
>+    virDomainNuma *numatune = vm->def->numa;
>+    qemuDomainObjPrivate *priv = vm->privateData;
>+    g_autoptr(virBitmap) unionMask = virBitmapNew(0);
>+    ssize_t i;
>+
>+    for (i = -1; i < (ssize_t)virDomainNumaGetNodeCount(numatune); i++) {
>+        virBitmap *tmp;
>+
>+        tmp = virDomainNumatuneGetNodeset(numatune, priv->autoNodeset, i);
>+        if (tmp)
>+            virBitmapUnion(unionMask, tmp);
>+    }
>+
>+    if (nodesetStr)
>+        *nodesetStr = virBitmapFormat(unionMask);
>+
>+    if (nodeset)
>+        *nodeset = g_steal_pointer(&unionMask);
>+}
>diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
>index ec9ae75bce..999190e381 100644
>--- a/src/qemu/qemu_domain.h
>+++ b/src/qemu/qemu_domain.h
>@@ -1140,3 +1140,8 @@ virBitmap *
> qemuDomainEvaluateCPUMask(const virDomainDef *def,
>                           virBitmap *cpumask,
>                           virBitmap *autoCpuset);
>+
>+void
>+qemuDomainNumatuneMaybeFormatNodesetUnion(virDomainObj *vm,
>+                                          virBitmap **nodeset,
>+                                          char **nodesetStr);
>diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>index 57c3ea2dbf..6b85b7cee7 100644
>--- a/src/qemu/qemu_process.c
>+++ b/src/qemu/qemu_process.c
>@@ -2550,7 +2550,8 @@ qemuProcessSetupPid(virDomainObj *vm,
>                     virBitmap *cpumask,
>                     unsigned long long period,
>                     long long quota,
>-                    virDomainThreadSchedParam *sched)
>+                    virDomainThreadSchedParam *sched,
>+                    bool unionMems)
> {
>     qemuDomainObjPrivate *priv = vm->privateData;
>     virDomainNuma *numatune = vm->def->numa;
>@@ -2592,11 +2593,22 @@ qemuProcessSetupPid(virDomainObj *vm,
>
>         if (virDomainNumatuneGetMode(numatune, -1, &mem_mode) == 0 &&
>             (mem_mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT ||
>-             mem_mode == VIR_DOMAIN_NUMATUNE_MEM_RESTRICTIVE) &&
>-            virDomainNumatuneMaybeFormatNodeset(numatune,
>-                                                priv->autoNodeset,
>-                                                &mem_mask, -1) < 0)
>-            goto cleanup;
>+             mem_mode == VIR_DOMAIN_NUMATUNE_MEM_RESTRICTIVE)) {
>+
>+            /* QEMU allocates its memory from the emulator thread. Thus it
>+             * needs to access union of all host nodes configured. This is
>+             * going to be replaced with proper value later in the startup
>+             * process. */
>+            if (unionMems &&
>+                nameval == VIR_CGROUP_THREAD_EMULATOR) {
>+                qemuDomainNumatuneMaybeFormatNodesetUnion(vm, NULL, &mem_mask);

This should not be the case for VIR_DOMAIN_NUMATUNE_MEM_RESTRICTIVE as
that will not result in host-nodes being set.

With those fixes

Reviewed-by: Martin Kletzander <mkletzan at redhat.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20230523/8a24bccb/attachment.sig>


More information about the libvir-list mailing list