[libvirt][RFC PATCH] add a new 'default' option for attribute mode in numatune

Martin Kletzander mkletzan at redhat.com
Mon Aug 3 11:00:55 UTC 2020


On Mon, Aug 03, 2020 at 05:31:56PM +0800, Luyao Zhong wrote:
>Hi Libvirt experts,
>
>I would like to enhance the numatune configuration. Given an example snippet:
>
><domain>
>  ...
>  <numatune>
>    <memory mode="strict" nodeset="1-4,^3"/>
>    <memnode cellid="0" mode="strict" nodeset="1"/>
>    <memnode cellid="2" mode="preferred" nodeset="2"/>
>  </numatune>
>  ...
></domain>
>
>Currently, the mode attribute is one of 'interleave', 'strict', or 'preferred'.
>I propose adding a new 'default' option, for the following reason.
>
>Assuming we are using cgroups v1, libvirt sets cpuset.mems for all vCPU threads
>according to the 'nodeset' in the memory element, and translates each memnode
>element into QEMU config options (--object memory-backend-ram) for the
>corresponding NUMA cell, which end up invoking the mbind() system call.[1]
>
>But what if we want to use the default memory policy and still have each guest
>NUMA cell pinned to a different host memory node? We can't use mbind via QEMU
>config options, because (I quote here) "For MPOL_DEFAULT, the nodemask and
>maxnode arguments must be specify the empty set of nodes." [2]
>
>So my solution is to introduce a new 'default' option for the mode attribute, e.g.
>
><domain>
>  ...
>  <numatune>
>    <memory mode="default" nodeset="1-2"/>
>    <memnode cellid="0" mode="default" nodeset="1"/>
>    <memnode cellid="1" mode="default" nodeset="2"/>
>  </numatune>
>  ...
></domain>
>
>If the mode is 'default', libvirt should avoid generating the QEMU command line
>'--object memory-backend-ram' and instead use cgroups to set cpuset.mems per
>guest NUMA cell, in combination with the NUMA topology config. Assume the NUMA
>topology is:
>
><cpu>
>  ...
>  <numa>
>    <cell id='0' cpus='0-3' memory='512000' unit='KiB' />
>    <cell id='1' cpus='4-7' memory='512000' unit='KiB' />
>  </numa>
>  ...
></cpu>
>
>Then libvirt should set cpuset.mems to '1' for vcpus 0-3, and '2' for vcpus 4-7.
>
>
>Is this reasonable and feasible? Any comments are welcome.
>
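
To make the quoted proposal concrete, here is a minimal Python sketch of the
mapping it describes: each vCPU gets the host nodeset of the guest NUMA cell it
belongs to. The function names (parse_range_list, mems_per_vcpu) and the dict
layout are hypothetical and only illustrate the idea; this is not libvirt code,
and the '^' handling is a simplification that ignores ordering.

```python
from typing import Dict, Set


def parse_range_list(spec: str) -> Set[int]:
    """Expand a list such as "1-4,^3" into a set of integers.

    Simplification (assumption): every "^" item is subtracted from the
    union of all other items, regardless of where it appears.
    """
    included: Set[int] = set()
    excluded: Set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        target = included
        if item.startswith("^"):
            target, item = excluded, item[1:]
        low, _, high = item.partition("-")
        target.update(range(int(low), int(high or low) + 1))
    return included - excluded


def mems_per_vcpu(cells: Dict[int, str], memnodes: Dict[int, str]) -> Dict[int, str]:
    """Map each vCPU to the host nodeset of the guest cell that owns it."""
    result: Dict[int, str] = {}
    for cell_id, cpus in cells.items():
        for vcpu in sorted(parse_range_list(cpus)):
            result[vcpu] = memnodes[cell_id]
    return result


# Values taken from the XML in the quoted message.
cells = {0: "0-3", 1: "4-7"}      # <cell id=... cpus=...>
memnodes = {0: "1", 1: "2"}       # <memnode cellid=... nodeset=...>
mapping = mems_per_vcpu(cells, memnodes)

for vcpu, mems in mapping.items():
    print(f"vcpu {vcpu}: cpuset.mems = {mems}")
```

Note that this only computes which value the proposal would write for each vCPU
thread; it says nothing about whether the guest memory is actually allocated by
those threads.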

There are a couple of problems here.  The memory is not (always) allocated by
the vCPU threads.  I also remember it not being allocated by the process, but
in KVM, in a way that was not affected by the cgroup settings.  That might be
fixed now, however.

But basically, what speaks against this is all the reasons why we started using
QEMU's command-line arguments for all of that.

Sorry, but I think it is more likely to break things than to fix them.  Maybe
this could be dealt with by a switch in `qemu.conf` with a huge warning above it.

Have a nice day,
Martin

>Regards,
>Luyao
>
>[1]https://github.com/qemu/qemu/blob/f2a1cf9180f63e88bb38ff21c169da97c3f2bad5/backends/hostmem.c#L379
>[2]https://man7.org/linux/man-pages/man2/mbind.2.html
>
>-- 
>2.25.1
>