[libvirt] [PATCH v3 0/4] numa: describe sibling nodes distances
Jim Fehlig
jfehlig at suse.com
Fri Sep 1 21:13:22 UTC 2017
Hi Wim,
I'll be away for a few weeks and won't be able to review this in detail until
later in the month. I see Martin provided some feedback on patch1, which is
awesome since I'd prefer a broader agreement on that patch than my single 'ack'.
BTW, the new code in patch2 can also be tested now that we have domXML <->
libxl_domain_config conversion tests :-). See tests/libxlxml2domconfigtest.c
Regards,
Jim
On 08/31/2017 08:02 AM, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have at oracle.com>
>
> This patch series extends guest domain administration, adding support
> to advertise node sibling distances when configuring HVM NUMA guests.
>
> NUMA (non-uniform memory access) is a method of configuring a cluster
> of nodes within a single multiprocessing system so that each node has
> its own local memory yet can share it with the others, improving
> performance and the system's ability to be expanded.
>
> A NUMA system could be illustrated as shown below. Within this 4-node
> system, every socket is equipped with its own distinct memory. The whole
> typically resembles an SMP (symmetric multiprocessing) system: a
> "tightly-coupled," "share-everything" system in which multiple processors
> are working under a single operating system and can access each others'
> memory over multiple "Bus Interconnect" paths.
>
> +-----+-----+-----+         +-----+-----+-----+
> |  M  | CPU | CPU |         | CPU | CPU |  M  |
> |  E  |     |     |         |     |     |  E  |
> |  M  +- Socket0 -+         +- Socket3 -+  M  |
> |  O  |     |     |         |     |     |  O  |
> |  R  | CPU | CPU <---------> CPU | CPU |  R  |
> |  Y  |     |     |         |     |     |  Y  |
> +-----+--^--+-----+         +-----+--^--+-----+
>          |                           |
>          |     Bus Interconnect      |
>          |                           |
> +-----+--v--+-----+         +-----+--v--+-----+
> |  M  |     |     |         |     |     |  M  |
> |  E  | CPU | CPU <---------> CPU | CPU |  E  |
> |  M  |     |     |         |     |     |  M  |
> |  O  +- Socket1 -+         +- Socket2 -+  O  |
> |  R  |     |     |         |     |     |  R  |
> |  Y  | CPU | CPU |         | CPU | CPU |  Y  |
> +-----+-----+-----+         +-----+-----+-----+
>
> By contrast, a flat SMP system (not illustrated) has an inherent
> limitation: as sockets are added, the bus (data and address path) gets
> overloaded under high activity and easily becomes a performance
> bottleneck. NUMA adds an intermediate level of memory, shared amongst
> a few cores per socket as illustrated above, so that data accesses do
> not have to travel over a single bus.
>
> Unfortunately, the way NUMA does this adds limitations of its own.
> As visualized in the illustration above, this happens when data stored
> in memory associated with Socket2 is accessed by a CPU (core) in
> Socket0.
> The processors use the "Bus Interconnect" to create gateways between the
> sockets (nodes) enabling inter-socket access to memory. These "Bus
> Interconnect" hops add data access delays when a CPU (core) accesses
> memory associated with a remote socket (node).
>
> For terminology, we refer to sockets as "nodes": access to each
> other's distinct resources, such as memory, makes them "siblings"
> with a designated "distance" between them. A specific design is
> described in the ACPI (Advanced Configuration and Power Interface)
> specification, in the chapter explaining the system's SLIT (System
> Locality Distance Information Table).
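> The SLIT is essentially a flat N x N matrix of byte-sized relative
> distances, where the value 10 denotes local access. As a rough
> illustration of how such a table is indexed (not code from these
> patches; the names here are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative lookup into a SLIT-style distance table: a flat
 * nr_nodes * nr_nodes array of byte entries, where entry (from, to)
 * is the relative access distance from one node to another.  The
 * local distance is 10 by convention; remote distances are larger. */
static uint8_t
slit_distance(const uint8_t *table, unsigned nr_nodes,
              unsigned from, unsigned to)
{
    assert(from < nr_nodes && to < nr_nodes);
    return table[from * nr_nodes + to];
}
```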
>
> These patches extend core libvirt's XML description of a virtual machine's
> hardware to include NUMA distance information for sibling nodes, which
> is then passed to Xen guests via libxl. QEMU recently landed support
> for constructing the SLIT, in commit 0f203430dd ("numa: Allow setting
> NUMA distance for different NUMA nodes"), so these core libvirt
> extensions can also help other drivers support this feature.
>
> The XML changes allow describing the <distances> amongst <sibling>
> node identifiers within each <cell> (node/socket), and propagate
> these through the NUMA domain functionality, finally adding support
> to libxl.
>
> [below is an example illustrating a 4 node/socket <cell> setup]
>
> <cpu>
>   <numa>
>     <cell id='0' cpus='0,4-7' memory='2097152' unit='KiB'>
>       <distances>
>         <sibling id='0' value='10'/>
>         <sibling id='1' value='21'/>
>         <sibling id='2' value='31'/>
>         <sibling id='3' value='41'/>
>       </distances>
>     </cell>
>     <cell id='1' cpus='1,8-10,12-15' memory='2097152' unit='KiB'>
>       <distances>
>         <sibling id='0' value='21'/>
>         <sibling id='1' value='10'/>
>         <sibling id='2' value='21'/>
>         <sibling id='3' value='31'/>
>       </distances>
>     </cell>
>     <cell id='2' cpus='2,11' memory='2097152' unit='KiB'>
>       <distances>
>         <sibling id='0' value='31'/>
>         <sibling id='1' value='21'/>
>         <sibling id='2' value='10'/>
>         <sibling id='3' value='21'/>
>       </distances>
>     </cell>
>     <cell id='3' cpus='3' memory='2097152' unit='KiB'>
>       <distances>
>         <sibling id='0' value='41'/>
>         <sibling id='1' value='31'/>
>         <sibling id='2' value='21'/>
>         <sibling id='3' value='10'/>
>       </distances>
>     </cell>
>   </numa>
> </cpu>
>
> By default on libxl, if no <distances> are given to describe the SLIT
> data between different <cell>s, this patch falls back to a scheme
> using 10 for the local node and 21 for any remote node/socket, which
> is what a guest OS assumes when no SLIT is specified. While the SLIT
> is optional, libxl requires that distances be set nonetheless.
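> As a sketch of that fallback (illustrative only, not the actual patch
> code), filling a distance table when no <distances> were supplied
> could look like:

```c
#include <assert.h>
#include <stddef.h>

/* Default vNUMA distances when the XML provides no <distances>:
 * 10 for a node's distance to itself (local), 21 for every remote
 * sibling -- matching what a guest OS assumes without a SLIT. */
enum { LOCAL_DISTANCE = 10, REMOTE_DISTANCE = 21 };

static void
fill_default_distances(unsigned *table, size_t nr_nodes)
{
    for (size_t i = 0; i < nr_nodes; i++)
        for (size_t j = 0; j < nr_nodes; j++)
            table[i * nr_nodes + j] =
                (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}
```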
>
> On Linux systems the SLIT detail can be listed with the 'numactl -H'
> command. The HVM guest described above would report the output below.
>
> [root at f25 ~]# numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0 4 5 6 7
> node 0 size: 1988 MB
> node 0 free: 1743 MB
> node 1 cpus: 1 8 9 10 12 13 14 15
> node 1 size: 1946 MB
> node 1 free: 1885 MB
> node 2 cpus: 2 11
> node 2 size: 2011 MB
> node 2 free: 1912 MB
> node 3 cpus: 3
> node 3 size: 2010 MB
> node 3 free: 1980 MB
> node distances:
> node   0   1   2   3
>   0:  10  21  31  41
>   1:  21  10  21  31
>   2:  31  21  10  21
>   3:  41  31  21  10
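> The table shown above can be sanity-checked: every diagonal entry is
> the local distance (10), and this particular layout is also symmetric
> (the ACPI specification does not require symmetry in general). A small
> illustrative checker, not part of these patches:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Check a node-distance table like the numactl output above: the
 * diagonal must hold the local distance (10).  Symmetry is checked
 * too because this example's table happens to be symmetric; ACPI
 * allows asymmetric tables. */
static bool
distances_look_sane(const unsigned *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (d[i * n + i] != 10)
            return false;
        for (size_t j = 0; j < n; j++)
            if (d[i * n + j] != d[j * n + i])
                return false;
    }
    return true;
}
```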
>
> Wim ten Have (4):
> numa: describe siblings distances within cells
> libxl: vnuma support
> xenconfig: add domxml conversions for xen-xl
> xlconfigtest: add tests for numa cell sibling distances
>
> docs/formatdomain.html.in                   |  70 ++++-
> docs/schemas/basictypes.rng                 |   9 +
> docs/schemas/cputypes.rng                   |  18 ++
> src/conf/cpu_conf.c                         |   2 +-
> src/conf/numa_conf.c                        | 323 +++++++++++++++++++-
> src/conf/numa_conf.h                        |  25 +-
> src/libvirt_private.syms                    |   6 +
> src/libxl/libxl_conf.c                      | 120 ++++++++
> src/libxl/libxl_driver.c                    |   3 +-
> src/xenconfig/xen_xl.c                      | 333 +++++++++++++++++++++
> .../test-fullvirt-vnuma-nodistances.cfg     |  26 ++
> .../test-fullvirt-vnuma-nodistances.xml     |  53 ++++
> tests/xlconfigdata/test-fullvirt-vnuma.cfg  |  26 ++
> tests/xlconfigdata/test-fullvirt-vnuma.xml  |  81 +++++
> tests/xlconfigtest.c                        |   4 +
> 15 files changed, 1089 insertions(+), 10 deletions(-)
> create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.cfg
> create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.xml
> create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.cfg
> create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.xml
>