[libvirt] [PATCH v3 0/4] numa: describe sibling nodes distances

Jim Fehlig jfehlig at suse.com
Fri Sep 1 21:13:22 UTC 2017


Hi Wim,

I'll be away for a few weeks and won't be able to review this in detail until 
later in the month. I see Martin provided some feedback on patch1, which is 
awesome since I'd prefer a broader agreement on that patch than my single 'ack'.

BTW, the new code in patch2 can also be tested now that we have domXML <-> 
libxl_domain_config conversion tests :-). See tests/libxlxml2domconfigtest.c

Regards,
Jim

On 08/31/2017 08:02 AM, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have at oracle.com>
> 
> This patch series extends guest domain administration by adding support for
> advertising node sibling distances when configuring HVM NUMA guests.
> 
> NUMA (non-uniform memory access) is a method of configuring a cluster of nodes
> within a single multiprocessing system such that each node shares its
> processor-local memory with the others, improving performance and the
> ability of the system to be expanded.
> 
> A NUMA system could be illustrated as shown below. Within this 4-node
> system, every socket is equipped with its own distinct memory. The whole
> typically resembles an SMP (symmetric multiprocessing) system, being a
> "tightly-coupled," "share everything" system in which multiple processors
> are working under a single operating system and can access each other's
> memory over multiple "Bus Interconnect" paths.
> 
>          +-----+-----+-----+         +-----+-----+-----+
>          |  M  | CPU | CPU |         | CPU | CPU |  M  |
>          |  E  |     |     |         |     |     |  E  |
>          |  M  +- Socket0 -+         +- Socket3 -+  M  |
>          |  O  |     |     |         |     |     |  O  |
>          |  R  | CPU | CPU <---------> CPU | CPU |  R  |
>          |  Y  |     |     |         |     |     |  Y  |
>          +-----+--^--+-----+         +-----+--^--+-----+
>                   |                           |
>                   |      Bus Interconnect     |
>                   |                           |
>          +-----+--v--+-----+         +-----+--v--+-----+
>          |  M  |     |     |         |     |     |  M  |
>          |  E  | CPU | CPU <---------> CPU | CPU |  E  |
>          |  M  |     |     |         |     |     |  M  |
>          |  O  +- Socket1 -+         +- Socket2 -+  O  |
>          |  R  |     |     |         |     |     |  R  |
>          |  Y  | CPU | CPU |         | CPU | CPU |  Y  |
>          +-----+-----+-----+         +-----+-----+-----+
> 
> In contrast there is the limitation of a flat SMP system, not illustrated.
> Here, as sockets are added, the bus (data and address path), under high
> activity, gets overloaded and easily becomes a performance bottleneck.
> NUMA adds an intermediate level of memory shared amongst a few cores per
> socket as illustrated above, so that data accesses do not have to travel
> over a single bus.
> 
> Unfortunately the way NUMA does this adds its own limitations. This,
> as visualized in the illustration above, happens when data is stored in
> memory associated with Socket2 and is accessed by a CPU (core) in Socket0.
> The processors use the "Bus Interconnect" to create gateways between the
> sockets (nodes) enabling inter-socket access to memory. These "Bus
> Interconnect" hops add data access delays when a CPU (core) accesses
> memory associated with a remote socket (node).
> 
> As for terminology, we refer to sockets as "nodes". Access to each other's
> distinct resources, such as memory, makes them "siblings" with a
> designated "distance" between them. The layout is specified in the ACPI
> (Advanced Configuration and Power Interface) specification, in the chapter
> describing the SLIT (System Locality Distance Information Table).
> 
> These patches extend core libvirt's XML description of a virtual machine's
> hardware to include NUMA distance information for sibling nodes, which
> is then passed to Xen guests via libxl. QEMU recently gained support for
> constructing the SLIT in commit 0f203430dd ("numa: Allow setting NUMA
> distance for different NUMA nodes"), so these core libvirt extensions
> can also help other drivers support this feature.
> 
> The XML changes make it possible to describe the <distances> between a <cell>
> (node/socket) and its <sibling> node identifiers, and to propagate them through
> the NUMA domain functionality, finally adding support to libxl.
> 
> [below is an example illustrating a 4 node/socket <cell> setup]
> 
>      <cpu>
>        <numa>
>          <cell id='0' cpus='0,4-7' memory='2097152' unit='KiB'>
>            <distances>
>              <sibling id='0' value='10'/>
>              <sibling id='1' value='21'/>
>              <sibling id='2' value='31'/>
>              <sibling id='3' value='41'/>
>            </distances>
>          </cell>
>          <cell id='1' cpus='1,8-10,12-15' memory='2097152' unit='KiB'>
>            <distances>
>              <sibling id='0' value='21'/>
>              <sibling id='1' value='10'/>
>              <sibling id='2' value='21'/>
>              <sibling id='3' value='31'/>
>            </distances>
>          </cell>
>          <cell id='2' cpus='2,11' memory='2097152' unit='KiB'>
>            <distances>
>              <sibling id='0' value='31'/>
>              <sibling id='1' value='21'/>
>              <sibling id='2' value='10'/>
>              <sibling id='3' value='21'/>
>            </distances>
>          </cell>
>          <cell id='3' cpus='3' memory='2097152' unit='KiB'>
>            <distances>
>              <sibling id='0' value='41'/>
>              <sibling id='1' value='31'/>
>              <sibling id='2' value='21'/>
>              <sibling id='3' value='10'/>
>            </distances>
>          </cell>
>        </numa>
>      </cpu>
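
As an aside for readers, the <distances> layout quoted above can be read back into a matrix with a few lines of Python. This is only a minimal sketch: the XML is shortened to two cells, the function names distance_matrix and is_symmetric are hypothetical, and the symmetry check is an illustrative assumption, not a statement of what libvirt itself validates.

```python
import xml.etree.ElementTree as ET

# Two-cell excerpt modelled on the example in the cover letter (shortened).
EXAMPLE_XML = """
<cpu>
  <numa>
    <cell id='0' cpus='0,4-7' memory='2097152' unit='KiB'>
      <distances>
        <sibling id='0' value='10'/>
        <sibling id='1' value='21'/>
      </distances>
    </cell>
    <cell id='1' cpus='1,8-10,12-15' memory='2097152' unit='KiB'>
      <distances>
        <sibling id='0' value='21'/>
        <sibling id='1' value='10'/>
      </distances>
    </cell>
  </numa>
</cpu>
"""


def distance_matrix(cpu_xml):
    """Build matrix[i][j] = distance from cell i to sibling j (None if absent)."""
    root = ET.fromstring(cpu_xml.strip())
    cells = root.findall("./numa/cell")
    size = len(cells)
    matrix = [[None] * size for _ in range(size)]
    for cell in cells:
        row = int(cell.get("id"))
        for sibling in cell.findall("./distances/sibling"):
            matrix[row][int(sibling.get("id"))] = int(sibling.get("value"))
    return matrix


def is_symmetric(matrix):
    """Illustrative check only: distance i->j equals distance j->i."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(size) for j in range(size))


if __name__ == "__main__":
    m = distance_matrix(EXAMPLE_XML)
    print(m, is_symmetric(m))
```
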
> 
> On libxl, if no <distances> are given to describe the SLIT data
> between different <cell>s, this patch defaults to a scheme using 10
> for local and 21 for any remote node/socket, which is what a guest OS
> assumes when no SLIT is specified. While the SLIT is optional, libxl
> requires that distances are set nonetheless.
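
For illustration, the default scheme described above (10 for local, 21 for any remote node) amounts to the following. This is a sketch in Python, not the libxl driver code from the patch, and the name default_distances is hypothetical.

```python
LOCAL_DISTANCE = 10   # a node to itself, per the cover letter
REMOTE_DISTANCE = 21  # a node to any other node, per the cover letter


def default_distances(num_nodes):
    """Return an N x N matrix with 10 on the diagonal and 21 elsewhere."""
    return [
        [LOCAL_DISTANCE if i == j else REMOTE_DISTANCE
         for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


if __name__ == "__main__":
    # Print the fallback matrix for a 4-node guest.
    for row in default_distances(4):
        print(" ".join(f"{d:3d}" for d in row))
```
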
> 
> On Linux systems the SLIT details can be listed with the 'numactl -H'
> command. An HVM guest configured as described above would produce the
> following output.
> 
>      [root@f25 ~]# numactl -H
>      available: 4 nodes (0-3)
>      node 0 cpus: 0 4 5 6 7
>      node 0 size: 1988 MB
>      node 0 free: 1743 MB
>      node 1 cpus: 1 8 9 10 12 13 14 15
>      node 1 size: 1946 MB
>      node 1 free: 1885 MB
>      node 2 cpus: 2 11
>      node 2 size: 2011 MB
>      node 2 free: 1912 MB
>      node 3 cpus: 3
>      node 3 size: 2010 MB
>      node 3 free: 1980 MB
>      node distances:
>      node   0   1   2   3
>        0:  10  21  31  41
>        1:  21  10  21  31
>        2:  31  21  10  21
>        3:  41  31  21  10
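
To compare what the guest reports against the configured values, the "node distances" table can be extracted from the 'numactl -H' output. The sketch below assumes the output layout shown above; the function name parse_node_distances is hypothetical, and the sample text is the tail of the quoted output.

```python
SAMPLE_OUTPUT = """\
node 3 free: 1980 MB
node distances:
node   0   1   2   3
  0:  10  21  31  41
  1:  21  10  21  31
  2:  31  21  10  21
  3:  41  31  21  10
"""


def parse_node_distances(numactl_output):
    """Return {node: [distances...]} parsed from 'numactl -H' style text."""
    lines = numactl_output.splitlines()
    # Locate the table: a marker line, a header row, then one row per node.
    start = next(i for i, line in enumerate(lines)
                 if line.strip() == "node distances:")
    node_ids = lines[start + 1].split()[1:]  # drop the leading "node" label
    table = {}
    for line in lines[start + 2:start + 2 + len(node_ids)]:
        label, *values = line.split()
        table[int(label.rstrip(":"))] = [int(v) for v in values]
    return table


if __name__ == "__main__":
    for node, row in parse_node_distances(SAMPLE_OUTPUT).items():
        print(node, row)
```
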
> 
> Wim ten Have (4):
>    numa: describe siblings distances within cells
>    libxl: vnuma support
>    xenconfig: add domxml conversions for xen-xl
>    xlconfigtest: add tests for numa cell sibling distances
> 
>   docs/formatdomain.html.in                          |  70 ++++-
>   docs/schemas/basictypes.rng                        |   9 +
>   docs/schemas/cputypes.rng                          |  18 ++
>   src/conf/cpu_conf.c                                |   2 +-
>   src/conf/numa_conf.c                               | 323 +++++++++++++++++++-
>   src/conf/numa_conf.h                               |  25 +-
>   src/libvirt_private.syms                           |   6 +
>   src/libxl/libxl_conf.c                             | 120 ++++++++
>   src/libxl/libxl_driver.c                           |   3 +-
>   src/xenconfig/xen_xl.c                             | 333 +++++++++++++++++++++
>   .../test-fullvirt-vnuma-nodistances.cfg            |  26 ++
>   .../test-fullvirt-vnuma-nodistances.xml            |  53 ++++
>   tests/xlconfigdata/test-fullvirt-vnuma.cfg         |  26 ++
>   tests/xlconfigdata/test-fullvirt-vnuma.xml         |  81 +++++
>   tests/xlconfigtest.c                               |   4 +
>   15 files changed, 1089 insertions(+), 10 deletions(-)
>   create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.cfg
>   create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.xml
>   create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.cfg
>   create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.xml
> 