[libvirt] [PATCH v3 1/4] numa: describe siblings distances within cells
Martin Kletzander
mkletzan at redhat.com
Thu Aug 31 14:36:58 UTC 2017
On Thu, Aug 31, 2017 at 04:02:38PM +0200, Wim Ten Have wrote:
>From: Wim ten Have <wim.ten.have at oracle.com>
>
I haven't seen the previous versions, so sorry if I point out something
that got changed already.
>Add libvirtd NUMA cell domain administration functionality to
>describe underlying cell id sibling distances in full fashion
>when configuring HVM guests.
>
>[below is an example of a 4 node setup]
>
> <cpu>
> <numa>
> <cell id='0' cpus='0' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='10'/>
> <sibling id='1' value='21'/>
> <sibling id='2' value='31'/>
> <sibling id='3' value='41'/>
> </distances>
> </cell>
> <cell id='1' cpus='1' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='21'/>
> <sibling id='1' value='10'/>
> <sibling id='2' value='31'/>
> <sibling id='3' value='41'/>
> </distances>
> </cell>
> <cell id='2' cpus='2' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='31'/>
> <sibling id='1' value='21'/>
> <sibling id='2' value='10'/>
> <sibling id='3' value='21'/>
> </distances>
> <cell id='3' cpus='3' memory='2097152' unit='KiB'>
> <distances>
> <sibling id='0' value='41'/>
> <sibling id='1' value='31'/>
> <sibling id='2' value='21'/>
> <sibling id='3' value='10'/>
> </distances>
> </cell>
> </numa>
> </cpu>
>
Are all those required? I don't see the point in requiring duplicate
values.
>Changes under this commit concern all those that require adding
>the valid data structures, virDomainNuma* functional routines and the
>XML doc schema enhancements to enforce appropriate administration.
>
I don't understand the point of this paragraph.
>These changes are accompanied with topic related documentation under
>docs/formatdomain.html within the "CPU model and topology" extending the
>"Guest NUMA topology" paragraph.
>
This can be seen from the patch.
>For terminology we refer to sockets as "nodes" where access to each
>others' distinct resources such as memory make them "siblings" with a
>designated "distance" between them. A specific design is described under
>the ACPI (Advanced Configuration and Power Interface Specification)
>within the chapter explaining the system's SLIT (System Locality Distance
>Information Table).
>
The above paragraph is also being added in the docs.
>These patches extend core libvirt's XML description of a virtual machine's
>hardware to include NUMA distance information for sibling nodes, which
>is then passed to Xen guests via libxl. Recently qemu landed support for
>constructing the SLIT since commit 0f203430dd ("numa: Allow setting NUMA
>distance for different NUMA nodes"), hence these core libvirt extensions
>can also help other drivers in supporting this feature.
>
This is not true, libxl changes are in separate patch. There's a lot of
copy-paste from the cover-letter...
>These changes alter the docs/schemas/cputypes.rng enforcing
>domain administration to follow the syntax below per numa cell id.
>
>These changes also alter docs/schemas/basictypes.rng to add
>"numaDistanceValue" which is an "unsignedInt" with a minimum value
>of 10 as 0-9 are reserved values and can not be used as System
>Locality Distance Information Table data.
>
>Signed-off-by: Wim ten Have <wim.ten.have at oracle.com>
>---
>Changes on v1:
>- Add changes to docs/formatdomain.html.in describing schema update.
>Changes on v2:
>- Automatically apply distance symmetry maintaining cell <-> sibling.
>- Check for maximum '255' on numaDistanceValue.
>- Automatically complete empty distance ranges.
>- Check that sibling_id's are in range with cell identifiers.
>- Allow non-contiguous ranges, starting from any node id.
>- Respect parameters as ATTRIBUTE_NONNULL fix functions and callers.
>- Add and apply topoly for LOCAL_DISTANCE=10 and REMOTE_DISTANCE=20.
>---
> docs/formatdomain.html.in | 70 +++++++++-
> docs/schemas/basictypes.rng | 9 ++
> docs/schemas/cputypes.rng | 18 +++
> src/conf/cpu_conf.c | 2 +-
> src/conf/numa_conf.c | 323 +++++++++++++++++++++++++++++++++++++++++++-
> src/conf/numa_conf.h | 25 +++-
> src/libvirt_private.syms | 6 +
> 7 files changed, 444 insertions(+), 9 deletions(-)
>
>diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>index 8ca7637..19a2ac7 100644
>--- a/docs/formatdomain.html.in
>+++ b/docs/formatdomain.html.in
>@@ -1529,7 +1529,75 @@
> </p>
>
> <p>
>- This guest NUMA specification is currently available only for QEMU/KVM.
>+ This guest NUMA specification is currently available only for
>+ QEMU/KVM and Xen. Whereas Xen driver also allows for a distinct
>+ description of NUMA arranged <code>sibling</code> <code>cell</code>
>+ <code>distances</code> <span class="since">Since 3.6.0</span>.
>+ </p>
>+
>+ <p>
>+ Under NUMA h/w architecture, distinct resources such as memory
>+ create a designated distance between <code>cell</code> and
>+ <code>siblings</code> that now can be described with the help of
>+ <code>distances</code>. A detailed describtion can be found within
>+ the ACPI (Advanced Configuration and Power Interface Specification)
>+ within the chapter explaining the system's SLIT (System Locality
>+ Distance Information Table).
>+ </p>
>+
>+<pre>
>+...
>+<cpu>
>+ ...
>+ <numa>
>+ <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'>
>+ <distances>
>+ <sibling id='0' value='10'/>
>+ <sibling id='1' value='21'/>
>+ <sibling id='2' value='31'/>
>+ <sibling id='3' value='41'/>
>+ </distances>
>+ </cell>
>+ <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
>+ <distances>
>+ <sibling id='0' value='21'/>
>+ <sibling id='1' value='10'/>
>+ <sibling id='2' value='21'/>
>+ <sibling id='3' value='31'/>
>+ </distances>
>+ </cell>
>+ <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
>+ <distances>
>+ <sibling id='0' value='31'/>
>+ <sibling id='1' value='21'/>
>+ <sibling id='2' value='10'/>
>+ <sibling id='3' value='21'/>
>+ </distances>
>+ </cell>
>+ <cell id='3' cpus='3' memory='512000' unit='KiB'>
>+ <distances>
>+ <sibling id='0' value='41'/>
>+ <sibling id='1' value='31'/>
>+ <sibling id='2' value='21'/>
>+ <sibling id='3' value='10'/>
>+ </distances>
>+ </cell>
>+ </numa>
>+ ...
>+</cpu>
>+...</pre>
>+
>+ <p>
>+ Under Xen driver, if no <code>distances</code> are given to describe
>+ the SLIT data between different cells, it will default to a scheme
>+ using 10 for local and 20 for remote distances. Made defaults
>+ for guest OS when no SLIT is specified.
>+ <br/>See include/linux/topology.h under
>+ <pre>
>+ /* Conform to ACPI 2.0 SLIT distance definitions */
>+ #define LOCAL_DISTANCE 10
>+ #define REMOTE_DISTANCE 20
>+ </pre>
I wouldn't include the code in the <pre>, I don't see the point for that.
> </p>
>
> <h3><a id="elementsEvents">Events configuration</a></h3>
>diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
>index 1ea667c..bbef282 100644
>--- a/docs/schemas/basictypes.rng
>+++ b/docs/schemas/basictypes.rng
>@@ -77,6 +77,15 @@
> </choice>
> </define>
>
>+ <define name="numaDistanceValue">
>+ <choice>
>+ <data type="unsignedInt">
>+ <param name="minInclusive">10</param>
>+ <param name="maxInclusive">255</param>
>+ </data>
>+ </choice>
Why <choice>?
>+ </define>
>+
> <define name="pciaddress">
> <optional>
> <attribute name="domain">
>diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
>index 3eef16a..c45b6df 100644
>--- a/docs/schemas/cputypes.rng
>+++ b/docs/schemas/cputypes.rng
>@@ -129,6 +129,24 @@
> </choice>
> </attribute>
> </optional>
>+ <optional>
>+ <element name="distances">
>+ <oneOrMore>
>+ <ref name="numaDistance"/>
>+ </oneOrMore>
>+ </element>
>+ </optional>
>+ </element>
>+ </define>
>+
>+ <define name="numaDistance">
>+ <element name="sibling">
>+ <attribute name="id">
>+ <ref name="unsignedInt"/>
>+ </attribute>
>+ <attribute name="value">
>+ <ref name="numaDistanceValue"/>
>+ </attribute>
> </element>
> </define>
>
>diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
>index c21d11d..8d804a1 100644
>--- a/src/conf/cpu_conf.c
>+++ b/src/conf/cpu_conf.c
>@@ -642,7 +642,7 @@ virCPUDefFormatBufFull(virBufferPtr buf,
> if (virCPUDefFormatBuf(&childrenBuf, def, updateCPU) < 0)
> goto cleanup;
>
>- if (virDomainNumaDefCPUFormat(&childrenBuf, numa) < 0)
>+ if (virDomainNumaDefCPUFormatXML(&childrenBuf, numa) < 0)
> goto cleanup;
>
Changing function names should be separate patch. Why is this
changed anyway?
> if (virBufferCheckError(&attributeBuf) < 0 ||
>diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
>index bfd3703..fb2a74c 100644
>--- a/src/conf/numa_conf.c
>+++ b/src/conf/numa_conf.c
>@@ -29,6 +29,13 @@
> #include "virnuma.h"
> #include "virstring.h"
>
>+/*
>+ * Distance definitions defined Conform ACPI 2.0 SLIT.
>+ * See include/linux/topology.h
>+ */
>+#define LOCAL_DISTANCE 10
>+#define REMOTE_DISTANCE 20
>+
> #define VIR_FROM_THIS VIR_FROM_DOMAIN
>
> VIR_ENUM_IMPL(virDomainNumatuneMemMode,
>@@ -48,6 +55,8 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, VIR_DOMAIN_MEMORY_ACCESS_LAST,
> "shared",
> "private")
>
>+typedef struct _virDomainNumaDistance virDomainNumaDistance;
>+typedef virDomainNumaDistance *virDomainNumaDistancePtr;
>
> typedef struct _virDomainNumaNode virDomainNumaNode;
> typedef virDomainNumaNode *virDomainNumaNodePtr;
>@@ -66,6 +75,12 @@ struct _virDomainNuma {
> virBitmapPtr nodeset; /* host memory nodes where this guest node resides */
> virDomainNumatuneMemMode mode; /* memory mode selection */
> virDomainMemoryAccess memAccess; /* shared memory access configuration */
>+
>+ struct _virDomainNumaDistance {
>+ unsigned int value; /* locality value for node i->j or j->i */
>+ unsigned int cellid;
>+ } *distances; /* remote node distances */
>+ size_t ndistances;
> } *mem_nodes; /* guest node configuration */
> size_t nmem_nodes;
>
>@@ -686,6 +701,128 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
> }
>
>
>+static int
>+virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
>+ xmlXPathContextPtr ctxt,
>+ unsigned int cur_cell)
>+{
>+ int ret = -1;
>+ int sibling;
>+ char *tmp = NULL;
>+ xmlNodePtr *nodes = NULL;
>+ size_t i, ndistances = def->nmem_nodes;
>+
>+ if (!ndistances)
>+ return 0;
>+
>+ if (!virXPathNode("./distances[1]/sibling", ctxt))
>+ return 0;
>+
>+ if ((sibling = virXPathNodeSet("./distances[1]/sibling", ctxt, &nodes)) <= 0) {
>+ virReportError(VIR_ERR_XML_ERROR, "%s",
>+ _("NUMA distances defined without siblings"));
>+ goto cleanup;
>+ }
>+
Aren't these two blocks of code checking the same thing?
>+ for (i = 0; i < sibling; i++) {
>+ virDomainNumaDistancePtr ldist, rdist;
>+ unsigned int sibling_id, sibling_value;
>+
>+ /* siblings are in order of parsing or explicitly numbered */
>+ if ((tmp = virXMLPropString(nodes[i], "id"))) {
>+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
>+ virReportError(VIR_ERR_XML_ERROR,
>+ _("Invalid 'id' attribute in NUMA distances for sibling: '%s'"),
>+ tmp);
>+ goto cleanup;
>+ }
>+ }
>+ VIR_FREE(tmp);
>+
>+ if (sibling_id >= ndistances) {
Isn't sibling_id uninitialized in case there is no 'id' specified?
>+ virReportError(VIR_ERR_XML_ERROR,
>+ _("No cell administration for 'sibling_id %d' under NUMA cell '%d' "),
>+ sibling_id, cur_cell);
>+ goto cleanup;
>+ }
>+
>+ /* We need a locality value. Check and correct
>+ * distance to local and distance to remote node.
>+ */
>+ if (!(tmp = virXMLPropString(nodes[i], "value"))) {
>+ virReportError(VIR_ERR_XML_ERROR,
>+ _("Missing 'value' attribute in NUMA distances for sibling id: '%d'"),
>+ sibling_id);
>+ goto cleanup;
>+ }
>+
>+ /* It needs to be applicable */
>+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
>+ virReportError(VIR_ERR_XML_ERROR,
>+ _("Invalid 'value' attribute in NUMA distances for sibling id: '%d'"),
>+ sibling_id);
>+ goto cleanup;
>+ }
>+ VIR_FREE(tmp);
>+
>+ ldist = def->mem_nodes[cur_cell].distances;
>+ if (!ldist) {
>+ if (def->mem_nodes[cur_cell].ndistances) {
>+ virReportError(VIR_ERR_XML_ERROR,
>+ _("Erratic 'ndistances' set in NUMA distances for sibling id: '%d'"),
>+ cur_cell);
>+ goto cleanup;
>+ }
>+
>+ if (VIR_ALLOC_N(ldist, ndistances) < 0)
>+ goto cleanup;
>+
>+ if (!ldist[cur_cell].value)
>+ ldist[cur_cell].value = LOCAL_DISTANCE;
>+ ldist[cur_cell].cellid = cur_cell;
>+ def->mem_nodes[cur_cell].ndistances = ndistances;
>+ }
>+
>+ ldist[sibling_id].cellid = sibling_id;
>+ ldist[sibling_id].value = sibling_value;
I don't see the check for 10 <= sibling_value <= 255 anywhere.
>+ def->mem_nodes[cur_cell].distances = ldist;
>+
>+ rdist = def->mem_nodes[sibling_id].distances;
>+ if (!rdist) {
>+ if (def->mem_nodes[sibling_id].ndistances) {
>+ virReportError(VIR_ERR_XML_ERROR,
>+ _("Erratic 'ndistances' set in NUMA distances for sibling id: '%d'"),
>+ sibling_id);
>+ goto cleanup;
>+ }
>+
>+ if (VIR_ALLOC_N(rdist, ndistances) < 0)
>+ goto cleanup;
>+
>+ if (!rdist[sibling_id].value)
>+ rdist[sibling_id].value = LOCAL_DISTANCE;
>+ rdist[sibling_id].cellid = sibling_id;
>+ def->mem_nodes[sibling_id].ndistances = ndistances;
>+ }
>+
>+ rdist[cur_cell].cellid = cur_cell;
>+ rdist[cur_cell].value = sibling_value;
>+ def->mem_nodes[sibling_id].distances = rdist;
>+ }
>+
>+ ret = 0;
>+
>+ cleanup:
>+ if (ret) {
>+ for (i = 0; i < ndistances; i++)
>+ VIR_FREE(def->mem_nodes[i].distances);
>+ }
>+ VIR_FREE(nodes);
>+ VIR_FREE(tmp);
>+
>+ return ret;
>+}
>+
> int
> virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
> xmlXPathContextPtr ctxt)
>@@ -694,7 +831,7 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
> xmlNodePtr oldNode = ctxt->node;
> char *tmp = NULL;
> int n;
>- size_t i;
>+ size_t i, j;
> int ret = -1;
>
> /* check if NUMA definition is present */
>@@ -712,7 +849,6 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
> def->nmem_nodes = n;
>
> for (i = 0; i < n; i++) {
>- size_t j;
> int rc;
> unsigned int cur_cell = i;
>
>@@ -788,6 +924,10 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
> def->mem_nodes[cur_cell].memAccess = rc;
> VIR_FREE(tmp);
> }
>+
>+ /* Parse NUMA distances info */
>+ if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0)
>+ goto cleanup;
> }
>
> ret = 0;
>@@ -801,8 +941,8 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
>
>
> int
>-virDomainNumaDefCPUFormat(virBufferPtr buf,
>- virDomainNumaPtr def)
>+virDomainNumaDefCPUFormatXML(virBufferPtr buf,
>+ virDomainNumaPtr def)
Same here with the rename.
> {
> virDomainMemoryAccess memAccess;
> char *cpustr;
>@@ -815,6 +955,8 @@ virDomainNumaDefCPUFormat(virBufferPtr buf,
> virBufferAddLit(buf, "<numa>\n");
> virBufferAdjustIndent(buf, 2);
> for (i = 0; i < ncells; i++) {
>+ int ndistances;
>+
> memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
>
> if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
>@@ -829,7 +971,32 @@ virDomainNumaDefCPUFormat(virBufferPtr buf,
> if (memAccess)
> virBufferAsprintf(buf, " memAccess='%s'",
> virDomainMemoryAccessTypeToString(memAccess));
>- virBufferAddLit(buf, "/>\n");
>+
>+ ndistances = def->mem_nodes[i].ndistances;
>+ if (!ndistances) {
>+ virBufferAddLit(buf, "/>\n");
>+ } else {
>+ size_t j;
>+ virDomainNumaDistancePtr distances = def->mem_nodes[i].distances;
>+
>+ virBufferAddLit(buf, ">\n");
>+ virBufferAdjustIndent(buf, 2);
>+ virBufferAddLit(buf, "<distances>\n");
>+ virBufferAdjustIndent(buf, 2);
>+ for (j = 0; j < ndistances; j++) {
>+ if (distances[j].value) {
>+ virBufferAddLit(buf, "<sibling");
>+ virBufferAsprintf(buf, " id='%d'", distances[j].cellid);
>+ virBufferAsprintf(buf, " value='%d'", distances[j].value);
>+ virBufferAddLit(buf, "/>\n");
>+ }
>+ }
>+ virBufferAdjustIndent(buf, -2);
>+ virBufferAddLit(buf, "</distances>\n");
>+ virBufferAdjustIndent(buf, -2);
>+ virBufferAddLit(buf, "</cell>\n");
>+ }
>+
> VIR_FREE(cpustr);
> }
> virBufferAdjustIndent(buf, -2);
>@@ -922,13 +1089,146 @@ virDomainNumaCheckABIStability(virDomainNumaPtr src,
> size_t
> virDomainNumaGetNodeCount(virDomainNumaPtr numa)
> {
>- if (!numa)
>+ if (!numa || !numa->mem_nodes)
Just "why?" ^^
> return 0;
>
> return numa->nmem_nodes;
> }
>
>
>+size_t
>+virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes)
>+{
>+ if (!nmem_nodes) {
>+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>+ _("Cannot set an empty nmem_nodes set"));
>+ return 0;
>+ }
>+
>+ if (numa->mem_nodes) {
>+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>+ _("Cannot alter an existing nmem_nodes set"));
>+ return 0;
>+ }
>+
>+ if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
>+ return 0;
>+
>+ numa->nmem_nodes = nmem_nodes;
>+
>+ return numa->nmem_nodes;
>+}
>+
Are you introducing a function that is not used anywhere as a part of a
patch that does something else as well? I can't see the point of this
function really... It looks like you are guarding against user input
which doesn't really makes sense here for me. Maybe I need to see
where it is used but it's not in this patch...
>+size_t
>+virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
>+ size_t node,
>+ size_t cellid)
>+{
>+ virDomainNumaDistancePtr distances = NULL;
>+
>+ if (node < numa->nmem_nodes)
>+ distances = numa->mem_nodes[node].distances;
>+
>+ /*
>+ * Present the configured distance value. If
>+ * out of range or not available set the platform
>+ * defined default for local and remote nodes.
>+ */
>+ if (!distances ||
>+ !distances[cellid].value ||
>+ !numa->mem_nodes[node].ndistances)
>+ return (node == cellid) ? \
>+ LOCAL_DISTANCE : REMOTE_DISTANCE;
>+
>+ return distances[cellid].value;
>+}
>+
>+
>+int
>+virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
>+ size_t node,
>+ size_t cellid,
>+ unsigned int value)
>+{
>+ virDomainNumaDistancePtr distances;
>+
>+ if (node >= numa->nmem_nodes) {
>+ virReportError(VIR_ERR_INTERNAL_ERROR,
>+ _("Argument 'node' %zu outranges defined number "
>+ "of NUMA nodes"), node);
>+ return -1;
>+ }
>+
>+ distances = numa->mem_nodes[node].distances;
>+ if (!distances ||
>+ cellid >= numa->mem_nodes[node].ndistances) {
>+ virReportError(VIR_ERR_XML_ERROR, "%s",
>+ _("Arguments under memnode element do not "
>+ "correspond with existing guest's NUMA cell"));
>+ return -1;
>+ }
>+
>+ /*
>+ * Advanced Configuration and Power Interface
>+ * Specification version 6.1. Chapter 5.2.17
>+ * System Locality Distance Information Table
>+ * ... Distance values of 0-9 are reserved.
>+ */
>+ if (value < LOCAL_DISTANCE) {
>+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>+ _("Distance value of %d is in 0-9 reserved range"),
>+ value);
>+ return -1;
>+ }
>+
>+ if (value > LOCAL_DISTANCE && node == cellid) {
>+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
>+ _("Distance value %d under node %zu does "
>+ "not meet LOCAL_DISTANCE of 10"),
Maybe you meant REMOTE_DISTANCE here? If there were tests it would be
visible. Similarly if this function was used anywhere in this patch. I
don't feel like reading further, other may continue the review.
Have a nice day,
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20170831/26d03a16/attachment-0001.sig>
More information about the libvir-list
mailing list