[libvirt] [PATCH 1/3] XML definitions for guest NUMA

Fri Nov 11 12:17:59 UTC 2011

On Mon, Nov 07, 2011 at 11:35:27AM -0700, Eric Blake wrote:
> On 11/06/2011 06:57 AM, Bharata B Rao wrote:
> >XML definitions for guest NUMA and parsing routines.
> >
> >From: Bharata B Rao<bharata at linux.vnet.ibm.com>
> >
> >This patch adds XML definitions for guest NUMA specification and contains
> >routines to parse the same. The guest NUMA specification looks like this:
> >
> ><cpu>
> >         ...
> >         <topology sockets='2' cores='4' threads='2'/>
> >         <numa>
> >                 <cell cpus='0-7' mems='512000'/>
> >                 <cell cpus='8-15' mems='512000'/>
> >         </numa>
> >         ...
> ></cpu>
> >
> >Signed-off-by: Bharata B Rao<bharata at linux.vnet.ibm.com>
> >---
> 
> >+<p>
> >+      Guest NUMA topology can be specifed using<code>numa</code>  element.
> >+<span class="since">Since X.X.X</span>
> 
> Let's just put 0.9.8 here.  It's easier at feature freeze time to
> grep and replace a concrete 0.9.8 into a different numbering scheme
> (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8,
> than it is to remember to also search for X.X.X.

Ok, changed to 0.9.8 in v3.

> 
> >+</p>
> >+
> >+<pre>
> >+  ...
> >+<cpu>
> >+    ...
> >+<numa>
> >+<cell cpus='0-3' mems='512000'/>
> >+<cell cpus='4-7' mems='512000'/>
> 
> I understand 'cpus' (a valid word meaning multiple cpu units), but
> 'mems' seems odd; I think it would be better naming this attribute
> 'memory' to match our <memory> element at the top level.  Just
> because qemu's command line names the option mems= doesn't mean we
> should be stuck making our XML inconsistent.

Changed to memory.

> 
> >+</numa>
> >+    ...
> >+</cpu>
> >+  ...</pre>
> >+
> >+<p>
> >+      Each<code>cell</code>  element specifies a NUMA cell or a NUMA node.
> >+<code>cpus</code>  specifies the CPU or range of CPUs that are part of
> >+      the node.<code>mems</code>  specifies the node memory in kilobytes
> >+      (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
> >+      or nodeid in the increasing order starting from 0.
> 
> I agree with doing things in 1024-byte blocks [1], since <memory>
> and <currentMemory> are also in that unit.
> 
> [1] 1024-byte blocks is technically kibibytes, not kilobytes; but
> you're copying from existing text, so at least we're consistent, not
> to mention fitting right in with the wikipedia complaint that KiB
> has had slow adoption by the computer industry:
> https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)
> 
> >+</p>
> >+
> >+<p>
> >+      This guest NUMA specification translates to<code>-numa</code>  command
> >+      line option for QEMU/KVM. For the above example, the following QEMU
> >+      command line option is generated:
> >+<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000</code>
> 
> This paragraph is not necessary.  We don't need to give one
> hypervisor-specific example of how the XML is translated; it is
> sufficient to simply document the XML semantics in a way that can be
> implemented by any number of hypervisors.

Ok, removed this paragraph.

> 
> >+
> >+<define name="numaCell">
> >+<element name="cell">
> >+<attribute name="cpus">
> >+<ref name="Cellcpus"/>
> 
> Typically, ref names start with a lower case letter.

Ok, changed.

> 
> >+</attribute>
> >+<attribute name="mems">
> >+<ref name="Cellmems"/>
> >+</attribute>
> 
> Is it possible for these attributes to be optional?  That is, on the
> qemu line, can I specify cpus= but not mems=, or mems= but not
> cpus=? If so, then the attributes need to be optional in the schema,
> and the code behave with sane defaults when one of the two
> attributes is not present.

Actually qemu allows both cpus and mem to be optional. But if we allow
that, we will have specifications like this:

<numa>
  <cell />
  <cell />
</numa>

That looks ugly.

Either both of them should allowed to be optional or none of them, I am
going for the latter specification. IOW lets make both of them
mandatory in a numa cell.

> 
> >@@ -2745,4 +2767,14 @@
> >        <param name="pattern">[a-zA-Z0-9_\.:]+</param>
> >      </data>
> >    </define>
> >+<define name="Cellcpus">
> >+<data type="string">
> >+<param name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>
> 
> This looks like a repeat of <define name="cpuset">; if so, let's
> reuse that define instead of making a new one (and if not, why are
> we introducing yet another syntax?).

Fixed.

> 
> >+</data>
> >+</define>
> >+<define name="Cellmems">
> >+<data type="unsignedInt">
> >+<param name="pattern">[0-9]+</param>
> 
> Likewise, this looks like a repeat of <define name="memoryKB">, so
> lets reuse that.

Done.

> 
> >@@ -109,6 +114,19 @@ no_memory:
> >      return NULL;
> >  }
> >
> >+static int
> >+virCPUDefNumaCPUs(virCPUDefPtr def)
> >+{
> >+    int i, j, ncpus = 0;
> >+
> >+    for (i = 0; i<  def->ncells; i++) {
> >+        for (j = 0; j<  VIR_DOMAIN_CPUMASK_LEN; j++) {
> >+            if (def->cells[i].cpumask[j])
> >+                ncpus++;
> >+        }
> >+    }
> 
> Can this loop be made any faster by using count_one_bits?

Yes provided we start storing CPUs in bitmasks rather than
char arrays. But I saw other parts of the code (like cpuset)
storing CPUs in array and hence used the same for numa too.
In any case I got rid of this since we can arrive at this information
from other means.

> 
> >+    return ncpus;
> >+}
> >
> >  virCPUDefPtr
> >  virCPUDefParseXML(const xmlNodePtr node,
> >@@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node,
> >          def->features[i].policy = policy;
> >      }
> >
> >+    if (virXPathNode("./numa[1]", ctxt)) {
> >+        VIR_FREE(nodes);
> >+        n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes);
> >+        if (n<  0 || n == 0) {
> 
> This looks a bit funny, compared to if (n <= 0).

Yes. I first accounted for the error case and later added a check for
zero cells in <numa>. Fixed this.

> 
> >+        for (i = 0 ; i<  n ; i++) {
> >+            char *cpus;
> >+            int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
> >+            unsigned long ul;
> >+            int ret;
> >+
> >+            def->cells[i].cellid = i;
> >+            cpus = virXMLPropString(nodes[i], "cpus");
> >+
> >+            if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)<  0)
> >+                goto no_memory;
> >+
> >+            if (virDomainCpuSetParse((const char **)&cpus,
> >+                                 0, def->cells[i].cpumask,
> >+                                 cpumasklen)<  0)
> 
> Does this behave properly if the cpus=... attribute was missing?

This was buggy. Fixed now.

> 
> >+                goto error;
> >+
> >+            ret = virXPathULong("string(./numa[1]/cell/@mems)",
> >+                            ctxt,&ul);
> >+            if (ret<  0) {
> >+                virCPUReportError(VIR_ERR_INTERNAL_ERROR,
> >+                    "%s", _("Missing 'mems' attribute in NUMA topology"));
> >+                goto error;
> 
> Especially since you checked for a missing mems= counterpart?  And
> back to my earlier question of whether both attributes are really
> mandatory, or whether just one in isolation can make sense.

As explained above, I made both of them mandatory now.

> 
> >+++ b/src/conf/cpu_conf.h
> >@@ -67,6 +67,14 @@ struct _virCPUFeatureDef {
> >      int policy;         /* enum virCPUFeaturePolicy */
> >  };
> >
> >+typedef struct _virCellDef virCellDef;
> >+typedef virCellDef *virCellDefPtr;
> >+struct _virCellDef {
> >+   int cellid;
> >+   char *cpumask;	/* CPUs that are part of this node */
> >+   unsigned int mem;	/* Node memory */
> 
> I'd write this as /* Node memory in kB */, to make it clear what
> scale is in use.

Done.

> 
> I saw the code for parsing the XML, but what about the code for
> generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)?
> The patch is incomplete without that.

Thanks for pointing this out. Added code for this now.

Regards,
Bharata.