[libvirt] [PATCH v1] numa: compute and set matching vcpus for numa domains

Daniel P. Berrange berrange at redhat.com
Thu Jul 20 13:17:29 UTC 2017


On Thu, Jul 20, 2017 at 03:03:25PM +0200, Wim ten Have wrote:
> On Thu, 20 Jul 2017 11:10:31 +0100
> "Daniel P. Berrange" <berrange at redhat.com> wrote:
> 
> > On Thu, Jul 20, 2017 at 11:29:26AM +0200, Wim Ten Have wrote:
> > > From: Wim ten Have <wim.ten.have at oracle.com>
> > > 
> > > The QEMU driver can erroneously allocate more vpus to a domain
> > > than there are cpus in the domain if the <numa> element is used
> > > to describe <cpu> element topology. Fix this by calculating
> > > the number of cpus described in the <numa> element of a <cpu>
> > > element and comparing it to the number of vcpus. If the number
> > > of vcpus is greater than the number of cpus, cap the number of
> > > vcpus to the number of cpus.
> > > 
> > > This patch also adds a small libvirt documentation update under
> > > the "CPU Allocation" section.
> > > 
> > > Signed-off-by: Wim ten Have <wim.ten.have at oracle.com>
> > > ---
> > >  docs/formatdomain.html.in |  9 ++++++++-
> > >  src/conf/domain_conf.c    | 14 +++++++++++---
> > >  2 files changed, 19 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > > index 07208ee..3c85307 100644
> > > --- a/docs/formatdomain.html.in
> > > +++ b/docs/formatdomain.html.in
> > > @@ -501,7 +501,14 @@
> > >        <dt><code>vcpu</code></dt>
> > >        <dd>The content of this element defines the maximum number of virtual
> > >          CPUs allocated for the guest OS, which must be between 1 and
> > > -        the maximum supported by the hypervisor.
> > > +        the maximum supported by the hypervisor. If a <code>numa</code>
> > > +        element is contained within the <code>cpu</code> element, the
> > > +        number of virtual CPUs to be allocated is compared with the sum
> > > +        of the <code>cpus</code> attributes across the <code>cell</code>
> > > +        elements within the <code>numa</code> element. If the number of
> > > +        virtual CPUs is greater than the sum of the <code>cpus</code>
> > > +        attributes, the number of virtual CPUs is capped at sum of the
> > > +        <code>cpus</code> attributes.
> > >          <dl>
> > >           <dt><code>cpuset</code></dt>
> > >           <dd>
> > > diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> > > index 958a5b7..73d7f4f 100644
> > > --- a/src/conf/domain_conf.c
> > > +++ b/src/conf/domain_conf.c
> > > @@ -3844,12 +3844,20 @@ virDomainDefPostParseMemory(virDomainDefPtr def,
> > >      unsigned long long numaMemory = 0;
> > >      unsigned long long hotplugMemory = 0;
> > >  
> > > -    /* Attempt to infer the initial memory size from the sum NUMA memory sizes
> > > -     * in case ABI updates are allowed or the <memory> element wasn't specified */
> > > +    /* Attempt to infer the initial memory size from the sum NUMA memory
> > > +     * sizes in case ABI updates are allowed or the <memory> element
> > > +     * wasn't specified.  Also cap the vcpu count to the sum of NUMA cpus
> > > +     * allocated for all nodes. */
> > >      if (def->mem.total_memory == 0 ||
> > >          parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE ||
> > > -        parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE_MIGRATION)
> > > +        parseFlags & VIR_DOMAIN_DEF_PARSE_ABI_UPDATE_MIGRATION) {
> > > +        unsigned int numaVcpus = 0;
> > > +
> > > +        if ((numaVcpus = virDomainNumaGetCPUCountTotal(def->numa)))
> > > +            virDomainDefSetVcpus(def, numaVcpus);
> > > +
> > >          numaMemory = virDomainNumaGetMemorySize(def->numa);
> > > +    }  
> > 
> > AFAICT, this scenario is simply a user configuration mistake, and so we
> > should report an error message to them to fix it.
> 
>   Not to push but I think this is the correct way ... O:-).
> 
>   Ok, perhaps the documentation note/commit message should be slightly
>   rewritten aswell as the altered comment on specific code section.
> 
>   The change is _NOT_ changing final guest provided 'maxvcpus' but
>   'vcpus' instead.  This means that it adds the "current='#count'" under
>   the vcpu element if such is less then vcpu 'maxvcpus' provided count.
>   If equal there is no issue and if larger there is indeed a correct
>   error message (exceptional condition).
> 
>   Imagine simpel domain description _WITHOUT_ <numa> and below <vcpu>
>   element description;
> 
>     <vcpu placement='static' current='4'>16</vcpu>
> 
>   This nicely creates a guest with 4 domain CPUs added, where the
>   administrator can "hot-add" an additional 12 CPUs making the full
>   'maxvcpus' defined 16.  "hot-add" by virsh;
> 
>     virsh # setvcpus kvm26 16
> 
>   Without the fix under former <numa> domain description libvirt would
>   bring whole '16' vcpus to the guest, and if there was a current value
>   given all by current on top of the <numa> <cell ...  to NUMA-node0
>   for that matter :-( so wrong;
> 
>     <vcpu placement='static'>16</vcpu>
> 	..
>       <numa>
>         <cell id='0' cpus='0' memory='262144' unit='KiB'/>
>         <cell id='1' cpus='1' memory='262144' unit='KiB'/>
>         <cell id='2' cpus='2' memory='262144' unit='KiB'/>
>         <cell id='3' cpus='3' memory='262144' unit='KiB'/>
>       </numa>

This is saying that we have 16 CPU sockets, but you're only
assigning NUMA affinity to 4 of the sockets. IMHO that is
a configuration mistake. All 16 CPU sockets need to have
affinity assigned, so that when they're later brought online
they can be placed suitably on the host.

>    
>   With the fix, its post configuration action will now nicely rewrite
>   the <vcpu ... current='#count'> element as shown below;
> 
>     <vcpu placement='static' current='4'>16</vcpu>
> 	..
>       <numa>
>         <cell id='0' cpus='0' memory='262144' unit='KiB'/>
>         <cell id='1' cpus='1' memory='262144' unit='KiB'/>
>         <cell id='2' cpus='2' memory='262144' unit='KiB'/>
>         <cell id='3' cpus='3' memory='262144' unit='KiB'/>
>       </numa>

IMHO that is still just as broken as the previous config,
because it isn't saying how to set affinity on the remaining
12 vCPUs that can be brought online.

>    
>   In case "current='#count'" description is given where it does not
>   match the <numa> <cell ... #cpus> count then such is corrected to
>   cap the sum of all <numa> <cell ... #cpus>.
>   Perhaps that occurance should be rejected with an error message, then
>   such is not exceeding the domain administrators written <vcpu> 'maxvcpus'
>   element being '16' within my example above.  That is a hard limit and
>   of course rejected with an error.
> 
>   I will not argue further ... O:-) I'll wait and if it is a _NO_ then
>   it is a _NO_!

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list