[libvirt] [PATCH v1] numa: compute and set matching vcpus for numa domains

Peter Krempa pkrempa at redhat.com
Thu Jul 20 13:34:12 UTC 2017


On Thu, Jul 20, 2017 at 14:17:29 +0100, Daniel Berrange wrote:
> On Thu, Jul 20, 2017 at 03:03:25PM +0200, Wim ten Have wrote:
> > On Thu, 20 Jul 2017 11:10:31 +0100
> > "Daniel P. Berrange" <berrange at redhat.com> wrote:
> > 
> > > On Thu, Jul 20, 2017 at 11:29:26AM +0200, Wim Ten Have wrote:
> > > > From: Wim ten Have <wim.ten.have at oracle.com>
> > > > 
> > > > The QEMU driver can erroneously allocate more vpus to a domain
> > > > than there are cpus in the domain if the <numa> element is used
> > > > to describe <cpu> element topology. Fix this by calculating
> > > > the number of cpus described in the <numa> element of a <cpu>
> > > > element and comparing it to the number of vcpus. If the number
> > > > of vcpus is greater than the number of cpus, cap the number of
> > > > vcpus to the number of cpus.
> > > > 
> > > > This patch also adds a small libvirt documentation update under
> > > > the "CPU Allocation" section.
> > > > 
> > > > Signed-off-by: Wim ten Have <wim.ten.have at oracle.com>
> > > > ---
> > > >  docs/formatdomain.html.in |  9 ++++++++-
> > > >  src/conf/domain_conf.c    | 14 +++++++++++---
> > > >  2 files changed, 19 insertions(+), 4 deletions(-)a

[...]

> >   Without the fix under former <numa> domain description libvirt would
> >   bring whole '16' vcpus to the guest, and if there was a current value
> >   given all by current on top of the <numa> <cell ...  to NUMA-node0
> >   for that matter :-( so wrong;
> > 
> >     <vcpu placement='static'>16</vcpu>
> > 	..
> >       <numa>
> >         <cell id='0' cpus='0' memory='262144' unit='KiB'/>
> >         <cell id='1' cpus='1' memory='262144' unit='KiB'/>
> >         <cell id='2' cpus='2' memory='262144' unit='KiB'/>
> >         <cell id='3' cpus='3' memory='262144' unit='KiB'/>
> >       </numa>
> 
> This is saying that we have 16 CPU sockets, but you're only
> assigning NUMA affinity to 4 of the sockets. IMHO that is
> a configuration mistake. All 16 CPU sockets need to have
> affinity assigned, so that when they're later brought online
> they can be placed suitably on the host.
> 
> >    
> >   With the fix, its post configuration action will now nicely rewrite
> >   the <vcpu ... current='#count'> element as shown below;
> > 
> >     <vcpu placement='static' current='4'>16</vcpu>
> > 	..
> >       <numa>
> >         <cell id='0' cpus='0' memory='262144' unit='KiB'/>
> >         <cell id='1' cpus='1' memory='262144' unit='KiB'/>
> >         <cell id='2' cpus='2' memory='262144' unit='KiB'/>
> >         <cell id='3' cpus='3' memory='262144' unit='KiB'/>
> >       </numa>
> 
> IMHO that is still just as broken as the previous config,
> because it isn't saying how to set affinity on the remaining
> 12 vCPUs that can be brought online.

Yes, exactly. The reason for leaving this in the 'gray zone' was that we
did not check this previously and qemu just picked one of the nodes and
dumped the rest of the vcpus there.

The correct way would be to enforce full mapping in NUMA.

Note that this needs to be done in the validation callbacks rather than
the post parse callbacks, since there's real risk of VMs vanishing.
Validation callbacks are triggered only at start of the VM so provide an
error.

Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20170720/a2aad11f/attachment-0001.sig>


More information about the libvir-list mailing list