[PATCH 4/5] qemu: Prefer -numa cpu over -numa node,cpus=

Igor Mammedov imammedo at redhat.com
Fri May 22 17:18:31 UTC 2020


On Fri, 22 May 2020 18:28:31 +0200
Michal Privoznik <mprivozn at redhat.com> wrote:

> On 5/22/20 6:07 PM, Igor Mammedov wrote:
> > On Fri, 22 May 2020 16:14:14 +0200
> > Michal Privoznik <mprivozn at redhat.com> wrote:
> >   
> >> QEMU is trying to obsolete -numa node,cpus= because that uses
> >> ambiguous vCPU id to [socket, die, core, thread] mapping. The new
> >> form is:
> >>
> >>    -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
> >>
> >> which is repeated for every vCPU and places it at [S, D, C, T]
> >> into guest NUMA node N.
> >>
> >> While in general this is a magic mapping, we can deal with it.
> >> Firstly, with QEMU 2.7 or newer, libvirt ensures that if topology
> >> is given then maxvcpus must be sockets * dies * cores * threads
> >> (i.e. there are no 'holes').
> >> Secondly, if no topology is given then libvirt itself places each
> >> vCPU into a different socket (basically, it fakes topology of:
> >> [maxvcpus, 1, 1, 1])
> >> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> >> onto topology, to make sure vCPUs don't start to move around.
> >>
> >> Note, migration from old to new cmd line works and therefore
> >> doesn't need any special handling.
> >>
> >> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085
> >>
> >> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
> >> ---
> >>   src/qemu/qemu_command.c                       | 108 +++++++++++++++++-
> >>   .../hugepages-nvdimm.x86_64-latest.args       |   4 +-
> >>   ...memory-default-hugepage.x86_64-latest.args |  10 +-
> >>   .../memfd-memory-numa.x86_64-latest.args      |  10 +-
> >>   ...y-hotplug-nvdimm-access.x86_64-latest.args |   4 +-
> >>   ...ry-hotplug-nvdimm-align.x86_64-latest.args |   4 +-
> >>   ...ry-hotplug-nvdimm-label.x86_64-latest.args |   4 +-
> >>   ...ory-hotplug-nvdimm-pmem.x86_64-latest.args |   4 +-
> >>   ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args |   4 +-
> >>   ...hotplug-nvdimm-readonly.x86_64-latest.args |   4 +-
> >>   .../memory-hotplug-nvdimm.x86_64-latest.args  |   4 +-
> >>   ...vhost-user-fs-fd-memory.x86_64-latest.args |   4 +-
> >>   ...vhost-user-fs-hugepages.x86_64-latest.args |   4 +-
> >>   ...host-user-gpu-secondary.x86_64-latest.args |   3 +-
> >>   .../vhost-user-vga.x86_64-latest.args         |   3 +-
> >>   15 files changed, 158 insertions(+), 16 deletions(-)
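As a concrete illustration of what the patch generates (a sketch with my own helper names, not libvirt code, assuming the same linear enumeration as the commit message), the repeated "-numa cpu" arguments for a given vCPU-to-node assignment could be built like this:

```python
# Hypothetical sketch (not libvirt code): render the repeated
# "-numa cpu,..." arguments for each vCPU, given its guest NUMA node
# and the [sockets, dies, cores, threads] topology, assuming the linear
# hole-free enumeration with thread id varying fastest.

def build_numa_cpu_args(node_of_vcpu, sockets, dies, cores, threads):
    """node_of_vcpu maps vcpu_id -> guest NUMA node id."""
    args = []
    for vcpu_id in sorted(node_of_vcpu):
        thread = vcpu_id % threads
        core = (vcpu_id // threads) % cores
        die = (vcpu_id // (threads * cores)) % dies
        socket = vcpu_id // (threads * cores * dies)
        args.append("-numa cpu,node-id=%d,socket-id=%d,die-id=%d,"
                    "core-id=%d,thread-id=%d"
                    % (node_of_vcpu[vcpu_id], socket, die, core, thread))
    return args
```

For example, two vCPUs split over two nodes with the faked [2, 1, 1, 1] topology produce one "-numa cpu" argument per vCPU, each in its own socket.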
> >>
> >> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> >> index 7d84fd8b5e..0de4fe4905 100644
> >> --- a/src/qemu/qemu_command.c
> >> +++ b/src/qemu/qemu_command.c
> >> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf,
> >>   }
> >>   
> >>   
> >> +/**
> >> + * qemuTranslatevCPUID:
> >> + *
> >> + * For given vCPU @id and vCPU topology (@cpu) compute corresponding
> >> + * @socket, @die, @core and @thread). This assumes linear topology,
> >> + * that is every [socket, die, core, thread] combination is valid vCPU
> >> + * ID and there are no 'holes'. This is ensured by
> >> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is
> >> + * set.  
> > I wouldn't make this assumption, each machine can have (and has) it's own layout,
> > and now it's not hard to change that per machine version if necessary.
> > 
> > I'd suppose one could pull the list of possible CPUs from QEMU started
> > in preconfig mode with desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS
> > and then continue to configure numa with QMP commands using provided
> > CPUs layout.  
> 
> Continue where? In 'preconfig mode' the guest is already started, 
> isn't it? Are you suggesting that libvirt starts a dummy QEMU process, 
> fetches the CPU topology from it and then starts it for real? Libvirt 
QEMU is started but it's very far from starting the guest; at that point it's
possible to configure the NUMA mapping at runtime and continue to the -S or
running state without restarting QEMU. For follow-up starts, the topology and
NUMA options that were used can be cached and reused at CLI time as long as
the machine/-smp combination stays the same.

> tries to avoid that as much as it can.
> 
> > 
> > How to present it to the libvirt user I'm not sure (give them that list
> > perhaps and let them select from it?)  
> 
> This is what I am trying to figure out in the cover letter. Maybe we 
> need to let users configure the topology (well, vCPU id to [socket, die, 
> core, thread] mapping), but then again, in my testing the guest ignored 
> that and displayed different topology (true, I was testing with -cpu 
> host, so maybe that's why).
There is an ongoing issue with EPYC vCPU topology, but otherwise it should work.
Just report a bug to qemu-devel if it's broken.
 
> 
> > But it's irrelevant to the patch: magical IDs for socket/core/...whatever
> > should not be generated by libvirt anymore, but rather taken from QEMU for
> > the given machine + -smp combination.  
> 
> Taken when? We can do this for running machines, but not for freshly 
> started ones, can we?

It can be used for freshly started ones as well:
QEMU -S -preconfig -M pc -smp ...
(QMP) query-hotpluggable-cpus
(QMP) set-numa-node ...
...
(QMP) exit-preconfig
(QMP) other stuff libvirt does (like hot-plugging CPUs, ...)
(QMP) cont
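A minimal sketch of the QMP payloads behind that sequence (the command names query-hotpluggable-cpus, set-numa-node, x-exit-preconfig and cont come from QEMU's QAPI schema, with exit-preconfig spelled x-exit-preconfig in its experimental form; the vCPU properties below are invented for illustration, and a real client reads each reply before sending the next command):

```python
# Sketch of the QMP side of the -preconfig flow above; payloads only,
# no socket I/O. The cpu properties are made up for illustration.

def qmp_cmd(name, arguments=None):
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return cmd

def set_numa_node_cpu(node_id, cpu_props):
    """cpu_props: the 'props' dict of one entry returned by
    query-hotpluggable-cpus (socket-id, core-id, thread-id, ...)."""
    args = {"type": "cpu", "node-id": node_id}
    args.update(cpu_props)
    return qmp_cmd("set-numa-node", args)

def preconfig_numa_commands(assignments):
    """assignments: list of (node_id, cpu_props) pairs, built from the
    query-hotpluggable-cpus reply rather than computed by the client."""
    cmds = [qmp_cmd("query-hotpluggable-cpus")]
    cmds += [set_numa_node_cpu(n, p) for n, p in assignments]
    cmds.append(qmp_cmd("x-exit-preconfig"))
    cmds.append(qmp_cmd("cont"))
    return cmds
```

The point of the flow is that the client never invents socket/core/thread IDs: it echoes back the props QEMU itself reported.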

PS:
I didn't notice that -preconfig was moved to experimental state, since it
took 2 years before support for '-numa cpu', the option it was designed to
configure NUMA mappings with, started to be implemented.
So it's no longer advertised in the QAPI schema; see QEMU commit
1f214ee1b83afd.
CCing Markus so we can think about where to go from here.



