[libvirt] [RFC PATCH 0/2] nodeinfo: PPC64: Fix topology and siblings info on capabilities and nodeinfo

Fri Jun 10 15:52:47 UTC 2016

On Tue, 2016-05-31 at 16:08 +1000, David Gibson wrote:
> > QEMU fails with errors like
> > 
> >   qemu-kvm: Cannot support more than 8 threads on PPC with KVM
> >   qemu-kvm: Cannot support more than 1 threads on PPC with TCG
> > 
> > depending on the guest type.
> 
> Note that in a sense the two errors come about for different reasons.
> 
> On Power, to a much greater degree than x86, threads on the same core
> have observably different behaviour from threads on different cores.
> Because of that, there's no reasonable way for KVM to present more
> guest threads-per-core than there are host threads-per-core.
> 
> The limit of 1 thread on TCG is simply because no-one's ever bothered
> to implement SMT emulation in qemu.

That just means that in the future we might have to expose something
other than a hardcoded '1' as the guest thread limit for TCG guests;
the interface would remain valid AFAICT.
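
To make the two limits concrete, here is a minimal sketch of the check
as I understand it from the discussion above. The function names are
hypothetical (this is not libvirt or QEMU API), and the TCG limit of 1
is hardcoded only because that is what QEMU enforces today.

```python
# Hypothetical helpers, for illustration only: not libvirt or QEMU API.

def max_guest_threads_per_core(guest_type: str, host_threads_per_core: int) -> int:
    """Return the largest threads-per-core value a guest can be given."""
    if guest_type == "kvm":
        # KVM cannot present more guest threads per core than the host has.
        return host_threads_per_core
    if guest_type == "tcg":
        # No SMT emulation in QEMU for now, so the limit is always 1.
        return 1
    raise ValueError(f"unknown guest type: {guest_type!r}")


def guest_threads_allowed(guest_type: str, guest_threads: int,
                          host_threads_per_core: int) -> bool:
    """Check a guest threads-per-core value against the limit."""
    limit = max_guest_threads_per_core(guest_type, host_threads_per_core)
    return 1 <= guest_threads <= limit
```

If TCG ever grows SMT emulation, only the "tcg" branch would change;
callers that simply ask for the limit would not need to.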

> > physical_core_id would be 32 for all of the above - it would
> > just be the very value of core_id the kernel reads from the
> > hardware and reports through sysfs.
> > 
> > The tricky bit is that, when subcores are in use, core_id and
> > physical_core_id would not match. They will always match on
> > architectures that lack the concept of subcores, though.
> 
> Yeah, I'm still not terribly convinced that we should even be
> presenting physical core info instead of *just* logical core info.  If
> you care that much about physical core topology, you probably
> shouldn't be running your system in subcore mode.

Me neither. We could leave it out initially, and add it later
if it turns out to be useful, I guess.

> > > > The optimal guest topology in this case would be
> > > >  
> > > >    <vcpu placement='static' cpuset='4'>4</vcpu>
> > > >    <cpu>
> > > >      <topology sockets='1' cores='1' threads='4'/>
> > > >    </cpu>  
> > >  
> > > So when we pin to logical CPU #4, is ppc KVM smart enough to see that it's a
> > > subcore thread and then make use of the offline threads in the same subcore?
> > > Or does libvirt do anything fancy to facilitate this case?
> > 
> > My understanding is that libvirt shouldn't have to do anything
> > to pass the hint to kvm, but David will have the authoritative
> > answer here.
> 
> Um.. I'm not totally certain.  It will be one of two things:
>    a) you just bind the guest thread to the representative host thread
>    b) you bind the guest thread to a cpumask with all of the host
>       threads on the relevant (sub)core - including the offline host
>       threads
> 
> I'll try to figure out which one it is.
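
For reference, the two candidate bindings can be sketched as follows.
This is only an illustration under assumptions that are not stated in
the thread: host CPU ids are contiguous within a core, there are 8
threads per core, and the core is split into 2 subcores. The function
names are made up.

```python
# Illustration only. Assumes contiguous host CPU numbering, which may not
# hold on every system; the default values are assumptions as well.

def subcore_threads(cpu: int, threads_per_core: int = 8,
                    subcores_per_core: int = 2) -> list[int]:
    """Option (b): every host thread in the (sub)core containing `cpu`,
    including the offline ones."""
    if threads_per_core % subcores_per_core != 0:
        raise ValueError("threads_per_core must be a multiple of subcores_per_core")
    threads_per_subcore = threads_per_core // subcores_per_core
    first = (cpu // threads_per_subcore) * threads_per_subcore
    return list(range(first, first + threads_per_subcore))


def representative_thread(cpu: int, threads_per_core: int = 8,
                          subcores_per_core: int = 2) -> list[int]:
    """Option (a): only the first thread of the (sub)core containing `cpu`."""
    return subcore_threads(cpu, threads_per_core, subcores_per_core)[:1]
```

With these assumptions, pinning to logical CPU #4 gives [4] for option
(a) and [4, 5, 6, 7] for option (b).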

I played with this a bit: I created a guest with

  <vcpu placement='static' cpuset='0,8'>8</vcpu>
  <cpu>
    <topology sockets='1' cores='2' threads='4'/>
  </cpu>

and then, inside the guest, I used cgroups to pin a bunch
of busy loops to specific vCPUs.

As long as all the load (8+ busy loops) was distributed
only across vCPUs 0-3, one of the host threads remained idle.
As soon as the first of the jobs was moved to vCPUs 4-7, the
other host thread immediately jumped to 100%.

This seems to indicate that QEMU / KVM are actually smart
enough to schedule guest threads on the corresponding host
threads. I think :)

On the other hand, when I changed the guest to distribute the
8 vCPUs among 2 sockets with 4 cores each (one thread per core)
instead, the second host thread would start running as soon as
I started the second busy loop.
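
A rough way to reason about both results is to count how many distinct
guest cores the busy vCPUs land on. The sketch below assumes vCPU ids
are assigned threads first, then cores, then sockets; I have not
verified that this is what QEMU does in every case, and the helper
names are made up.

```python
# Illustration only: assumes vCPU ids are laid out threads-first, then
# cores, then sockets.

def vcpu_to_core(vcpu: int, sockets: int, cores: int, threads: int) -> tuple[int, int]:
    """Map a vCPU id to the (socket, core) pair it belongs to."""
    if not 0 <= vcpu < sockets * cores * threads:
        raise ValueError(f"vCPU {vcpu} does not exist in this topology")
    socket, rest = divmod(vcpu, cores * threads)
    core = rest // threads
    return socket, core


def busy_guest_cores(busy_vcpus, sockets: int, cores: int, threads: int) -> set:
    """Return the set of guest cores that have at least one busy vCPU."""
    return {vcpu_to_core(v, sockets, cores, threads) for v in busy_vcpus}


# sockets=1 cores=2 threads=4: vCPUs 0-3 all share guest core (0, 0)
one_core = busy_guest_cores([0, 1, 2, 3], sockets=1, cores=2, threads=4)
# moving one job to vCPU 4 brings in guest core (0, 1)
two_cores = busy_guest_cores([0, 1, 2, 4], sockets=1, cores=2, threads=4)
# sockets=2 cores=4 threads=1: two busy vCPUs are already two guest cores
split = busy_guest_cores([0, 1], sockets=2, cores=4, threads=1)
```

Under that assumption, the second host thread becomes busy exactly when
the number of busy guest cores goes from one to two, which matches what
I observed in both configurations.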

> > We won't know whether the proposal is actually sensible until
> > David weighs in, but I'm adding Martin back in the loop so
> > he can maybe give us the oVirt angle in the meantime.
> 
> TBH, I'm not really sure what you want from me.  Most of the questions
> seem to be libvirt design decisions which are independent of the layers
> below.

I mostly need you to sanity check my proposals and point out
any incorrect / dubious claims, just like you did above :)

The design of features like this one can have pretty
significant consequences for the interactions between the
various layers, and when the choices are not straightforward
I think it's better to gather as much feedback as possible
from across the stack before moving forward with an
implementation.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team