[libvirt] [RFC PATCH 0/2] nodeinfo: PPC64: Fix topology and siblings info on capabilities and nodeinfo
Andrea Bolognani
abologna at redhat.com
Fri Jun 10 15:52:47 UTC 2016
On Tue, 2016-05-31 at 16:08 +1000, David Gibson wrote:
> > QEMU fails with errors like
> >
> > qemu-kvm: Cannot support more than 8 threads on PPC with KVM
> > qemu-kvm: Cannot support more than 1 threads on PPC with TCG
> >
> > depending on the guest type.
>
> Note that in a sense the two errors come about for different reasons.
>
> On Power, to a much greater degree than x86, threads on the same core
> have observably different behaviour from threads on different cores.
> Because of that, there's no reasonable way for KVM to present more
> guest threads-per-core than there are host threads-per-core.
>
> The limit of 1 thread on TCG is simply because no-one's ever bothered
> to implement SMT emulation in qemu.
That just means in the future we might have to expose something
other than a hardcoded '1' as the guest thread limit for TCG guests;
the interface would remain valid AFAICT.
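
To make the two limits concrete, here is a minimal sketch of the
check as I understand it from the above. The function names are
made up for illustration and are not libvirt or QEMU APIs; the
only inputs taken from this thread are "KVM can't exceed host
threads-per-(sub)core" and "TCG is currently limited to 1".

```python
# Hypothetical helpers, not part of libvirt or QEMU.

def max_guest_threads(accel, host_threads_per_core):
    """Return the largest threads-per-core value a PPC64 guest could use."""
    if accel == "kvm":
        # KVM can't present more guest threads per core than the
        # host (sub)core actually has.
        return host_threads_per_core
    if accel == "tcg":
        # No SMT emulation in QEMU's TCG today, so the limit is 1.
        # This is the value that might change in the future.
        return 1
    raise ValueError("unknown accelerator: %r" % (accel,))


def topology_fits(accel, host_threads_per_core, guest_threads):
    """True if the requested guest threads-per-core is within the limit."""
    return guest_threads <= max_guest_threads(accel, host_threads_per_core)
```

If TCG ever grows SMT emulation, only the value returned in the
"tcg" branch would need to change; callers would be unaffected.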
> > physical_core_id would be 32 for all of the above - it would
> > just be the very value of core_id the kernel reads from the
> > hardware and reports through sysfs.
> >
> > The tricky bit is that, when subcores are in use, core_id and
> > physical_core_id would not match. They will always match on
> > architectures that lack the concept of subcores, though.
>
> Yeah, I'm still not terribly convinced that we should even be
> presenting physical core info instead of *just* logical core info. If
> you care that much about physical core topology, you probably
> shouldn't be running your system in subcore mode.
Me neither. We could leave it out initially, and add it later
if it turns out to be useful, I guess.
> > > > The optimal guest topology in this case would be
> > > >
> > > >   <vcpu placement='static' cpuset='4'>4</vcpu>
> > > >   <cpu>
> > > >     <topology sockets='1' cores='1' threads='4'/>
> > > >   </cpu>
> > >
> > > So when we pin to logical CPU #4, ppc KVM is smart enough to see that it's a
> > > subcore thread, will then make use of the offline threads in the same subcore?
> > > Or does libvirt do anything fancy to facilitate this case?
> >
> > My understanding is that libvirt shouldn't have to do anything
> > to pass the hint to kvm, but David will have the authoritative
> > answer here.
>
> Um.. I'm not totally certain. It will be one of two things:
> a) you just bind the guest thread to the representative host thread
> b) you bind the guest thread to a cpumask with all of the host
> threads on the relevant (sub)core - including the offline host
> threads
>
> I'll try to figure out which one it is.
I played with this a bit: I created a guest with

  <vcpu placement='static' cpuset='0,8'>8</vcpu>
  <cpu>
    <topology sockets='1' cores='2' threads='4'/>
  </cpu>

and then, inside the guest, I used cgroups to pin a bunch
of busy loops to specific vCPUs.

As long as all the load (8+ busy loops) was distributed
only across vCPUs 0-3, one of the host threads remained idle.
As soon as the first of the jobs was moved to vCPUs 4-7, the
other host thread immediately jumped to 100%.

This seems to indicate that QEMU / KVM are actually smart
enough to schedule guest threads on the corresponding host
threads. I think :)

On the other hand, when I changed the guest to distribute the
8 vCPUs among 2 sockets with 4 cores each instead, the second
host thread would start running as soon as I started the
second busy loop.
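
For reference, this is how I'm reading the two results. It's only
a sketch: it assumes vCPU IDs are enumerated thread-first, then
core, then socket, and that each guest core keeps one host
(sub)core busy at a time. Both are assumptions on my part, and the
helper names are made up.

```python
# Hypothetical helpers; the enumeration order is an assumption.

def vcpu_position(vcpu, sockets, cores, threads):
    """Map a vCPU ID to its (socket, core, thread) in the guest topology."""
    total = sockets * cores * threads
    if not 0 <= vcpu < total:
        raise ValueError("vCPU %d outside topology of %d vCPUs" % (vcpu, total))
    thread = vcpu % threads
    core = (vcpu // threads) % cores
    socket = vcpu // (threads * cores)
    return socket, core, thread


def guest_cores_in_use(busy_vcpus, sockets, cores, threads):
    """Return the set of (socket, core) pairs that have a busy vCPU."""
    return {vcpu_position(v, sockets, cores, threads)[:2] for v in busy_vcpus}


# First guest: 1 socket, 2 cores, 4 threads per core.
print(guest_cores_in_use(range(4), 1, 2, 4))   # load on vCPUs 0-3 only
print(guest_cores_in_use(range(5), 1, 2, 4))   # one job moved to vCPU 4

# Second guest: 2 sockets, 4 cores each, 1 thread per core.
print(guest_cores_in_use([0, 1], 2, 4, 1))     # two busy loops
```

With the first topology, vCPUs 0-3 all sit in a single guest core,
so only one host thread is needed until vCPU 4 gets some work. With
the second one, every vCPU is its own core, so two busy loops
already span two guest cores.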
> > We won't know whether the proposal is actually sensible until
> > David weighs in, but I'm adding Martin back in the loop so
> > he can maybe give us the oVirt angle in the meantime.
>
> TBH, I'm not really sure what you want from me. Most of the questions
> seem to be libvirt design decisions which are independent of the layers
> below.
I mostly need you to sanity check my proposals and point out
any incorrect / dubious claims, just like you did above :)
The design of features like this one can have pretty
significant consequences for the interactions between the
various layers, and when the choices are not straightforward
I think it's better to gather as much feedback as possible
from across the stack before moving forward with an
implementation.
--
Andrea Bolognani
Software Engineer - Virtualization Team