[libvirt] [RFC PATCH 0/2] nodeinfo: PPC64: Fix topology and siblings info on capabilities and nodeinfo

Tue May 17 09:49:22 UTC 2016

On Tue, 2016-05-10 at 17:59 -0400, Cole Robinson wrote:
> On 05/05/2016 02:48 PM, Andrea Bolognani wrote:
> > On Fri, 2016-01-29 at 01:32 -0500, Shivaprasad G Bhat wrote:
> > 
> > ** Guest threads limit **
> > 
> > My dual-core laptop will happily run a guest configured with
> > 
> >   <cpu>
> >     <topology sockets='1' cores='1' threads='128'/>
> >   </cpu>
> > 
> > but POWER guests are limited to 8/subcores_per_core threads.
> 
> How is it limited? Does something explicitly fail (libvirt, qemu, guest OS)?
> Or are the threads just not usable in the VM?
> 
> Is it specific to PPC64 KVM, or PPC64 emulated as well?

QEMU fails with errors like

  qemu-kvm: Cannot support more than 8 threads on PPC with KVM
  qemu-kvm: Cannot support more than 1 threads on PPC with TCG

depending on the guest type.
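
To make the limit concrete, here is a rough Python sketch of the rule.
The function name is made up for illustration, and the constants come
only from the error messages above and the 8/subcores_per_core rule
mentioned earlier in the thread. Neither QEMU nor libvirt exposes
anything like this today.

```python
def max_guest_threads(subcores_per_core=1, kvm=True):
    """Illustrative upper bound on threads per guest core on a POWER host."""
    if not kvm:
        # TCG guests: "Cannot support more than 1 threads on PPC with TCG"
        return 1
    # Assumption: subcores_per_core evenly divides the 8 host threads per core.
    if subcores_per_core < 1 or 8 % subcores_per_core != 0:
        raise ValueError("unexpected subcores_per_core: %r" % subcores_per_core)
    # KVM guests: 8 threads per core, split evenly among subcores.
    return 8 // subcores_per_core


print(max_guest_threads(2))             # KVM guest, subcores_per_core=2
print(max_guest_threads(2, kvm=False))  # TCG guest
```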

> > We need to report this information to the user somehow, and
> > I can't see an existing place where it would fit nicely. We
> > definitely don't want to overload the meaning of an existing
> > element/attribute with this. It should also only appear in
> > the (dom)capabilities XML of ppc64 hosts.
> > 
> > I don't think this is too problematic or controversial, we
> > just need to pick a nice place to display this information.

Adding to the above: we already have

  <vcpu max='...'/>

in the domcapabilities XML, and there was some recent
discussion about improving the information reported there.

Possibly a good match?

> > ** Efficient guest topology **
> > 
> > To achieve optimal performance, you want to match guest
> > threads with host threads.
> > 
> > On x86, you can choose suitable host threads by looking at
> > the capabilities XML: the presence of elements like
> > 
> >   <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
> >   <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
> > 
> > means you should configure your guest to use
> > 
> >   <vcpu placement='static' cpuset='2-3'>2</vcpu>
> >   <cpu>
> >     <topology sockets='1' cores='1' threads='2'/>
> >   </cpu>
> > 
> > Notice how siblings can be found either looking at the
> > attribute with the same name, or by matching them using the
> > value of the core_id attribute. Also notice how you are
> > supposed to pin as many vCPUs as the number of elements in
> > the cpuset - one guest thread per host thread.
> 
> Ahh, I see that threads are implicitly reported by the fact that socket_id and
> core_id are identical across the different cpu ids... that took me a couple
> minutes :)

Yup :)

thread_siblings_list, the sysfs topology file we read to fill
in the 'siblings' attribute, actually contains the internal
information the kernel has gathered by matching socket_id (aka
physical_package_id in sysfs) and core_id[1].
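
As a sketch of the two equivalent ways a tool can find siblings, the
Python below parses a CPU list in the format used by
thread_siblings_list (and by the 'siblings' attribute), then rebuilds
the same groups by matching socket_id and core_id. The helper names and
the dict layout are assumptions made for the example, not libvirt code.

```python
def parse_cpulist(text):
    """Parse a CPU list such as '2-3' or '0,4' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def group_by_core(cpus):
    """Group CPU ids by (socket_id, core_id), mirroring what the kernel does."""
    groups = {}
    for cpu in cpus:
        key = (cpu["socket_id"], cpu["core_id"])
        groups.setdefault(key, set()).add(cpu["id"])
    return groups


# Values taken from the x86 example quoted above.
host = [
    {"id": 2, "socket_id": 0, "core_id": 1},
    {"id": 3, "socket_id": 0, "core_id": 1},
]
print(parse_cpulist("2-3"))
print(group_by_core(host))
```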

> > On POWER, this gets much trickier: only the *primary* thread
> > of each (sub)core appears to be online in the host, but all
> > threads can actually have a vCPU running on them. So
> > 
> >   <cpu id='0' socket_id='0' core_id='32' siblings='0,4'/>
> >   <cpu id='4' socket_id='0' core_id='32' siblings='0,4'/>
> > 
> > which is what you'd get with subcores_per_core=2, is very
> > confusing.
>
> Okay, this bit took me _more_ than a couple minutes. Is this saying topology of
> 
> socket #0
>   core #32
>     subcore #1
>       cpu id='0' thread #1
>       cpu id='1' thread #2 (offline)
>       cpu id='2' thread #3 (offline)
>       cpu id='3' thread #4 (offline)
>     subcore #2
>       cpu id='4' thread #1
>       cpu id='5' thread #2 (offline)
>       cpu id='6' thread #3 (offline)
>       cpu id='7' thread #4 (offline)
> ...
> 
> what would the hypothetical physical_core_id value look like in that example?

physical_core_id would be 32 for all of the above - it would
just be the very value of core_id the kernel reads from the
hardware and reports through sysfs.

The tricky bit is that, when subcores are in use, core_id and
physical_core_id would not match. They will always match on
architectures that lack the concept of subcores, though.

> > The optimal guest topology in this case would be
> > 
> >   <vcpu placement='static' cpuset='4'>4</vcpu>
> >   <cpu>
> >     <topology sockets='1' cores='1' threads='4'/>
> >   </cpu>
> 
> So when we pin to logical CPU #4, ppc KVM is smart enough to see that it's a
> subcore thread, will then make use of the offline threads in the same subcore?
> Or does libvirt do anything fancy to facilitate this case?

My understanding is that libvirt shouldn't have to do anything
to pass the hint to kvm, but David will have the authoritative
answer here.

> > but neither approach mentioned above works to figure out the
> > correct value for the cpuset attribute.
> > 
> > In this case, a possible solution would be to alter the values
> > of the core_id and siblings attribute such that both would be
> > the same as the id attribute, which would naturally make both
> > approaches described above work.
> > 
> > Additionally, a new attribute would be introduced to serve as
> > a multiplier for the "one guest thread per host thread" rule
> > mentioned earlier: the resulting XML would look like
> > 
> >   <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
> >   <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>
> > 
> > which contains all the information needed to build the right
> > guest topology. The capacity attribute would have value 1 on
> > all architectures except for ppc64.
> 
> capacity is pretty generic sounding... not sure if that's good or not in this
> case. maybe thread_capacity?

Yeah, I'm not in love with the name either, but I've been unable
to come up with a better one myself. thread_capacity might be a
tiny bit better, but in the end I think there's little chance
we'll be able to find a good, short name for "you can pin this
number of guest threads to this host thread" - let's pick
something not horrible and document the heck out of it.
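
For what it's worth, this is roughly how I'd expect a management tool
to consume the attribute. It's only a sketch: 'capacity' is the
proposed attribute and does not exist in libvirt today, the <cpus>
wrapper is there just to make the sample parseable, and the helper
names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Proposed host XML from the example above ('capacity' is hypothetical).
SAMPLE = """
<cpus>
  <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
  <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>
</cpus>
"""


def count_cpus(cpulist):
    """Count the CPUs in a list such as '2-3' or '0,4'."""
    total = 0
    for part in cpulist.split(","):
        lo, _, hi = part.partition("-")
        total += int(hi or lo) - int(lo) + 1
    return total


def guest_xml(cpu):
    """Build the guest snippet for one host <cpu> element."""
    siblings = cpu.get("siblings")
    # A missing attribute is treated as capacity 1 (the non-ppc64 case).
    capacity = int(cpu.get("capacity", "1"))
    # One guest thread per host thread, multiplied by the capacity.
    threads = count_cpus(siblings) * capacity
    return (
        f"<vcpu placement='static' cpuset='{siblings}'>{threads}</vcpu>\n"
        "<cpu>\n"
        f"  <topology sockets='1' cores='1' threads='{threads}'/>\n"
        "</cpu>"
    )


host_cpus = ET.fromstring(SAMPLE).findall("cpu")
print(guest_xml(host_cpus[1]))
```

With capacity defaulting to 1, the same helper produces the x86 result
for siblings='2-3', so tools would not need an architecture check.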

> > We could arguably use the capacity attribute to cover the
> > use case described in the first part as well, by declaring that
> > any value other than 1 means there's a limit to the number of
> > threads a guest core can have. I think doing so has the
> > potential to produce much grief in the future, so I'd rather
> > keep them separate - even if it means inventing a new element.
> > 
> > It's also been proposed to add a physical_core_id attribute,
> > which would contain the real core id and allow tools to figure
> > out which subcores belong to the same core - it would be the
> > same as core_id for all other architectures and for ppc64
> > when subcores_per_core=1. It's not clear whether having this
> > attribute would be useful or just confusing.
> 
> IMO it seems like something worth adding since it is a pertinent piece of the
> topology, even if there isn't a clear programmatic use for it yet.

Without it we would be leaving out a piece of information, that
much is clear. However, as mentioned above, I'm afraid it might
make things more confusing, especially for architectures that do
not have subcores - basically all of them.

So maybe we should only add this information once its usefulness
has been proven.

> > This is all I have for now. Please let me know what you think
> > about it.
> 
> FWIW virt-manager basically doesn't consume the host topology XML, so there's
> no concern there.

That's good to know :)

> A quick grep seems to indicate that both nova (openstack) and vdsm
> (ovirt/rhev) _do_ consume this XML for their numa magic (git grep sibling),
> but I can't speak to the details of how it's consumed.

We won't know whether the proposal is actually sensible until
David weighs in, but I'm adding Martin back in the loop so
he can maybe give us the oVirt angle in the meantime.

Thanks for sharing your thoughts!

[1] https://www.kernel.org/doc/Documentation/cputopology.txt
-- 
Andrea Bolognani
Software Engineer - Virtualization Team