[libvirt] [RFC PATCH 0/2] nodeinfo: PPC64: Fix topology and siblings info on capabilities and nodeinfo

Daniel P. Berrange berrange at redhat.com
Tue Jul 19 14:35:33 UTC 2016


On Thu, May 05, 2016 at 08:48:05PM +0200, Andrea Bolognani wrote:
> On Fri, 2016-01-29 at 01:32 -0500, Shivaprasad G Bhat wrote:
> > The nodeinfo output was fixed earlier to reflect the actual CPUs available
> > in KVM mode on PPC64. The earlier fixes covered the aspect of not making a
> > host look overcommitted when it's not. The current fixes are aimed at
> > helping users make better decisions about the kind of guest CPU topology
> > that can be supported for a given subcores_per_core setting of the KVM
> > host, and at hinting how to pin the guest vCPUs efficiently.
> > 
> > I am planning to add some test cases once the approach is accepted.
> > 
> > With respect to Patch 2:
> > The second patch adds a new element to the cpus tag, and I need your input
> > on whether that is okay, or if there is a better way. I am not sure if the
> > existing clients have RNG checks that might fail with this approach, or if
> > the checks are not enforced on the elements but only on the tags.
> > 
> > With my approach, if the RNG checks pass, the new element "capacity", even
> > if ignored by many clients, would have no impact except on PPC64.
> > 
> > To the extent I have looked at the code, the siblings changes don't affect
> > existing libvirt functionality. Please do let me know otherwise.
> 
> So, I've been going through this old thread trying to figure out
> a way to improve the status quo. I'd like to collect as much
> feedback as possible, especially from people who have worked in
> this area of libvirt before or have written tools based on it.
> 
> As hinted above, this series is really trying to address two
> different issues, and I think it's helpful to reason about them
> separately.
> 
> 
> ** Guest threads limit **
> 
> My dual-core laptop will happily run a guest configured with
> 
>   <cpu>
>     <topology sockets='1' cores='1' threads='128'/>
>   </cpu>
> 
> but POWER guests are limited to 8/subcores_per_core threads
> per core.
> 
> We need to report this information to the user somehow, and
> I can't see an existing place where it would fit nicely. We
> definitely don't want to overload the meaning of an existing
> element/attribute with this. It should also only appear in
> the (dom)capabilities XML of ppc64 hosts.
> 
> I don't think this is too problematic or controversial, we
> just need to pick a nice place to display this information.
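> 
> For example (a worked illustration, assuming a POWER8 host with
> 8 hardware threads per core): with subcores_per_core=2 the
> limit is 8/2 = 4, so a guest configured with
> 
>   <cpu>
>     <topology sockets='1' cores='1' threads='8'/>
>   </cpu>
> 
> could not be started on such a host, even though the same XML
> is perfectly fine on x86.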
> 
> 
> ** Efficient guest topology **
> 
> To achieve optimal performance, you want to match guest
> threads with host threads.
> 
> On x86, you can choose suitable host threads by looking at
> the capabilities XML: the presence of elements like
> 
>   <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
>   <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
> 
> means you should configure your guest to use
> 
>   <vcpu placement='static' cpuset='2-3'>2</vcpu>
>   <cpu>
>     <topology sockets='1' cores='1' threads='2'/>
>   </cpu>
> 
> Notice how siblings can be found either by looking at the
> attribute of the same name, or by matching CPUs on the value
> of their core_id attribute. Also notice how you are supposed
> to pin as many vCPUs as there are elements in the cpuset -
> one guest thread per host thread.
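> 
> As a concrete sketch of that matching logic - using the libvirt
> Python bindings and ElementTree, with the connection URI being
> an assumption - a client can group host CPUs by core_id:
> 
>   import libvirt
>   import xml.etree.ElementTree as ET
>   from collections import defaultdict
> 
>   conn = libvirt.open('qemu:///system')  # assumed local QEMU driver
>   caps = ET.fromstring(conn.getCapabilities())
> 
>   # CPUs sharing a core_id are siblings: each list below is a
>   # candidate cpuset, with one vCPU to pin per entry.
>   cores = defaultdict(list)
>   for cpu in caps.findall('./host/topology/cells/cell/cpus/cpu'):
>       cores[cpu.get('core_id')].append(cpu.get('id'))
> 
>   print(dict(cores))  # e.g. {'0': ['0', '1'], '1': ['2', '3']}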
> 
> On POWER, this gets much trickier: only the *primary* thread
> of each (sub)core appears to be online in the host, but all
> threads can actually have a vCPU running on them. So
> 
>   <cpu id='0' socket_id='0' core_id='32' siblings='0,4'/>
>   <cpu id='4' socket_id='0' core_id='32' siblings='0,4'/>
> 
> which is what you'd get with subcores_per_core=2, is very
> confusing.
> 
> The optimal guest topology in this case would be
> 
>   <vcpu placement='static' cpuset='4'>4</vcpu>
>   <cpu>
>     <topology sockets='1' cores='1' threads='4'/>
>   </cpu>
> 
> but neither approach mentioned above works to figure out the
> correct value for the cpuset attribute: matching on core_id or
> on siblings yields '0,4', while all four vCPUs actually need to
> be pinned to the single online thread of one subcore.
> 
> In this case, a possible solution would be to alter the values
> of the core_id and siblings attributes so that both match the
> id attribute, which would naturally make both approaches
> described above work.
> 
> Additionally, a new attribute would be introduced to serve as
> a multiplier for the "one guest thread per host thread" rule
> mentioned earlier: the resulting XML would look like
> 
>   <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
>   <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>
> 
> which contains all the information needed to build the right
> guest topology. The capacity attribute would have value 1 on
> all architectures except for ppc64.
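> 
> A client could then derive the topology mechanically. A minimal
> sketch, assuming the proposed capacity attribute existed (it is
> only a proposal at this point):
> 
>   import xml.etree.ElementTree as ET
> 
>   # Hypothetical capabilities fragment with the proposed attribute.
>   fragment = """<cpus>
>     <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
>     <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>
>   </cpus>"""
> 
>   cpu = ET.fromstring(fragment).find("cpu[@id='4']")
>   # One guest thread per host thread, times the capacity multiplier.
>   vcpus = int(cpu.get('capacity')) * len(cpu.get('siblings').split(','))
>   print(vcpus)  # 4 -> <vcpu cpuset='4'>4</vcpu> with threads='4'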

I don't really like the fact that with this design we effectively
have a bunch of invisible <cpu> elements whose existence is merely
implied by the 'capacity=4' attribute.

I also don't like tailoring the capabilities XML output to one
specific use case.

IOW, I think we should explicitly represent all the CPUs in the
node capabilities, even if they are offline in the host. We could
introduce a new attribute to indicate the status of CPUs. So
instead of

  <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
  <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>

I'd like to see

  <cpu id='0' socket_id='0' core_id='0' siblings='0-3' state="online"/>
  <cpu id='1' socket_id='0' core_id='0' siblings='0-3' state="offline"/>
  <cpu id='2' socket_id='0' core_id='0' siblings='0-3' state="offline"/>
  <cpu id='3' socket_id='0' core_id='0' siblings='0-3' state="offline"/>
  <cpu id='4' socket_id='0' core_id='4' siblings='4-7' state="online"/>
  <cpu id='5' socket_id='0' core_id='4' siblings='4-7' state="offline"/>
  <cpu id='6' socket_id='0' core_id='4' siblings='4-7' state="offline"/>
  <cpu id='7' socket_id='0' core_id='4' siblings='4-7' state="offline"/>
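
With this representation the existing "count the siblings" logic
keeps working unchanged. As a sketch of how a client might consume
it (the state attribute being, of course, just the proposal above):

  import xml.etree.ElementTree as ET
  from collections import defaultdict

  # Hypothetical fragment shaped like the XML above.
  fragment = """<cpus>
    <cpu id='0' socket_id='0' core_id='0' siblings='0-3' state='online'/>
    <cpu id='1' socket_id='0' core_id='0' siblings='0-3' state='offline'/>
    <cpu id='2' socket_id='0' core_id='0' siblings='0-3' state='offline'/>
    <cpu id='3' socket_id='0' core_id='0' siblings='0-3' state='offline'/>
  </cpus>"""

  cores = defaultdict(list)
  for cpu in ET.fromstring(fragment):
      cores[cpu.get('core_id')].append(cpu)

  for core_id, cpus in cores.items():
      online = [c for c in cpus if c.get('state') == 'online']
      # Pin to the online (primary) thread; give the guest one
      # thread per hardware thread in the (sub)core, online or not.
      print('cpuset=%s, vcpus=%d' % (online[0].get('id'), len(cpus)))
  # -> cpuset=0, vcpus=4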


The domain capabilities XML, meanwhile, is where you'd express any
usage constraint on cores/threads required by QEMU.
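
For example, a ppc64 host with subcores_per_core=2 might advertise
the resulting 8/2 = 4 threads limit along these lines (the extra
attribute shown here is purely illustrative, it does not exist
today):

  <domainCapabilities>
    <vcpu max='255' maxThreadsPerCore='4'/>
  </domainCapabilities>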

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|