[libvirt] [RFC] Data in the <topology> element in the capabilities XML

Peter Krempa pkrempa at redhat.com
Wed Jan 16 18:31:02 UTC 2013


On 01/16/13 19:11, Daniel P. Berrange wrote:
> On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
>> Hi everybody,
>>
>> a while ago there was a discussion about changing the data that is
>> returned in the <topology> sub-element:
>>
>> <capabilities>
>>    <host>
>>      <cpu>
>>        <arch>x86_64</arch>
>>        <model>SandyBridge</model>
>>        <vendor>Intel</vendor>
>>        <topology sockets='1' cores='2' threads='2'/>
>>
>>
>> The data provided here is currently taken from the nodeinfo
>> detection code and is thus plainly wrong when the fallback
>> detection mechanisms are used.
>>
>> To get a usable total CPU count, the user has to multiply this
>> data by the number of NUMA nodes in the host. However, when the
>> fallback nodeinfo detection code is used, the NUMA node count
>> used to compute the CPU count has to be 1 instead of the actual
>> number.
>>
>> As Jiri proposed, I think we should switch this output to
>> separate detection code that does not take NUMA nodes into
>> account and instead provides the data the way the "lscpu"
>> command does.
>>
>> This change would make the data provided by the element
>> self-contained and also usable in guest XMLs to mirror the
>> host's topology.
>
> Well, there are two parts which need to be considered here: what we
> report in the host capabilities, and how you configure the guest XML.
>
>  From a historical compatibility pov I don't think we should be changing
> the host capabilities at all. Simply document that 'sockets' is treated
> as sockets-per-node everywhere, and that it is wrong in the case of
> machines where a socket can internally contain multiple NUMA nodes.

I'm also somewhat concerned about changing this output, for historical 
reasons.
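
For illustration, the multiplication that the current semantics force on 
consumers can be sketched as follows (a minimal Python sketch; the values 
are hypothetical, matching the capabilities example above on a 
hypothetical two-node NUMA host):

```python
# Hypothetical host matching the <topology sockets='1' cores='2'
# threads='2'/> example above, but with two NUMA nodes. The
# capabilities value is sockets *per NUMA node*, so a consumer has
# to multiply by the node count itself to get the host total.
sockets_per_node = 1   # from <topology sockets='1' .../>
cores = 2              # from cores='2'
threads = 2            # from threads='2'
numa_nodes = 2         # from the separate NUMA <topology> data

total_cpus = sockets_per_node * cores * threads * numa_nodes
print(total_cpus)  # 8 logical CPUs on this hypothetical host
```

With the fallback detection code, the same arithmetic only comes out 
right if numa_nodes is forced to 1, which is exactly the inconsistency 
described above.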
>
> Apps should be using the separate NUMA <topology> data in the capabilities
> instead of the CPU <topology> data, to get accurate CPU counts.

 From the NUMA <topology>, the management apps can't tell whether a CPU 
is a core or a thread. oVirt/VDSM, for example, bases its decisions on 
this information.

The management apps tend to avoid using cores as guest CPUs for 
performance reasons.
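
To make the gap concrete, here is a sketch of what an app can extract 
from the CPU <topology> element today, using Python's standard 
xml.etree (the XML is a trimmed, hypothetical version of the 
capabilities example above). Nothing in these attributes tells the app 
how the CPUs listed in the NUMA <topology> map to cores versus threads:

```python
import xml.etree.ElementTree as ET

# Trimmed capabilities document modelled on the example above
# (values are hypothetical).
caps = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <topology sockets='1' cores='2' threads='2'/>
    </cpu>
  </host>
</capabilities>
"""

topo = ET.fromstring(caps).find('./host/cpu/topology')
sockets = int(topo.get('sockets'))
cores = int(topo.get('cores'))
threads = int(topo.get('threads'))
print(sockets, cores, threads)  # 1 2 2 -- still per-node values
```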

Any other ideas on how to provide this kind of information to the mgmt apps?

>
> For the guest there are two cases to consider. If there is no NUMA in the
> guest there is no problem, because "total sockets" and "sockets per node"
> are the same. In the case where there is NUMA set, we should just ignore
> the guest 'sockets' attribute completely, and treat the 'cores' & 'threads'
> attributes and <vcpu> and <numa> elements as providing canonical data.
>
> Daniel
>
