[libvirt] Problem of host CPU topology parsing

Fri May 11 10:22:08 UTC 2012

On 2012年05月11日 18:07, Jiri Denemark wrote:
> On Fri, May 11, 2012 at 10:53:12 +0100, Daniel P. Berrange wrote:
>> On Fri, May 11, 2012 at 05:42:34PM +0800, Osier Yang wrote:
>>> On 2012年05月11日 17:01, Jiri Denemark wrote:
>>>> On Fri, May 11, 2012 at 10:47:06 +0200, Michal Privoznik wrote:
>>>>> On 11.05.2012 10:40, Osier Yang wrote:
>>>>>>      /* nodeinfo->sockets is supposed to be a number of sockets per NUMA
>>>>>> node,
>>>>>>       * however if NUMA nodes are not composed of whole sockets, we just lie
>>>>>>       * about the number of NUMA nodes and force apps to check
>>>>>> capabilities XML
>>>>>>       * for the actual NUMA topology.
>>>>>>       */
>>>>>>      if (nodeinfo->sockets % nodeinfo->nodes == 0)
>>>>>>          nodeinfo->sockets /= nodeinfo->nodes;
>>>>>>      else
>>>>>>          nodeinfo->nodes = 1;
>>>>>>
>>>>>> Jirka said this was for a fix, but I don't quite understand it,
>>>>>> what does the "nodeinfo.nodes" mean actually? Shouldn't it
>>>>>> be 8 (for the 48 CPUs machine) instead? But then we will be
>>>>>> wrong again with using VIR_NODEINFO_MAXCPUS.
>>>>>
>>>>> Why do you think it will be wrong? My understanding is that
>>>>> VIR_NODEINFO_MAXCPUS just tell the max number of possible cpus not the
>>>>> actual. So if it's over 48 we are safe.
>>>>
>>>> Not really, the macro should count exactly the number of CPUs available to
>>>> host, otherwise lots of other issues (incl. backward compatibility) appear. It
>>>> is just a badly named macro that should never exist but we can't do anything
>>>> with it since it is our public API.
>>>>
>>>>> Btw: the code above seems like a hack to me.
>>>>
>>>> Yes, it is a hack but it's unfortunately required because we can't change the
>>>> macro.
>>>>
>>>> Anyway, I agree with Daniel that the bug most likely lies somewhere in the
>>>> code that populates nodeinfo structure.
>>>>
>>>> Jirka
>>>
>>> In /proc/cpuinfo:
>>>
>>> <snip>
>>> cpu cores       : 12
>>> </snip>
>>>
>>> However, there are only 6 core IDs, as showed in
>>> http://fpaste.org/mtoA/. And we parse the core_id
>>> file of each CPU as:
>>>
>>>          core = parse_core(cpu);
>>>          if (!CPU_ISSET(core,&core_mask)) {
>>>              CPU_SET(core,&core_mask);
>>>              nodeinfo->cores++;
>>>          }
>>>
>>> and thus get only 6 cores. Don't known how 12 in /proc/cpuinfo
>>> is figured out. But could it be a clue?
>>
>> Ahhh.  The AMD 12 "core" CPUs are in fact a pair of 6 core CPUs
>> with 2 NUMA nodes in the CPU itself.
>
> Oh, so the problem is that two 6-core CPUs share the same socket and thus have
> the same physical ID. So it's either 8 6-core CPUs or 4 12-core CPUs. Not
> sure which one is better to present. The first one is the real thing and the
> second one is how AMD presents the reality :-) Anyway, we should do something
> with
>
>          /* Parse core */
>          core = parse_core(cpu);
>          if (!CPU_ISSET(core,&core_mask)) {
>              CPU_SET(core,&core_mask);
>              nodeinfo->cores++;
>          }
>
>          /* Parse socket */
>          sock = parse_socket(cpu);
>          if (!CPU_ISSET(sock,&socket_mask)) {
>              CPU_SET(sock,&socket_mask);
>              nodeinfo->sockets++;
>          }
>
> which just ignores duplicate physical/core IDs. I feel like this was added
> there for some reason, though...
>

Do you mean remove the checking of duplicate physical/core IDs? if so,
we will get both nodeinfo->cores and nodeinfo->sockets with value 48.

Regards,
Osier