[libvirt] [PATCH] nodeinfo: Add workaround if kernel reports bogous numa topology.

Peter Krempa pkrempa at redhat.com
Tue Oct 23 15:45:46 UTC 2012


On 10/23/12 17:07, Eric Blake wrote:
> On 10/23/2012 06:40 AM, Peter Krempa wrote:
>> This patch adds a workaround for the cpu topology detection code if the
>> kernel reports incorrect count of numa nodes or any other reason that
>> might result in duplicate core ID's in one socket.
>
> Do we know what versions of kernels have this bug?  If it is still
> current, has this been opened as a bug report against the kernel?

The guy who reported it told me that it's fedora 17. I'll try to reach 
him tomorrow to gather more info.

>
>>
>> If such a situation is detected, the detection code reports the correct
>> number of processors but the topology is mangled:
>>
>>      $ virsh nodeinfo
>>      CPU model:           x86_64
>>      CPU(s):              24
>>      CPU frequency:       2200 MHz
>>      CPU socket(s):       1
>>      Core(s) per socket:  24
>>      Thread(s) per core:  1
>>      NUMA cell(s):        1
>>      Memory size:         8047272 KiB
>>
>> This patch also adds test data for such a situation.
>>
>> Reported (and test data provided) by: gcbirzan on #virt at OFTC
>> ---
>> This patch looks huge but it's mostly test data.
>
> Concur; reading just the changes to src/nodeinfo.c is adequate for
> evaluating the patch.
>
>> +++ b/src/nodeinfo.c
>> @@ -200,7 +200,9 @@ CPU_COUNT(cpu_set_t *set)
>>   # endif /* !CPU_COUNT */
>>
>>   /* parses a node entry, returning number of processors in the node and
>> - * filling arguments */
>> + * filling arguments.
>> + * Returns -1 on error, -2 if kernel reports invalid numa topology and number
>> + * of processors in the node otherwise */
>>   static int
>>   virNodeParseNode(const char *node, int *sockets, int *cores, int *threads)
>>       ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2)
>> @@ -219,6 +221,7 @@ virNodeParseNode(const char *node, int *sockets, int *cores, int *threads)
>>       int siblings;
>>       unsigned int cpu;
>>       int online;
>> +    bool bad_topology = false;
>>
>>       *threads = 0;
>>       *cores = 0;
>> @@ -300,11 +303,14 @@ virNodeParseNode(const char *node, int *sockets, int *cores, int *threads)
>>           core = virNodeGetCpuValue(node, cpu, "topology/core_id", false);
>>   # endif
>>
>> -        CPU_SET(core, &core_maps[sock]);
>> -
>>           if (!(siblings = virNodeCountThreadSiblings(node, cpu)))
>>               goto cleanup;
>>
>> +        if (CPU_ISSET(core, &core_maps[sock]) && siblings <= 1)
>> +            bad_topology = true;
>> +
>> +        CPU_SET(core, &core_maps[sock]);
>
> If you install the 'hwloc' package on the afflicted machine, can you
> then send us the file created by  'lstopo proc.png', so we can get a
> better feel for what the machine actually supports?  Is it really a case
> of duplicate processor ids being mistakenly reported by the kernel, or
> more a case of us mis-reading sysfs and seeing duplicates because we
> aren't looking in enough other places?

The duplicate processor ids are actually correct if the machine reports 
the numa topology as it is. The magny cours processor that is used in 
the system reporting the issue has 2 numa nodes, each of cores in a 
physical package. It is actualy correct if those two have identical core 
ID's.

The issue observed is that the numa topology definition as reported in 
sysfs or by numactl has only 1 node instead of 4 that should have been 
reported.

>
> At any rate, this patch looks okay, if we can prove that the kernel
> really is giving us bogus information, and not just a case of us
> misreading things.

I will try to reach the guy to provide the additional information, but 
I'm afraid there's some issue with the kernel numa topology reporting.

>
>
>
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
>




More information about the libvir-list mailing list