[Crash-utility] Problem with NUMA Nodes

sharyathi nagesh sharyath at in.ibm.com
Wed May 2 06:22:45 UTC 2007


Dave
Thanks for the feedback. I am attaching the patch as per our
discussion; it is tested and working. Have a look at it and let me know
your opinion.
Thanks
Sharyathi N

Dave Anderson wrote:
> sharyathi nagesh wrote:
>
>   
>> Hi
>>     I am seeing a problem with the crash tool on a system with NUMA nodes.
>> crash exits with an error message and no further analysis of the dump is possible.
>> =====
>> Error message:
>>
>> cassinilp1:~ # crash
>>
>> crash 4.0-3.14
>> Copyright (C) 2002, 2003, 2004, 2005, 2006  Red Hat, Inc.
>> Copyright (C) 2004, 2005, 2006  IBM Corporation
>> Copyright (C) 1999-2006  Hewlett-Packard Co
>> Copyright (C) 2005  Fujitsu Limited
>> Copyright (C) 2005  NEC Corporation
>> Copyright (C) 1999, 2002  Silicon Graphics, Inc.
>> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>> This program is free software, covered by the GNU General Public License,
>> and you are welcome to change it and/or distribute copies of it under
>> certain conditions.  Enter "help copying" to see the conditions.
>> This program has absolutely no warranty.  Enter "help warranty" for details.
>>
>> GNU gdb 6.1
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "powerpc64-unknown-linux-gnu"...
>>
>> crash: numnodes out of sync with pgdat_list?
>>
>> =====
>> System configuration is given as
>>
>> Node 0 Memory:
>> Node 1 Memory:
>> Node 2 Memory:
>> Node 3 Memory:
>> Node 4 Memory: 0x0-0x180000000
>>
>> Node 0 CPUs: 0
>> Node 1 CPUs:
>> Node 2 CPUs:
>> Node 3 CPUs:
>> Node 4 CPUs: 1
>> =====
>> The problem shows up because of this check:
>>
>>         if (n != vt->numnodes)
>>                 error(FATAL, "numnodes out of sync with pgdat_list?\n");
>> in the dump_memory_nodes() function in memory.c
>>
>>         The cause is a mismatch between node_online_map and the number of nodes observed by traversing pgdat_list.
>> The node_online_map bits are set differently in kernel versions 2.6.16 and 2.6.19.
>>         In the earlier version, all the bits from the first bit to the
>> nth bit, where n is the last node to which memory is assigned, are set to '1'.
>>         But in the later version a node is considered online if either memory or a CPU is allocated to it (or both).
>>
>> So I need your suggestion on how to fix the problem.
>> A few ideas I had were:
>> 1) If KERNEL_VERSION <= 2.6.16, increment vt->numnodes only if the bits of node_online_map and cpu_online_map are set.
>>    If KERNEL_VERSION > 2.6.16, use only node_online_map.
>>         (This will partly solve the problem)
>> 2) Or, as in node_table_init(), raise the error only when CRASHDEBUG(2) is set; otherwise update vt->numnodes with 'n'.
>>
>> Please let me know your opinion.
>> Regards
>> Sharyathi Nagesh
>>
>>     
>
> Hi Sharyathi,
>
> Thanks a lot for debugging this.
>
> I prefer your idea (2) -- which if it works OK in your case -- will not break
> any other currently-working incarnations.
>
> Also, just to clarify, when you say "Raise the error...", node_table_init()
> only makes an "error(NOTE, ...)" call, so you would simply get a "NOTE: ..."
> message displayed if CRASHDEBUG(2), and the crash session would
> still continue.  That's also what we would want in this case, unlike the
> "error(FATAL, ...)", session-ending, error that you're seeing now...
>
> Thanks,
>   Dave
>   
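
For readers following the thread, idea (2) above can be sketched roughly as
below. This is not the attached patch and not crash's actual code. The names
vm_table_sketch, debug_level, count_online_nodes() and sync_numnodes() are
hypothetical stand-ins for vt, the level tested by CRASHDEBUG(), and the
check in dump_memory_nodes(). The NOTE text is made up for illustration.
The sketch assumes that numnodes was derived by counting bits in
node_online_map and that n is the count obtained by walking pgdat_list.

```c
#include <stdio.h>

/* Hypothetical stand-in for the part of crash's vm table used here. */
struct vm_table_sketch {
    int numnodes;               /* count derived from node_online_map */
};

/* Hypothetical stand-in for the debug level that CRASHDEBUG() tests. */
static int debug_level = 0;

/* Count the bits set in a node_online_map-style bitmask. */
static int count_online_nodes(unsigned long online_map)
{
    int count = 0;

    while (online_map) {
        online_map &= online_map - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/*
 * Reconcile numnodes with n, the number of nodes found by walking
 * pgdat_list. Instead of a session-ending error, print a NOTE only at
 * debug level 2 or higher, then adopt n and carry on.
 * Returns 1 if numnodes was corrected, 0 if it already matched.
 */
static int sync_numnodes(struct vm_table_sketch *vt, int n)
{
    if (n == vt->numnodes)
        return 0;

    if (debug_level >= 2)
        fprintf(stderr, "NOTE: changing numnodes from %d to %d\n",
                vt->numnodes, n);

    vt->numnodes = n;
    return 1;
}
```

As an illustration based on the configuration reported above, a mask with
bits 0 through 4 set (0x1f) counts as 5 nodes, while a mask with only bits
0 and 4 set (0x11) counts as 2. If the pgdat_list walk disagrees with
whichever count was stored, sync_numnodes() adopts the walked value and the
session continues.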

-------------- next part --------------
A non-text attachment was scrubbed...
Name: numnodes_out_of_sync.patch
Type: text/x-patch
Size: 645 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20070502/8286a7b0/attachment.bin>

