[Crash-utility] Are all linux_banner type cases covered in verify_version()?

David Mair dmair at suse.com
Mon Aug 21 16:23:39 UTC 2023


Thank you, I have some more information...

On 8/21/23 02:20, lijiang wrote:
> On Mon, Aug 21, 2023 at 9:37 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com>
> wrote:
> 
>> On 2023/08/18 3:08, David Mair wrote:
>>> Hi All,
>>>
>>> Before I consider starting work on a patch for this I'd appreciate more
>>> input.
>>>
>>> I am seeing random cases of crash failing to load reporting an x86_64
>>> coredump reporting a "bad" linux_banner. However, the value displayed as
>>> the banner is:
>>>
>>> 0x65762078756e694c
>>>
>>> which is plainly ASCII text as a 64-bit number: read in little-endian
>>> byte order it is the text "Linux ve", the first eight bytes of
>>> "Linux version".
>>>
>>> It occurs only with specific coredumps and reproduces every time with
>>> that coredump and a given version of crash, though sometimes it
>>> appears when using a given coredump with one version of crash but
>>> not with another. What I'm trying to get working is the current
>>> crash; everything below is from experience with crash 8.0.3 only.
>>>
>>> I used gdb crash to debug it through verify_version(). If I breakpoint
>>> there with gdb crash and step through the function I find that in the
>>> section:
>>>
>>>
>>>       if (!(sp = symbol_search("linux_banner")))
>>>           error(FATAL, "linux_banner symbol does not exist?\n");
>>>       else if ((sp->type == 'R') || (sp->type == 'r') ||
>>>           (THIS_KERNEL_VERSION >= LINUX(2,6,11) &&
>>>            (sp->type == 'D' || sp->type == 'd')) ||
>>>            (machine_type("ARM") && sp->type == 'T') ||
>>>            (machine_type("ARM64")))
>>>           linux_banner = symbol_value("linux_banner");
>>>       else
>>>           get_symbol_data("linux_banner", sizeof(ulong), &linux_banner);
>>>
>>>
>>> * The if block is not executed, i.e. symbol_search("linux_banner")
>>> succeeded and we have a usable struct syment for "linux_banner" in sp
>>> * The else if block is not executed: every condition is either not
>>> relevant or would be met, except for the value of sp->type in the
>>> THIS_KERNEL_VERSION >= LINUX(2,6,11) case. sp->type is 'B', bss segment
>>> * The final else block is executed, we copy sizeof(ulong) bytes of
>>> symbol data from what "linux_banner" refers to into the crash internal
>>> linux_banner variable
>>>
>>> Here's how sp looks at the else if statement in the above code:
>>>
>>> (gdb) print *sp
>>> $2 = {value = 18446744071587233984, name = 0x5555566a735b "linux_banner",
>>>     val_hash_next = 0x7fffe51e4338, name_hash_next = 0x7fffe51f8d38,
>>>     type = 66 'B', cnt = 1 '\001', flags = 0 '\000', pad2 = 0 '\000'}
>>>...

I modified crash to dump every symbol added to the symbol table. The rpm 
package build in question has a patch that causes the symbol table to be 
built from the debuginfo file and before the patch it was built from the 
kernel executable.

If I toggle between the two models I find that, in a case where crash 
fails because the linux_banner pointer value is really ASCII 
characters, all symbol table entries generated from the debuginfo file 
have type 'b' or 'B'.
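
To make the "ASCII characters as a pointer value" symptom easy to check, 
here is a minimal sketch that unpacks a 64-bit value into its bytes in 
little-endian order. The helper names are hypothetical and are not part 
of crash; the only value taken from this thread is 0x65762078756e694c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of crash: unpack a 64-bit value into its
 * eight bytes, least significant byte first, which is the order the bytes
 * occupy in memory on a little-endian machine such as x86_64. */
static void u64_to_le_text(uint64_t v, char out[9])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (char)((v >> (8 * i)) & 0xff);
    out[8] = '\0';
}

/* Hypothetical check: does the value look like the start of the banner
 * string rather than a kernel address? */
static bool looks_like_banner_text(uint64_t v)
{
    char text[9];

    u64_to_le_text(v, text);
    return memcmp(text, "Linux ve", 8) == 0;
}
```

Passing 0x65762078756e694c to looks_like_banner_text() returns true, 
which is consistent with the banner text itself having been read where 
a pointer was expected.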

BUT, for the same coredump, in the case of crash building the symbol 
table from the kernel executable all symbols have varying types and 
linux_banner is 'D'.
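
The difference between the two symbol types can be reduced to a 
predicate on sp->type. This is a simplified sketch of the condition in 
the quoted verify_version() code, not the crash source: the function 
name and the boolean parameters are stand-ins I made up for 
THIS_KERNEL_VERSION >= LINUX(2,6,11) and the machine_type() calls.

```c
#include <stdbool.h>

/* Simplified stand-in for the else-if condition in verify_version().
 * Returns true when the symbol value would be used directly as the
 * address of the banner, false when the code would instead read
 * sizeof(ulong) bytes from that address and treat them as a pointer. */
static bool banner_value_is_address(char type, bool kernel_ge_2_6_11,
                                    bool is_arm, bool is_arm64)
{
    if (type == 'R' || type == 'r')
        return true;
    if (kernel_ge_2_6_11 && (type == 'D' || type == 'd'))
        return true;
    if (is_arm && type == 'T')
        return true;
    if (is_arm64)
        return true;
    return false;
}
```

On x86_64 with a kernel at or after 2.6.11, type 'D' gives true and 
type 'B' gives false, so with 'B' the first sizeof(ulong) bytes of the 
banner text end up in linux_banner.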

There are about 78,000 symbol table entries created in both cases.

This error is not observed with every kernel debuginfo file. At this 
point, assuming there is a good reason for this crash package to carry 
the patch that builds the symbol table from the debuginfo file, I have 
to suspect either:

* Sometimes the kernel debuginfo is created with all symbols having 
type bss segment when the kernel executable has varying types; or
* Something in the attempt to read the debuginfo fails such that bss 
segment gets used for all symbols
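
A quick way to flag the suspicious case while dumping symbols is to 
test whether every entry is typed 'b' or 'B'. The sketch below uses a 
cut-down, hypothetical record (struct sym_lite) rather than crash's own 
struct syment, and the function name is also an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down symbol record: only the name and the type
 * character are kept. This is not crash's struct syment. */
struct sym_lite {
    const char *name;
    char type;
};

/* Return true when the table is non-empty and every entry has type
 * 'b' or 'B', the pattern seen when the symbol table was built from
 * the problematic debuginfo file. */
static bool all_symbols_bss(const struct sym_lite *syms, size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (syms[i].type != 'b' && syms[i].type != 'B')
            return false;
    }
    return true;
}
```

A check like this does not tell the two hypotheses apart; it only 
identifies which debuginfo files produce the all-bss table.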

The first would not be a bug in crash: the current code works as long 
as such debuginfo files are never generated, and it also works when 
that debuginfo file is bypassed so that the symbol table is built from 
the kernel executable the debuginfo is matched to.

I'll try to discover which case occurs using gdb crash from today and 
take on board Kazu's suggestion while I explore.

-- 
Thank you,
David.


