[Crash-utility] crash does not work with last fedora kernels?

Tao Liu ltao at redhat.com
Thu Aug 5 05:59:49 UTC 2021


On Thu, Aug 05, 2021 at 03:19:44AM +0000, Alexey Makhalov wrote:
> Hi Tao Liu,
> 
> Can you provide the information here for people who do not have
> a Red Hat subscription, please? Is it an issue in crash or in GDB?
> To me it looks related to GDB's line wrapping based on terminal width,
> which pexpect cannot properly parse.

Hello Alexey,

Sorry about that. Here is the information from the Bugzilla link:

On x86_64, run the following command:
echo bt | /root/crash-gdb10.2-devel-temp/crash -s crash.usersys.redhat.com/3.10.0-64.el7_reboot_BUG/vmcore crash.usersys.redhat.com/3.10.0-64.el7_reboot_BUG/vmlinux-3.10.0-64.el7.gz 2>&1 | cat
The output will be:

PID: 1934   TASK: ffff88020fb8b610  CPU: 0   COMMAND: "reboot"
 #0 [ffff880212de38e8] machine_kexec at ffffffff8103ef82
bt: invalid input: "jne"
bt: invalid input: "mov"
bt: invalid input: "movl"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "jne"
bt: invalid input: "cmovb"
bt: invalid input: "cmova"
bt: invalid input: "jne"
bt: invalid input: "call"
 #1 [ffff880212de3938] crash_kexec at ffffffff810c6c73
 #2 [ffff880212de3a00] oops_end at ffffffff815c5268
 #3 [ffff880212de3a28] no_context at ffffffff815b62de
 #4 [ffff880212de3a70] __bad_area_nosemaphore at ffffffff815b635e
 #5 [ffff880212de3ab8] bad_area at ffffffff815b66d9
 #6 [ffff880212de3ae0] __do_page_fault at ffffffff815c809c
 #7 [ffff880212de3bd8] do_page_fault at ffffffff815c816a
 #8 [ffff880212de3c00] page_fault at ffffffff815c4508

With the "Fix for the tab completion output issues" patch removed, the output is fine. The regression can be reproduced with several vmcore/vmlinux pairs.

The root cause is in x86_64.c, in x86_64_get_framesize():

        sprintf(buf, "x/%ldi 0x%lx",
                max_instructions, sp->value);

        if (!gdb_pass_through(buf, pc->tmpfile2, GNU_RETURN_ON_ERROR)) {
        ....
        while (fgets(buf, BUFSIZE, pc->tmpfile2)) {
        ....
                current =  htol(strip_ending_char(arglist[0], ':'), 
                
The content of pc->tmpfile2 looks like this:

   0xffffffff81072780 <SyS_reboot>:	nopl   0x0(%rax,%rax,1)
   0xffffffff81072785 <SyS_reboot+5>:	push   %rbp
   0xffffffff81072786 <SyS_reboot+6>:	mov    %rsp,%rbp
   0xffffffff81072789 <SyS_reboot+9>:	call   0xffffffff81072500 <SYSC_reboot>
   0xffffffff8107278e <SyS_reboot+14>:	pop    %rbp
   0xffffffff8107278f <SyS_reboot+15>:	ret    
   0xffffffff81072790 <ctrl_alt_del>:	nopl   0x0(%rax,%rax,1)
   0xffffffff81072795 <ctrl_alt_del+5>:	
    mov    0x86a4dd(%rip),%eax        # 0xffffffff818dcc78 <C_A_D>
   0xffffffff8107279b <ctrl_alt_del+11>:	push   %rbp
   0xffffffff8107279c <ctrl_alt_del+12>:	mov    %rsp,%rbp
   0xffffffff8107279f <ctrl_alt_del+15>:	test   %eax,%eax
   0xffffffff810727a1 <ctrl_alt_del+17>:	
    jne    0xffffffff810727c0 <ctrl_alt_del+48>
   0xffffffff810727a3 <ctrl_alt_del+19>:	
    mov    0xbe37de(%rip),%rdi        # 0xffffffff81c55f88 <cad_pid>
   0xffffffff810727aa <ctrl_alt_del+26>:	mov    $0x1,%edx
   
fgets reads pc->tmpfile2 line by line. For each line, the code takes the first field (arglist[0]), strips any trailing ":", and passes it to htol as the address.
The program expects a single line like:

   0xffffffff81072795 <ctrl_alt_del+5>:	 mov    0x86a4dd(%rip),%eax        # 0xffffffff818dcc78 <C_A_D>

rather than 2 lines:

   0xffffffff81072795 <ctrl_alt_del+5>:	
    mov    0x86a4dd(%rip),%eax        # 0xffffffff818dcc78 <C_A_D>
    
When "mov" is passed to htol, it prints the error message: invalid input: "mov".
So the problem is caused by the gdb screen size setting, which makes gdb wrap long lines.
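
To make the failure easier to see in isolation, here is a minimal standalone sketch. It is not crash code: first_field_is_hex() is a hypothetical helper that only imitates the arglist[0] / strip_ending_char() / htol() sequence, using strtoull() as a stand-in for htol.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the first whitespace-delimited field of 'line' into 'field',
 * strip one trailing ':' and report whether what is left parses
 * entirely as a hexadecimal number.
 * Hypothetical stand-in for the parsing in x86_64_get_framesize().
 */
static int first_field_is_hex(const char *line, char *field, size_t size)
{
	size_t len;
	char *end;

	assert(line != NULL && field != NULL && size > 1);

	line += strspn(line, " \t");      /* skip leading blanks */
	len = strcspn(line, " \t\n");     /* length of the first field */
	if (len == 0)
		return 0;
	if (len >= size)
		len = size - 1;           /* truncate to fit the buffer */
	memcpy(field, line, len);
	field[len] = '\0';

	if (field[len - 1] == ':')        /* "0x...:" becomes "0x..." */
		field[--len] = '\0';
	if (len == 0)
		return 0;

	errno = 0;
	(void)strtoull(field, &end, 16);  /* base 16 accepts a "0x" prefix */
	return errno == 0 && end != field && *end == '\0';
}
```

For an address line the first field is "0xffffffff81072795" and the check succeeds. For the wrapped continuation line the first field is "mov", which is the token reported in the error message.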

There are two ways to fix this:

1) Filter out the lines that do not start with an address before passing them to htol.
2) Change the code that sets the gdb screen size, so that gdb does not wrap long lines.

Thanks,
Tao Liu

> 
> Thanks,
> --Alexey
> 
> On 8/4/21, 6:58 PM, "crash-utility-bounces at redhat.com on behalf of Tao Liu" <crash-utility-bounces at redhat.com on behalf of ltao at redhat.com> wrote:
> 
>     On Fri, Jul 30, 2021 at 06:04:59PM -0400, David Wysochanski wrote:
>     > I cannot share the vmcore unfortunately due to data contents.
>     > 
>     > There's something strange going on in my setup though.
>     > I run the Python expect module ("pexpect" ) for automation and run
>     > crash commands under that.
>     > For some reason, when I run this gdb10 test branch, I get these weird errors.
>     > But then I'm not seeing errors when I run the same commands manually
>     > on the same vmcore and gdb10 test branch.
>     > So it's the combination of the new gdb10 branch plus pexpect
>     > environment where I see them, and only with some vmcores.
>     > It's possible some crash output format changes and confuses pexpect,
>     > and it's not a crash bug at all.
>     > I've also seen some strange crash behavior when certain sequences of
>     > crash commands are run.
>     > I'll see if I can narrow down the problem further and report back.
>     > 
> 
>     Hello David,
> 
>     The "bt: invalid input" error message is a known issue; you can find it at:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=1896647#c18
> 
>     and the root cause for the issue:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=1896647#c19
> 
>     Thanks,
>     Tao Liu
> 
>     > 
>     > On Fri, Jul 30, 2021 at 2:12 PM Alexey Makhalov <amakhalov at vmware.com> wrote:
>     > >
>     > > Hi David,
>     > > Can you share the vmcore and kernel images? Or provide instructions on how to recreate this core?
>     > > It would be nice if you could run a set of tests with your cores!
>     > > Thanks,
>     > > --Alexey
>     > >
>     > > On 7/30/21, 7:44 AM, "crash-utility-bounces at redhat.com on behalf of David Wysochanski" <crash-utility-bounces at redhat.com on behalf of dwysocha at redhat.com> wrote:
>     > >
>     > >     On Thu, Jul 29, 2021 at 9:57 PM lijiang <lijiang at redhat.com> wrote:
>     > >     >
>     > >     > >
>     > >     > > Hi, David
>     > >     > > Thank you for the attention.
>     > >     > > Currently, Fedora kernel has been forced to generate the DWARF4 debuginfo via the
>     > >     > > CONFIG_DEBUG_INFO_DWARF4 kernel option, see jforbes' comment:
>     > >     > > https://src.fedoraproject.org/rpms/kernel/pull-request/48
>     > >     > >
>     > >     > > Once crash gdb is upgraded, the DWARF5 could be enabled again in Fedora kernel.
>     > >     > >
>     > >     > > BTW: that is a temporary branch, still under tests and it has not been announced yet.
>     > >     > >
>     > >     >
>     > >     > > Lianbo,
>     > >     >
>     > >     > > Please STOP replying to the digest, but reply properly on the
>     > >     > > appropriate email thread.
>     > >     > > I've been involved in a lot of open source projects over the past 15 years,
>     > >     > > and you're the only one I've ever seen that replies to a digest, not the
>     > >     > > appropriate email thread.
>     > >     >
>     > >     > Good suggestions, David.
>     > >     >
>     > >     > I remember that you reminded me about this before. My email recently switched to Gmail, and I made the same mistake again. I'm still getting used to
>     > >     > Gmail.
>     > >     >
>     > >     > But anyway, I hope that my last reply answered your questions.
>     > >     >
>     > >
>     > >     Yes, thank you.  But did you see my feedback about some of the error
>     > >     output, when I tested your branch?
>     > >     Do you have a list of existing issues?
>     > >     I can fairly easily test your experimental branch on a series of
>     > >     vmcores, if that helps.
>     > >
>     > >     Here's the feedback, cut/pasted from the other email thread:
>     > >
>     > >     I'm seeing a lot of "invalid input" displayed like the below when
>     > >     using the 'bt' command.  Is this a known issue?
>     > >
>     > >     bt: invalid input: "jne"
>     > >     bt: invalid input: "mov"
>     > >     bt: invalid input: "movl"
>     > >     bt: invalid input: "jne"
>     > >     bt: invalid input: "jne"
>     > >     bt: invalid input: "jne"
>     > >     bt: invalid input: "rep"
>     > >     bt: invalid input: "je"
>     > >     bt: invalid input: "je"
>     > >     bt: invalid input: "je"
>     > >     bt: invalid input: "call"
>     > >     bt: invalid input: "call"
>     > >     bt: invalid input: "jne"
>     > >     bt: invalid input: "call"
>     > >     bt: invalid input: "call"
>     > >     bt: invalid input: "movl"
>     > >     bt: invalid input: "mov"
>     > >     bt: invalid input: "jne"
>     > >     bt: invalid input: "mov"
>     > >     bt: invalid input: "je"
>     > >     bt: invalid input: "call"
>     > >
>     > >     --
>     > >     Crash-utility mailing list
>     > >     Crash-utility at redhat.com
>     > >     https://listman.redhat.com/mailman/listinfo/crash-utility



