[Crash-utility] crash does not work with last fedora kernels?
Tao Liu
ltao at redhat.com
Thu Aug 5 05:59:49 UTC 2021
On Thu, Aug 05, 2021 at 03:19:44AM +0000, Alexey Makhalov wrote:
> Hi Tao Liu,
>
> Can you provide the information here for people who do not have
> a Red Hat subscription, please? Is it an issue in crash or in GDB?
> To me it looks related to GDB's line wrapping based on terminal width,
> which pexpect cannot parse properly.
Hello Alexey,
Sorry about that. I've pasted the information from the Bugzilla entry below:
On x86_64, run the following command:
echo bt | /root/crash-gdb10.2-devel-temp/crash -s crash.usersys.redhat.com/3.10.0-64.el7_reboot_BUG/vmcore crash.usersys.redhat.com/3.10.0-64.el7_reboot_BUG/vmlinux-3.10.0-64.el7.gz 2>&1 | cat
The output will be:
PID: 1934 TASK: ffff88020fb8b610 CPU: 0 COMMAND: "reboot"
#0 [ffff880212de38e8] machine_kexec at ffffffff8103ef82
bt: invalid input: "jne"
bt: invalid input: "mov"
bt: invalid input: "movl"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "call"
bt: invalid input: "jne"
bt: invalid input: "cmovb"
bt: invalid input: "cmova"
bt: invalid input: "jne"
bt: invalid input: "call"
#1 [ffff880212de3938] crash_kexec at ffffffff810c6c73
#2 [ffff880212de3a00] oops_end at ffffffff815c5268
#3 [ffff880212de3a28] no_context at ffffffff815b62de
#4 [ffff880212de3a70] __bad_area_nosemaphore at ffffffff815b635e
#5 [ffff880212de3ab8] bad_area at ffffffff815b66d9
#6 [ffff880212de3ae0] __do_page_fault at ffffffff815c809c
#7 [ffff880212de3bd8] do_page_fault at ffffffff815c816a
#8 [ffff880212de3c00] page_fault at ffffffff815c4508
With the "Fix for the tab completion output issues" patch removed, the output is fine. Several vmcore/vmlinux pairs reproduce the regression.
The root cause is in x86_64_get_framesize() in x86_64.c:
sprintf(buf, "x/%ldi 0x%lx",
max_instructions, sp->value);
if (!gdb_pass_through(buf, pc->tmpfile2, GNU_RETURN_ON_ERROR)) {
....
while (fgets(buf, BUFSIZE, pc->tmpfile2)) {
....
current = htol(strip_ending_char(arglist[0], ':'),
The contents of pc->tmpfile2 look like this:
0xffffffff81072780 <SyS_reboot>: nopl 0x0(%rax,%rax,1)
0xffffffff81072785 <SyS_reboot+5>: push %rbp
0xffffffff81072786 <SyS_reboot+6>: mov %rsp,%rbp
0xffffffff81072789 <SyS_reboot+9>: call 0xffffffff81072500 <SYSC_reboot>
0xffffffff8107278e <SyS_reboot+14>: pop %rbp
0xffffffff8107278f <SyS_reboot+15>: ret
0xffffffff81072790 <ctrl_alt_del>: nopl 0x0(%rax,%rax,1)
0xffffffff81072795 <ctrl_alt_del+5>:
mov 0x86a4dd(%rip),%eax # 0xffffffff818dcc78 <C_A_D>
0xffffffff8107279b <ctrl_alt_del+11>: push %rbp
0xffffffff8107279c <ctrl_alt_del+12>: mov %rsp,%rbp
0xffffffff8107279f <ctrl_alt_del+15>: test %eax,%eax
0xffffffff810727a1 <ctrl_alt_del+17>:
jne 0xffffffff810727c0 <ctrl_alt_del+48>
0xffffffff810727a3 <ctrl_alt_del+19>:
mov 0xbe37de(%rip),%rdi # 0xffffffff81c55f88 <cad_pid>
0xffffffff810727aa <ctrl_alt_del+26>: mov $0x1,%edx
fgets() reads each line of pc->tmpfile2; the first field has its trailing ":" stripped and is converted to an address with htol().
The code expects each instruction on a single line, like:
0xffffffff81072795 <ctrl_alt_del+5>: mov 0x86a4dd(%rip),%eax # 0xffffffff818dcc78 <C_A_D>
rather than split across two lines:
0xffffffff81072795 <ctrl_alt_del+5>:
mov 0x86a4dd(%rip),%eax # 0xffffffff818dcc78 <C_A_D>
When "mov" is passed to htol(), it prints the error message: invalid input: "mov".
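To make the failure concrete, here is a minimal standalone sketch of that parse step. It is not crash's code: parse_address() is a hypothetical name, and strtoull() stands in for crash's htol() and strip_ending_char(), assuming they behave as described above (take the first field, drop a trailing ":", convert from hex).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the parse step in x86_64_get_framesize():
 * take the first whitespace-delimited field of a disassembly line,
 * strip one trailing ':' (like strip_ending_char()), and convert it
 * from hex (strtoull() plays the role of crash's htol()).
 * Returns 0 on success, or -1 after printing an htol-style error.
 */
static int parse_address(const char *line, unsigned long long *addr)
{
    char token[64];
    char *end;
    size_t len;

    if (sscanf(line, "%63s", token) != 1)
        return -1;                      /* blank line: nothing to parse */

    len = strlen(token);
    if (len > 0 && token[len - 1] == ':')
        token[len - 1] = '\0';          /* "0x...:" -> "0x..." */

    *addr = strtoull(token, &end, 16);  /* accepts an optional 0x prefix */
    if (end == token || *end != '\0') {
        /* a wrapped continuation line starts with a mnemonic, e.g. "mov" */
        printf("bt: invalid input: \"%s\"\n", token);
        return -1;
    }
    return 0;
}
```

Fed the wrapped pair from the listing above, the sketch accepts "0xffffffff81072795 <ctrl_alt_del+5>:" and prints bt: invalid input: "mov" for the indented continuation line. Note that in this sketch a mnemonic consisting only of hex digits (such as "add") would be silently accepted as an address rather than rejected.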
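So the problem is caused by the gdb screen-size setting: gdb wraps long disassembly lines at the screen width, which moves the instruction onto a line of its own.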
There are two ways to fix it:
1) filter out the lines that do not start with an address before passing them to htol().
2) modify the code that sets the gdb screen size.
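Option 1) could look roughly like the sketch below. starts_with_address() is a hypothetical helper, not an existing crash function; it assumes, as in the pc->tmpfile2 listing above, that every real instruction line begins with a "0x" address in column 0 while wrapped continuation lines are indented.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper for fix 1): true only when a line of gdb "x/i"
 * output begins with a hex address such as "0xffffffff81072795".
 * A wrapped continuation line (indented instruction text) returns false,
 * so the caller can skip it instead of passing "mov", "jne", ... to htol().
 *
 * Intended use inside the existing loop (sketch):
 *     while (fgets(buf, BUFSIZE, pc->tmpfile2)) {
 *         if (!starts_with_address(buf))
 *             continue;
 *         ...
 *     }
 */
static bool starts_with_address(const char *line)
{
    /* short-circuit evaluation keeps the checks inside the string */
    if (line[0] != '0' || (line[1] != 'x' && line[1] != 'X'))
        return false;
    return isxdigit((unsigned char)line[2]) != 0;
}
```

Filtering is the smaller change, but it discards the wrapped instruction text itself (the "mov" line above), so any instruction that happens to wrap is presumably never seen by the frame-size scan. Option 2) avoids that by keeping each instruction on one line; gdb's `set width 0` disables output wrapping, though whether crash's screen-size code should use it is an assumption here, not something stated in this thread.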
Thanks,
Tao Liu
>
> Thanks,
> --Alexey
>
> On 8/4/21, 6:58 PM, "crash-utility-bounces at redhat.com on behalf of Tao Liu" <crash-utility-bounces at redhat.com on behalf of ltao at redhat.com> wrote:
>
> On Fri, Jul 30, 2021 at 06:04:59PM -0400, David Wysochanski wrote:
> > I cannot share the vmcore unfortunately due to data contents.
> >
> > There's something strange going on in my setup though.
> > I run the Python expect module ("pexpect") for automation and run
> > crash commands under it.
> > For some reason, when I run this gdb10 test branch, I get these weird errors.
> > But then I'm not seeing errors when I run the same commands manually
> > on the same vmcore and gdb10 test branch.
> > So it's the combination of the new gdb10 branch plus pexpect
> > environment where I see them, and only with some vmcores.
> > It's possible some crash output format changes and confuses pexpect,
> > and it's not a crash bug at all.
> > I've also seen some strange crash behavior when certain sequences of
> > crash commands are run.
> > I'll see if I can narrow down the problem further and report back.
> >
>
> Hello David,
>
> The error message "bt: invalid input: jne" is a known issue; you can find it in:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1896647#c18
>
> and the root cause for the issue:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1896647#c19
>
> Thanks,
> Tao Liu
>
> >
> > On Fri, Jul 30, 2021 at 2:12 PM Alexey Makhalov <amakhalov at vmware.com> wrote:
> > >
> > > Hi David,
> > > Can you share the vmcore and kernel images? Or provide instructions on how to recreate this core?
> > > It would be nice if you could run a set of tests with your cores!
> > > Thanks,
> > > --Alexey
> > >
> > > On 7/30/21, 7:44 AM, "crash-utility-bounces at redhat.com on behalf of David Wysochanski" <crash-utility-bounces at redhat.com on behalf of dwysocha at redhat.com> wrote:
> > >
> > > On Thu, Jul 29, 2021 at 9:57 PM lijiang <lijiang at redhat.com> wrote:
> > > >
> > > > >
> > > > > Hi, David
> > > > > Thank you for the attention.
> > > > > Currently, the Fedora kernel has been forced to generate DWARF4 debuginfo via the
> > > > > CONFIG_DEBUG_INFO_DWARF4 kernel option, see jforbes' comment:
> > > > > https://src.fedoraproject.org/rpms/kernel/pull-request/48
> > > > >
> > > > > Once crash's gdb is upgraded, DWARF5 can be enabled again in the Fedora kernel.
> > > > >
> > > > > BTW: that is a temporary branch, still under tests and it has not been announced yet.
> > > > >
> > > >
> > > > > Lianbo,
> > > >
> > > > > Please STOP replying to the digest, but reply properly on the
> > > > > appropriate email thread.
> > > > > I've been involved in a lot of open source projects over the past 15 years,
> > > > > and you're the only one I've ever seen that replies to a digest, not the
> > > > > appropriate email thread.
> > > >
> > > > Good suggestions, David.
> > > >
> > > > I remember that you reminded me about this before. But my email system recently switched to Gmail, and I made the mistake again. I'm trying to get used to the Gmail
> > > > system.
> > > >
> > > > But anyway, I hope that my last reply answered your questions.
> > > >
> > >
> > > Yes, thank you. But did you see my feedback about some of the error
> > > output, when I tested your branch?
> > > Do you have a list of existing issues?
> > > I can fairly easily test your experimental branch on a series of
> > > vmcores, if that helps.
> > >
> > > Here's the feedback, cut/pasted from the other email thread:
> > >
> > > I'm seeing a lot of "invalid input" displayed like the below when
> > > using the 'bt' command. Is this a known issue?
> > >
> > > bt: invalid input: "jne"
> > > bt: invalid input: "mov"
> > > bt: invalid input: "movl"
> > > bt: invalid input: "jne"
> > > bt: invalid input: "jne"
> > > bt: invalid input: "jne"
> > > bt: invalid input: "rep"
> > > bt: invalid input: "je"
> > > bt: invalid input: "je"
> > > bt: invalid input: "je"
> > > bt: invalid input: "call"
> > > bt: invalid input: "call"
> > > bt: invalid input: "jne"
> > > bt: invalid input: "call"
> > > bt: invalid input: "call"
> > > bt: invalid input: "movl"
> > > bt: invalid input: "mov"
> > > bt: invalid input: "jne"
> > > bt: invalid input: "mov"
> > > bt: invalid input: "je"
> > > bt: invalid input: "call"
> > >
> > > --
> > > Crash-utility mailing list
> > > Crash-utility at redhat.com
> > > https://listman.redhat.com/mailman/listinfo/crash-utility