[Crash-utility] Unable to switch stack frames while using crash

Thu Jun 16 04:52:05 UTC 2011

Hi,

Thank you, Dave, for your time and help.

As suggested, I will update my crash utility first and then analyze the dump.

>>> I believe something like this might work:

>>>  $ makedumpfile -c -d 31 -x vmlinux_temp vmcore-old vmcore-new

I tried the command you suggested, "makedumpfile -c -d 31 -x vmlinux_temp vmcore vmcore-new", and got the error message "The kernel version is not supported. The created dumpfile may be incomplete. check_release: Can't get the kernel version".

Should I update the makedumpfile utility as well, or will updating crash alone be enough?

>>> Are you trying to re-create an ELF style dumpfile on purpose?

I tried to recreate the vmcore file in ELF format because I can't access the original uncompressed ELF dump file, which is on the customer's machine.

Thanks and Regards
Shashidhara

-----Original Message-----
From: crash-utility-bounces at redhat.com [mailto:crash-utility-bounces at redhat.com] On Behalf Of Dave Anderson
Sent: Wednesday, June 15, 2011 9:33 PM
To: Discussion list for crash utility usage,maintenance and development
Subject: Re: [Crash-utility] Unable to switch stack frames while using crash

----- Original Message -----
> Hi Dave,
> 
> Thanks for the help. I have further input regarding query 3. Please
> help.
> 
> > 3) I want to retrieve the address of a data structure in the current
> > context. How can it be done? I tried using the struct command, but it
> > did not help.
> 
> The struct command needs the correct virtual address of the structure
> you're trying to view. So I presume you're asking how to find the address
> of the data structure? If that's true, you're going to have to be a lot
> more specific.
> 
> >> I need to find the virtual address of the structure tty, of type
> >> struct tty_struct, which is passed as an argument to the function
> >> n_tty_read. The corresponding stack trace is below.
> 
> >> PID: 13366 TASK: ffff88031b60d580 CPU: 1 COMMAND: "telnet"
> >> #0 [ffff88031ce759d0] machine_kexec at ffffffff81024486
> >> #1 [ffff88031ce75a40] crash_kexec at ffffffff8107e230
> >> #2 [ffff88031ce75b20] oops_end at ffffffff8100fa38
> >> #3 [ffff88031ce75b50] no_context at ffffffff8102d801
> >> #4 [ffff88031ce75ba0] __bad_area_nosemaphore at ffffffff8102d9c9
> >> #5 [ffff88031ce75c70] bad_area at ffffffff8102da41
> >> #6 [ffff88031ce75ca0] do_page_fault at ffffffff8102dd19
> >> #7 [ffff88031ce75cf0] page_fault at ffffffff812d7425
> >> #8 [ffff88031ce75d78] n_tty_read at ffffffff811f03b3
> >> #9 [ffff88031ce75ec0] tty_read at ffffffff811ebf7e
> >> #10 [ffff88031ce75f10] vfs_read at ffffffff810ebcc8
> >> #11 [ffff88031ce75f40] sys_read at ffffffff810ebe48
> >> #12 [ffff88031ce75f80] system_call_fastpath at ffffffff8100bbc2
> >> RIP: 00007ffff716b9e0 RSP: 00007fffffffdfc0 RFLAGS: 00010212
> >> RAX: 0000000000000000 RBX: ffffffff8100bbc2 RCX: 0000000000000000
> >> RDX: 0000000000001ff6 RSI: 000000000061c02a RDI: 0000000000000000
> >> RBP: 0000000000001ff6 R8: 0000000000000000 R9: 0000000000000000
> >> R10: 0000000000616680 R11: 0000000000000246 R12: 0000000000000000
> >> R13: 0000000000000001 R14: 000000000061c02a R15: 00000000006178a0
> >> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

First, I would update your crash utility so that you get the exception
frame dump that resulted from the page fault, because the tty structure
pointer may be in the register dump.  In any case, without knowing the
kernel version, it's hard to pinpoint exactly which instruction in
n_tty_read() generated the page fault.  Was the bad address generated
because the tty structure pointer was NULL?  Again, with an updated crash
utility you'll get more information about the register contents at the
time of the page fault, and "bt -F" may also help you find it.  I'm not
sure where the tty structure gets allocated from: is it statically
allocated, or is it allocated from one of the "size-xxx" slab caches, etc.?
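The idea that the tty pointer may sit in the register dump can be narrowed down mechanically. Below is a minimal Python sketch, assuming the 2.6-era x86_64 layout in which all RAM is direct-mapped from 0xffff880000000000 (so kmalloc'ed objects have addresses there) and kernel stacks are 8 KB (THREAD_SIZE). It keeps register values that fall in the direct map but not on the kernel stack containing RSP. The sample is the register dump from the "reported" backtrace quoted later in this thread; `candidate_pointers` is a hypothetical helper, not part of crash.

```python
import re

# Assumption: 2.6-era x86_64 layout. All RAM is direct-mapped at
# 0xffff880000000000..0xffffc7ffffffffff, so kmalloc'ed objects such as a
# tty_struct have addresses in that range.
DIRECT_MAP_START = 0xFFFF880000000000
DIRECT_MAP_END = 0xFFFFC80000000000  # exclusive
THREAD_SIZE = 8192  # assumption: 8 KB kernel stacks

# Matches "NAME: <16 hex digits>" pairs; short fields such as "CS: 0010" are skipped.
REG_RE = re.compile(r"\b([A-Z0-9_]+):\s+([0-9a-fA-F]{16})\b")


def candidate_pointers(register_dump: str) -> list[tuple[str, int]]:
    """Return (register, value) pairs that look like heap pointers.

    A value qualifies if it lies in the direct map but not inside the
    THREAD_SIZE-aligned kernel stack that contains RSP.
    """
    regs = {name: int(hexval, 16) for name, hexval in REG_RE.findall(register_dump)}
    stack_base = regs["RSP"] & ~(THREAD_SIZE - 1) if "RSP" in regs else None
    hits = []
    for name, value in regs.items():
        in_direct_map = DIRECT_MAP_START <= value < DIRECT_MAP_END
        on_stack = stack_base is not None and stack_base <= value < stack_base + THREAD_SIZE
        if in_direct_map and not on_stack:
            hits.append((name, value))
    return hits


# Register dump from the "reported" backtrace (exception frame in n_tty_read).
SAMPLE = """\
    RIP: ffffffff811f03b3  RSP: ffff88031ce75da8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8802cbd54a68  RCX: 000000000061c044
    RDX: 0000000000000005  RSI: ffff88031ce75e87  RDI: ffff8802cbd54d1c
    RBP: ffff88031ce75eb8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000616680  R11: 0000000000000246  R12: 000000000061c044
    R13: ffff8802cbd54800  R14: 0000000000000000  R15: 7fffffffffffffff
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018"""

for reg, val in candidate_pointers(SAMPLE):
    print(f"{reg:>3}: {val:016x}")
```

For this sample the sketch leaves RBX, RDI and R13. Each surviving value is only a candidate to try with the struct command; nothing here proves which register, if any, held the tty pointer.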

> 
> >> I have another query. I tried to convert the vmcore file to ELF
> >> format using "makedumpfile -E -d 31 -x vmlinux_temp vmcore
> >> dumpfile", for which I got the error message "'-E' option is
> >> disable, because vmcore is kdump compressed format. makedumpfile Failed".
> 
> >> Please guide me further.

Refiltering and the -E argument cannot be used together because
makedumpfile cannot regenerate an ELF vmcore file from a previously
compressed kdump dumpfile.

I believe something like this might work:

  $ makedumpfile -c -d 31 -x vmlinux_temp vmcore-old vmcore-new
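The "-d 31" in that command is a bitmask telling makedumpfile which page types to exclude. A small Python sketch that decodes it, assuming the five bits documented in makedumpfile(8) for releases of that era; `excluded_page_types` is a hypothetical helper.

```python
# Bit meanings as documented in the makedumpfile(8) man page (assumed to
# match the makedumpfile version in use here).
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "cache pages without private data",
    4: "cache pages with private data",
    8: "user process data pages",
    16: "free pages",
}


def excluded_page_types(dump_level: int) -> list[str]:
    """List the page types a given dump level excludes, lowest bit first."""
    if not 0 <= dump_level <= 31:
        raise ValueError(f"dump level must be 0..31, got {dump_level}")
    return [desc for bit, desc in DUMP_LEVEL_BITS.items() if dump_level & bit]


print(excluded_page_types(31))
```

Level 31 sets every bit, so it is the most aggressive filtering among these five page types.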

Are you trying to re-create an ELF style dumpfile on purpose?

Dave

> Thanks and Regards
> Shashidhara
> 
> -----Original Message-----
> From: crash-utility-bounces at redhat.com
> [mailto:crash-utility-bounces at redhat.com] On Behalf Of Dave Anderson
> Sent: Wednesday, June 15, 2011 8:38 PM
> To: Discussion list for crash utility usage,maintenance and
> development
> Subject: Re: [Crash-utility] Unable to switch stack frames while using
> crash
> 
> 
> 
> ----- Original Message -----
> > Hi,
> >
> > I was investigating a 64-bit Linux kernel dump. I have the following
> > questions regarding the usage of crash.
> >
> > 1) I wanted to access the intermediate kernel stack frames to learn
> > the state of each frame and the point of failure.
> >
> > When I try to access a stack frame, I get the error message "crash:
> > prohibited gdb command: frame". Can you please let me know if there
> > is any other way of accessing the kernel stack frames using crash?
> 
> Right -- the embedded gdb doesn't know anything about the core file
> (or live system) that you're running on. It's invoked as "gdb vmlinux",
> and doesn't know anything about any "frames".
> 
> As Flavio mentioned, you can see the stack data of each frame with
> "bt -f", or better yet, "bt -F" which may illuminate what the data
> may be, because it shows symbolic translations or slab cache names
> instead of raw values where appropriate.
> 
> > 2) When I run bt in crash, I get a stack trace. Another person from a
> > different team reported a slightly different stack trace from mine.
> > Both stack traces are below. The register contents are quite different
> > between the two.
> >
> > My stack trace
> >
> > PID: 13366 TASK: ffff88031b60d580 CPU: 1 COMMAND: "telnet"
> >
> > #0 [ffff88031ce759d0] machine_kexec at ffffffff81024486
> > #1 [ffff88031ce75a40] crash_kexec at ffffffff8107e230
> > #2 [ffff88031ce75b20] oops_end at ffffffff8100fa38
> > #3 [ffff88031ce75b50] no_context at ffffffff8102d801
> > #4 [ffff88031ce75ba0] __bad_area_nosemaphore at ffffffff8102d9c9
> > #5 [ffff88031ce75c70] bad_area at ffffffff8102da41
> > #6 [ffff88031ce75ca0] do_page_fault at ffffffff8102dd19
> > #7 [ffff88031ce75cf0] page_fault at ffffffff812d7425
> > #8 [ffff88031ce75d78] n_tty_read at ffffffff811f03b3
> > #9 [ffff88031ce75ec0] tty_read at ffffffff811ebf7e
> > #10 [ffff88031ce75f10] vfs_read at ffffffff810ebcc8
> > #11 [ffff88031ce75f40] sys_read at ffffffff810ebe48
> > #12 [ffff88031ce75f80] system_call_fastpath at ffffffff8100bbc2
> > RIP: 00007ffff716b9e0 RSP: 00007fffffffdfc0 RFLAGS: 00010212
> > RAX: 0000000000000000 RBX: ffffffff8100bbc2 RCX: 0000000000000000
> > RDX: 0000000000001ff6 RSI: 000000000061c02a RDI: 0000000000000000
> > RBP: 0000000000001ff6 R8: 0000000000000000 R9: 0000000000000000
> > R10: 0000000000616680 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000001 R14: 000000000061c02a R15: 00000000006178a0
> > ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> >
> >
> > Reported stack trace
> >
> > PID: 13366 TASK: ffff88031b60d580 CPU: 1 COMMAND: "telnet"
> > #0 [ffff88031ce759d0] machine_kexec at ffffffff81024486
> > #1 [ffff88031ce75a40] crash_kexec at ffffffff8107e230
> > #2 [ffff88031ce75ad8] n_tty_read at ffffffff811f03b3
> > #3 [ffff88031ce75b20] oops_end at ffffffff8100fa38
> > #4 [ffff88031ce75b50] no_context at ffffffff8102d801
> > #5 [ffff88031ce75ba0] __bad_area_nosemaphore at ffffffff8102d9c9
> > #6 [ffff88031ce75c20] native_sched_clock at ffffffff810120aa
> > #7 [ffff88031ce75c70] bad_area at ffffffff8102da41
> > #8 [ffff88031ce75ca0] do_page_fault at ffffffff8102dd19
> > #9 [ffff88031ce75cf0] page_fault at ffffffff812d7425
> > [exception RIP: n_tty_read+1420]
> > RIP: ffffffff811f03b3 RSP: ffff88031ce75da8 RFLAGS: 00010246
> > RAX: 0000000000000000 RBX: ffff8802cbd54a68 RCX: 000000000061c044
> > RDX: 0000000000000005 RSI: ffff88031ce75e87 RDI: ffff8802cbd54d1c
> > RBP: ffff88031ce75eb8 R8: 0000000000000000 R9: 0000000000000000
> > R10: 0000000000616680 R11: 0000000000000246 R12: 000000000061c044
> > R13: ffff8802cbd54800 R14: 0000000000000000 R15: 7fffffffffffffff
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > #10 [ffff88031ce75ec0] tty_read at ffffffff811ebf7e
> > #11 [ffff88031ce75f10] vfs_read at ffffffff810ebcc8
> > #12 [ffff88031ce75f40] sys_read at ffffffff810ebe48
> > #13 [ffff88031ce75f80] system_call_fastpath at ffffffff8100bbc2
> 
> The first backtrace differs because you are apparently using an older
> version of the crash utility: it does not show the page fault exception
> frame that the "reported" version does.
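To connect the two backtraces: the exception RIP reported as n_tty_read+1420 is the same address, ffffffff811f03b3, that the older crash printed for frame #8. The Python sketch below splits crash's symbol+offset notation and works back to the function's start address, assuming (as crash does by default) that the offset is printed in decimal; `parse_symbol_offset` is a hypothetical helper. The hex offset is what to look for when disassembling n_tty_read for the matching kernel version.

```python
import re


def parse_symbol_offset(text: str) -> tuple[str, int]:
    """Split "symbol+offset"; the offset is assumed to be decimal, as crash prints it."""
    m = re.fullmatch(r"([A-Za-z_][A-Za-z0-9_.]*)\+(\d+)", text)
    if m is None:
        raise ValueError(f"not symbol+offset: {text!r}")
    return m.group(1), int(m.group(2))


rip = 0xFFFFFFFF811F03B3  # exception RIP from the "reported" backtrace
symbol, offset = parse_symbol_offset("n_tty_read+1420")
start = rip - offset  # implied start address of n_tty_read
print(f"{symbol}: offset {offset} = {offset:#x}, start {start:016x}")
```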
> 
> >
> > 3) I want to retrieve the address of a data structure in the current
> > context. How can it be done? I tried using the struct command, but it
> > did not help.
> 
> The struct command needs the correct virtual address of the structure
> you're trying to view. So I presume you're asking how to find the
> address of the data structure? If that's true, you're going to have to
> be a lot more specific.
> 
> > 4) When I run the command readelf -a vmcore, I get the error message
> > "readelf: Error: Not an ELF file - it has the wrong magic bytes at the
> > start."
> 
> I presume that the dumpfile is a compressed kdump dumpfile generated
> by makedumpfile, which takes the original /proc/vmcore ELF dumpfile
> and creates its own unique dumpfile format.
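readelf rejects the file because it checks the leading magic bytes. A minimal Python sketch that makes the same kind of check for several dumpfile formats: the ELF magic is standard, while the "KDUMP   ", "DISKDUMP" and "makedumpfile" signatures are assumptions based on the makedumpfile and crash sources. The function names are hypothetical.

```python
# Leading signatures of the dumpfile formats discussed here. The ELF magic is
# standard; the others are assumptions based on the makedumpfile and crash
# sources (8-byte "KDUMP   "/"DISKDUMP" headers, "makedumpfile" for the
# flattened format written with -F).
SIGNATURES = [
    (b"\x7fELF", "ELF"),
    (b"KDUMP   ", "kdump compressed"),
    (b"DISKDUMP", "diskdump"),
    (b"makedumpfile", "flattened (makedumpfile -F)"),
]


def dumpfile_format(header: bytes) -> str:
    """Classify a dumpfile from its first bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify(path: str) -> str:
    """Read just enough of the file to compare against every signature."""
    with open(path, "rb") as f:
        return dumpfile_format(f.read(16))
```

For the vmcore in this thread, identify("vmcore") is expected to return "kdump compressed", which is consistent with both the readelf error and makedumpfile refusing -E.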
> 
> Dave
> 
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 
> Information transmitted by this e-mail is proprietary to MphasiS, its
> associated companies and/ or its customers and is intended
> for use only by the individual or entity to which it is addressed, and
> may contain information that is privileged, confidential or
> exempt from disclosure under applicable law. If you are not the
> intended recipient or it appears that this mail has been forwarded
> to you without proper authority, you are notified that any use or
> dissemination of this information in any manner is strictly
> prohibited. In such cases, please notify us immediately at
> mailmaster at mphasis.com and delete this mail from your records.
> 

