[Crash-utility] Unable to switch stack frames while using crash

Wed Jun 15 15:07:46 UTC 2011

----- Original Message -----
> Hi,
> 
> I was investigating a 64-bit Linux kernel dump. I have the following
> questions regarding the usage of crash.
> 
> 1) I wanted to access the intermediate kernel stack frames to learn
> the state of each frame and the point of failure.
> 
> When I tried to access a stack frame, I got the error message "crash:
> prohibited gdb command: frame". Can you please let me know if there
> is any other way of accessing the kernel stack frames using crash?

Right -- the embedded gdb doesn't know anything about the core file
(or live system) that you're running on.  It's invoked as "gdb vmlinux",
and doesn't know anything about any "frames".

As Flavio mentioned, you can see the stack data of each frame with
"bt -f", or better yet, "bt -F" which may illuminate what the data
may be, because it shows symbolic translations or slab cache names
instead of raw values where appropriate.

> 2) When I run bt in crash, I get a stack trace. Another person from a
> different team reported a slightly different stack trace to mine.
> Below are the stack traces. The register contents are quite different
> between the two.
> 
> My stack trace
> 
> PID: 13366 TASK: ffff88031b60d580 CPU: 1 COMMAND: "telnet"
> 
> #0 [ffff88031ce759d0] machine_kexec at ffffffff81024486
> #1 [ffff88031ce75a40] crash_kexec at ffffffff8107e230
> #2 [ffff88031ce75b20] oops_end at ffffffff8100fa38
> #3 [ffff88031ce75b50] no_context at ffffffff8102d801
> #4 [ffff88031ce75ba0] __bad_area_nosemaphore at ffffffff8102d9c9
> #5 [ffff88031ce75c70] bad_area at ffffffff8102da41
> #6 [ffff88031ce75ca0] do_page_fault at ffffffff8102dd19
> #7 [ffff88031ce75cf0] page_fault at ffffffff812d7425
> #8 [ffff88031ce75d78] n_tty_read at ffffffff811f03b3
> #9 [ffff88031ce75ec0] tty_read at ffffffff811ebf7e
> #10 [ffff88031ce75f10] vfs_read at ffffffff810ebcc8
> #11 [ffff88031ce75f40] sys_read at ffffffff810ebe48
> #12 [ffff88031ce75f80] system_call_fastpath at ffffffff8100bbc2
> RIP: 00007ffff716b9e0 RSP: 00007fffffffdfc0 RFLAGS: 00010212
> RAX: 0000000000000000 RBX: ffffffff8100bbc2 RCX: 0000000000000000
> RDX: 0000000000001ff6 RSI: 000000000061c02a RDI: 0000000000000000
> RBP: 0000000000001ff6 R8: 0000000000000000 R9: 0000000000000000
> R10: 0000000000616680 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000001 R14: 000000000061c02a R15: 00000000006178a0
> ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> 
> 
> Reported stack trace
> 
> PID: 13366 TASK: ffff88031b60d580 CPU: 1 COMMAND: "telnet"
> #0 [ffff88031ce759d0] machine_kexec at ffffffff81024486
> #1 [ffff88031ce75a40] crash_kexec at ffffffff8107e230
> #2 [ffff88031ce75ad8] n_tty_read at ffffffff811f03b3
> #3 [ffff88031ce75b20] oops_end at ffffffff8100fa38
> #4 [ffff88031ce75b50] no_context at ffffffff8102d801
> #5 [ffff88031ce75ba0] __bad_area_nosemaphore at ffffffff8102d9c9
> #6 [ffff88031ce75c20] native_sched_clock at ffffffff810120aa
> #7 [ffff88031ce75c70] bad_area at ffffffff8102da41
> #8 [ffff88031ce75ca0] do_page_fault at ffffffff8102dd19
> #9 [ffff88031ce75cf0] page_fault at ffffffff812d7425
> [exception RIP: n_tty_read+1420]
> RIP: ffffffff811f03b3 RSP: ffff88031ce75da8 RFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8802cbd54a68 RCX: 000000000061c044
> RDX: 0000000000000005 RSI: ffff88031ce75e87 RDI: ffff8802cbd54d1c
> RBP: ffff88031ce75eb8 R8: 0000000000000000 R9: 0000000000000000
> R10: 0000000000616680 R11: 0000000000000246 R12: 000000000061c044
> R13: ffff8802cbd54800 R14: 0000000000000000 R15: 7fffffffffffffff
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #10 [ffff88031ce75ec0] tty_read at ffffffff811ebf7e
> #11 [ffff88031ce75f10] vfs_read at ffffffff810ebcc8
> #12 [ffff88031ce75f40] sys_read at ffffffff810ebe48
> #13 [ffff88031ce75f80] system_call_fastpath at ffffffff8100bbc2

The first backtrace is different because you are apparently using an
older version of the crash utility, which does not show the page fault
exception frame that appears in the "reported" version.

> 
> 3) I want to retrieve the address of a data structure in the current
> context. How can it be done? I tried using the struct command, but it
> did not help.

The struct command needs the correct virtual address of the structure
you're trying to view.  So I presume you're asking how to find the address 
of the data structure?  If that's true, you're going to have to be a lot
more specific.

> 4) When I run the command readelf -a vmcore, I get the error message
> "readelf: Error: Not an ELF file - it has the wrong magic bytes at the
> start."

I presume that the dumpfile is a compressed kdump dumpfile generated
by makedumpfile, which takes the original /proc/vmcore ELF dumpfile
and creates its own unique dumpfile format.
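
One way to confirm that guess is to look at the first few bytes of the
dumpfile. The following is only a minimal sketch in Python, not part of
crash or makedumpfile. The function names are hypothetical, and the
signature strings are an assumption based on my reading of the
makedumpfile and crash headers ("KDUMP" padded with spaces to 8 bytes
for a compressed kdump, "DISKDUMP" for the older diskdump format), so
check them against your installed sources.

```python
# Known starting bytes of the dumpfile formats discussed above.
ELF_MAGIC = b"\x7fELF"             # standard ELF magic, what readelf expects
KDUMP_SIGNATURE = b"KDUMP   "      # assumed: makedumpfile compressed kdump
DISKDUMP_SIGNATURE = b"DISKDUMP"   # assumed: older diskdump format


def classify_header(header: bytes) -> str:
    """Guess the dumpfile format from its leading bytes."""
    if header.startswith(ELF_MAGIC):
        return "ELF"
    if header.startswith(KDUMP_SIGNATURE):
        return "compressed kdump (makedumpfile)"
    if header.startswith(DISKDUMP_SIGNATURE):
        return "diskdump"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as dumpfile:
        return classify_header(dumpfile.read(8))
```

Calling classify_file("vmcore") on the file that readelf rejected should
report something other than "ELF" if the presumption above is right.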

Dave