[Crash-utility] Interpreting bt

Thu Jan 24 20:37:30 UTC 2013

Thank you very much for the info, really helpful and very much
appreciated.  I have a few follow-on questions:

1. When the page fault occurs, are some of the registers (which might
contain parameters passed to the offending function) trampled on?  If yes,
is there a document, or would you happen to know, which registers (in the
worst case) are written to?

The reason I ask is that in my dump below, RDI (used to pass the first
parameter to ahaann()) should be zero (to have caused the page fault),
but it is not.

From the register dump after the panic:
RBX: 0000000000000000     RDI: ffff88035daef4e0  (I expect this to be zero
per the disassembly).

Reverse disassembly from the RIP when the panic occurred:
crash> dis -r  ffffffffa06ce48f
0xffffffffa06ce460 <ahaann>:        push   %rbp
0xffffffffa06ce461 <ahaann+1>:      mov    %rsp,%rbp
0xffffffffa06ce464 <ahaann+4>:      push   %r12
0xffffffffa06ce466 <ahaann+6>:      push   %rbx
0xffffffffa06ce467 <ahaann+7>:      nopl   0x0(%rax,%rax,1)
0xffffffffa06ce46c <ahaann+12>:     mov    $0xffffffffa092c548,%rdx
0xffffffffa06ce473 <ahaann+19>:     movzwl %si,%ecx
0xffffffffa06ce476 <ahaann+22>:     mov    %rdi,%rbx         <==========
0xffffffffa06ce479 <ahaann+25>:     mov    %esi,%r12d
0xffffffffa06ce47c <ahaann+28>:     mov    $0xffffffffa092e5f0,%rdi
0xffffffffa06ce483 <ahaann+35>:     xor    %esi,%esi
0xffffffffa06ce485 <ahaann+37>:     callq  0xffffffffa06cd860 <ahahtl>
0xffffffffa06ce48a <ahaann+42>:     test   %rax,%rax
0xffffffffa06ce48d <ahaann+45>:     jne    0xffffffffa06ce500 <ahaann+160>
0xffffffffa06ce48f <ahaann+47>:     mov    (%rbx),%rdi       <==========
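
One way to sanity-check the reading of this listing is to replay the register
moves by hand. The sketch below is only an illustration under stated
assumptions: it models just the instructions that touch RDI and RBX, treats
the call to ahahtl as clobbering every caller-saved register (which the
x86_64 System V calling convention permits), and uses placeholder values.
The replay() helper and the CLOBBERED marker are hypothetical names, not part
of crash.

```python
# Hypothetical replay of the register traffic in ahaann() up to ahaann+47.
# Only the instructions relevant to RDI and RBX are modelled.

CLOBBERED = "clobbered-by-callee"  # placeholder for an unknown value

# Caller-saved (volatile) registers in the x86_64 System V ABI.
CALLER_SAVED = ("rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11")


def replay(first_param):
    regs = {"rdi": first_param, "rbx": None}
    regs["rbx"] = regs["rdi"]           # ahaann+22: mov %rdi,%rbx
    regs["rdi"] = 0xffffffffa092e5f0    # ahaann+28: mov $0x...,%rdi
    for name in CALLER_SAVED:           # ahaann+37: callq ahahtl
        regs[name] = CLOBBERED          # the callee may overwrite these
    # rbx is callee-saved, so it still holds the original first parameter.
    faulting_address = regs["rbx"]      # ahaann+47: mov (%rbx),%rdi
    return regs, faulting_address


regs, fault = replay(first_param=0)
print(f"rbx = {regs['rbx']:#x}, rdi = {regs['rdi']}, faulting address = {fault:#x}")
```

Under these assumptions, RBX rather than RDI is the register that still
reflects the original first parameter at the time of the fault, which would
be consistent with RBX being zero in the dump while RDI holds an unrelated
value.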

2. Does Linux (specifically crash) treat an access to an invalid address and
a NULL ptr dereference the same way, as in calling them both a page fault?
(In one of my past workplaces, the crash dump was explicit in stating when a
NULL ptr dereference occurred, and I am wondering now if that was due to a
customization in crash.)
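
For reference, the x86 kernel's own oops message words the two cases
differently by comparing the faulting address with the page size, and that
message ends up in the kernel log, which crash can display with "log". The
sketch below mirrors that comparison in Python purely as an illustration;
classify_fault() is a hypothetical helper, and the 4096-byte page size is an
assumption that holds for x86_64.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def classify_fault(address):
    """Mirror the address < PAGE_SIZE test used when wording the oops."""
    if address < PAGE_SIZE:
        return "NULL pointer dereference"
    return "paging request"


for addr in (0x0, 0x18, 0xffff88035daef4e0):
    print(f"{addr:#018x}: unable to handle kernel {classify_fault(addr)}")
```

Both cases still arrive through the same page fault path; only the wording
of the log message differs.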

3. Expanding on the meaning of the address in [] at the beginning of each
line of the bt:

[addr0]    function0    at  addr2
[addr1]    function1    at   addr2

addr1 - 8: the starting address of function1's stack frame, which extends
up to addr0.  If so, I can use this info to peek at the values of the
function's local variables pushed onto the stack (specifically, within that
function's stack frame).
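
To make the arithmetic concrete, here is a small sketch that takes the
bracketed addresses of frames #9 to #12 from the bt quoted below and prints
the distance between consecutive ones. It assumes, per Dave's clarification
quoted below, that each bracketed value is the stack address holding that
frame's return address; frame_spans() is a hypothetical helper, not a crash
command.

```python
# Bracketed stack addresses copied from the bt output (frames #9 to #12).
FRAMES = [
    (9, 0xffff88035b9499c0, "ahaecho"),
    (10, 0xffff88035b949a00, "writectl"),
    (11, 0xffff88035b949e40, "writeaha"),
    (12, 0xffff88035b949e60, "proc_file_write"),
]


def frame_spans(frames):
    """Return (name, low, high, size) for each frame except the last.

    The stack grows down, so under the assumption above the memory from one
    frame's bracketed address up to the next frame's bracketed address is
    attributed to the first (lower-addressed) frame's function.
    """
    spans = []
    for (_, low, name), (_, high, _) in zip(frames, frames[1:]):
        spans.append((name, low, high, high - low))
    return spans


for name, low, high, size in frame_spans(FRAMES):
    print(f"{name:10s} {low:#x} .. {high:#x}  ({size} bytes)")
```

crash's "bt -f" option displays the stack data contained in each frame,
which may be a more direct way to inspect the same memory.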

Thank you,
Ahmed.

On Thu, Jan 24, 2013 at 7:34 AM, Dave Anderson <anderson at redhat.com> wrote:

>
>
> ----- Original Message -----
>
> > >
> > > I am using crash version: 6.0.4-2.el6 on CentOS 6.3 (kernel
> > > 2.6.32-279.el6.x86_64). I apologize for my newbie questions, but
> > > googling did not help much.
> > >
> > > When analyzing a kernel dump, I am getting the following bt.
> > >
> > > crash> bt
> > > PID: 12663 TASK: ffff88036304f500 CPU: 0 COMMAND: "bash"
> > > #0 [ffff88035b949570] machine_kexec at ffffffff8103281b
> > > #1 [ffff88035b9495d0] crash_kexec at ffffffff810ba662
> > > #2 [ffff88035b9496a0] oops_end at ffffffff81501290
> > > #3 [ffff88035b9496d0] no_context at ffffffff81043bab
> > > #4 [ffff88035b949720] __bad_area_nosemaphore at ffffffff81043e35
> > > #5 [ffff88035b949770] bad_area at ffffffff81043f5e
> > > #6 [ffff88035b9497a0] __do_page_fault at ffffffff81044710
> > > #7 [ffff88035b9498c0] do_page_fault at ffffffff8150326e
> > > #8 [ffff88035b9498f0] page_fault at ffffffff81500625
> > > [exception RIP: ahaann+47]
> > > RIP: ffffffffa06ce48f RSP: ffff88035b9499a8 RFLAGS: 00010246
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88035daef4e0
> > > RBP: ffff88035b9499b8 R8: 0000000004a47daf R9: ffffffffa06dae99
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000007
> > > R13: 00007fc82f4b8000 R14: 000000000000000a R15: 0000000000000000
> > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > #9 [ffff88035b9499c0] ahaecho at ffffffffa06d2899 [ahadrv]
> > > #10 [ffff88035b949a00] writectl at ffffffffa06c366e [ahadrv]
> > > #11 [ffff88035b949e40] writeaha at ffffffffa06d3e7b [ahadrv]
> > > #12 [ffff88035b949e60] proc_file_write at ffffffff811e6e44
> > > #13 [ffff88035b949ea0] proc_reg_write at ffffffff811e0abe
> > > #14 [ffff88035b949ef0] vfs_write at ffffffff8117b068
> > > #15 [ffff88035b949f30] sys_write at ffffffff8117ba81
> > > #16 [ffff88035b949f80] system_call_fastpath at ffffffff8100b0f2
> > > RIP: 0000003a29ada3c0 RSP: 00007ffffaec6830 RFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: ffffffff8100b0f2 RCX: 0000000000000065
> > > RDX: 000000000000000a RSI: 00007fc82f4b8000 RDI: 0000000000000001
> > > RBP: 00007fc82f4b8000 R8: 000000000000000a R9: 00007fc82f4aa700
> > > R10: 00000000fffffff7 R11: 0000000000000246 R12: 000000000000000a
> > > R13: 0000003a29d8c780 R14: 000000000000000a R15: 0000000001e18460
> > > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> > > crash>
> > >
> > >
> > > 1. Are the hex addr in [] right before the function name the stack
> > > frame ptr for that function?
> >
> > On x86_64 machines, the "at <address>" shown is the address in that frame's
> > function where the call instruction that it has made will return to.  So for
> > example, taking frame #15, where "sys_write at ffffffff8117ba81" has called
> > vfs_write(), you can disassemble all instructions from the beginning of
> > sys_write() to that address like this example:
> >
> >  crash> dis -r ffffffff80016e6b
> >  0xffffffff80016e26 <sys_write>: push   %r13
> >  0xffffffff80016e28 <sys_write+2>:       mov    %rsi,%r13
> >  0xffffffff80016e2b <sys_write+5>:       push   %r12
> >  0xffffffff80016e2d <sys_write+7>:       mov    $0xfffffffffffffff7,%r12
> >  0xffffffff80016e34 <sys_write+14>:      push   %rbp
> >  0xffffffff80016e35 <sys_write+15>:      mov    %rdx,%rbp
> >  0xffffffff80016e38 <sys_write+18>:      push   %rbx
> >  0xffffffff80016e39 <sys_write+19>:      sub    $0x18,%rsp
> >  0xffffffff80016e3d <sys_write+23>:      lea    0x14(%rsp),%rsi
> >  0xffffffff80016e42 <sys_write+28>:      callq  0xffffffff8000b5b4 <fget_light>
> >  0xffffffff80016e47 <sys_write+33>:      test   %rax,%rax
> >  0xffffffff80016e4a <sys_write+36>:      mov    %rax,%rbx
> >  0xffffffff80016e4d <sys_write+39>:      je     0xffffffff80016e86 <sys_write+96>
> >  0xffffffff80016e4f <sys_write+41>:      mov    0x38(%rax),%rax
> >  0xffffffff80016e53 <sys_write+45>:      lea    0x8(%rsp),%rcx
> >  0xffffffff80016e58 <sys_write+50>:      mov    %rbp,%rdx
> >  0xffffffff80016e5b <sys_write+53>:      mov    %r13,%rsi
> >  0xffffffff80016e5e <sys_write+56>:      mov    %rbx,%rdi
> >  0xffffffff80016e61 <sys_write+59>:      mov    %rax,0x8(%rsp)
> >  0xffffffff80016e66 <sys_write+64>:      callq  0xffffffff800164d0 <vfs_write>
> >  0xffffffff80016e6b <sys_write+69>:      mov    %rax,%r12
> >  crash>
> >
> > And the stack address of the frame contains that return address location.
>
> Just to clarify -- the answer to your question is that the
> address in the [brackets] is the stack address that contains
> the return address location.
>
> > > 2. I am assuming the panic occurred in function ahaann() (and not in
> > > ahaecho() ). Is that right?
> >
> > That's correct.  The exception occurred precisely when executing the
> > instruction here: [exception RIP: ahadrv], which is at RIP
> > ffffffffa06ce48f.
>
> And to clarify the above -- where I made a cut-and-paste error -- I meant
> to state:
>
>   The exception occurred precisely when executing the instruction
>   here: [exception RIP: ahaann+47], which is at RIP ffffffffa06ce48f
>
> Sorry for any confusion...
>
> Dave