[Crash-utility] Interpreting bt
Dave Anderson
anderson at redhat.com
Thu Jan 24 21:53:44 UTC 2013
----- Original Message -----
>
> Thank you very much for the info, really helpful and very much
> appreciated. I have a few follow-on questions:
>
> 1. When the page fault occurs, are some of the registers (which might
> contain parameters passed to the offending function) trampled on? If
> yes, is there a document, or would you happen to know, which registers
> (in the worst case) are written to?
>
> The reason I ask is that in my dump below, the register RDI (used to pass
> the first param to ahaann()) should be zero (to have caused the
> page fault), but it is not.
RDI was originally passed into ahaann() as an argument, and the evidence
shows that it had a value of NULL. However, it was subsequently needed
as an argument register for the call to ahahtl() at ahaann+37. So before
being reused, RDI was copied/saved in RBX at ahaann+22. And then RDI was
overwritten/reused at ahaann+28:
>
> From register dump after panic:
> RBX: 0000000000000000 RDI: ffff88035daef4e0 (I expect this to be zero
> per the dis-assembly code).
>
> Reverse dis-assembly from RIP when panic occurred:
> crash> dis -r ffffffffa06ce48f
> 0xffffffffa06ce460 <ahaann>: push %rbp
> 0xffffffffa06ce461 <ahaann+1>: mov %rsp,%rbp
> 0xffffffffa06ce464 <ahaann+4>: push %r12
> 0xffffffffa06ce466 <ahaann+6>: push %rbx
> 0xffffffffa06ce467 <ahaann+7>: nopl 0x0(%rax,%rax,1)
> 0xffffffffa06ce46c <ahaann+12>: mov $0xffffffffa092c548,%rdx
> 0xffffffffa06ce473 <ahaann+19>: movzwl %si,%ecx
> 0xffffffffa06ce476 <ahaann+22>: mov %rdi,%rbx <==========
> 0xffffffffa06ce479 <ahaann+25>: mov %esi,%r12d
> 0xffffffffa06ce47c <ahaann+28>: mov $0xffffffffa092e5f0,%rdi
> 0xffffffffa06ce483 <ahaann+35>: xor %esi,%esi
> 0xffffffffa06ce485 <ahaann+37>: callq 0xffffffffa06cd860 <ahahtl>
> 0xffffffffa06ce48a <ahaann+42>: test %rax,%rax
> 0xffffffffa06ce48d <ahaann+45>: jne 0xffffffffa06ce500 <ahaann+160>
> 0xffffffffa06ce48f <ahaann+47>: mov (%rbx),%rdi <==========
And so in your case, the page fault was caused by the NULL pointer
in RBX, which was originally passed into the function in RDI.
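
The register flow described above can be sketched as a toy Python model. This is only an illustration, not CPU emulation: the function name trace_ahaann and the rdi_after_call parameter are hypothetical, and the comment about which registers survive a call assumes the standard x86-64 System V calling convention (RBX callee-saved, RDI not preserved).

```python
def trace_ahaann(rdi_at_entry, rdi_after_call=None):
    """Follow RDI and RBX through ahaann+22 .. ahaann+47 (toy model)."""
    regs = {"rdi": rdi_at_entry, "rbx": None}

    # ahaann+22: mov %rdi,%rbx  (save the incoming first argument)
    regs["rbx"] = regs["rdi"]

    # ahaann+28: mov $0xffffffffa092e5f0,%rdi  (first argument for ahahtl)
    regs["rdi"] = 0xffffffffa092e5f0

    # ahaann+37: callq ahahtl
    # Assumption: the callee must preserve RBX but may leave anything in RDI.
    if rdi_after_call is not None:
        regs["rdi"] = rdi_after_call

    # ahaann+47: mov (%rbx),%rdi would dereference regs["rbx"] here.
    return regs


if __name__ == "__main__":
    # NULL passed in RDI; RDI value after the call taken from the register dump.
    state = trace_ahaann(0, rdi_after_call=0xffff88035daef4e0)
    print({name: hex(value) for name, value in state.items()})
```

Under these assumptions, a non-zero RDI in the register dump is consistent with a NULL first argument: by the time of the fault, only RBX still holds the original value.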
> 2. Does Linux (specifically crash) treat access to an invalid address and
> a NULL ptr dereference the same way, as in calling them both a page
> fault? (In one of my past workplaces, the crash dump was explicit
> in stating when a NULL ptr dereference occurred, and I am wondering
> now if that was due to a customization in crash.)
The crash utility doesn't have anything to do with it -- it is simply
trying to reconstruct what happened from what it sees left on the stack.
The kernel will transition to page_fault() on either a NULL pointer
or an invalid address (although sometimes an invalid address will
generate a general protection fault exception if certain bits are
set in the bad address).
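
One reading of "certain bits" is the x86-64 canonical-address rule: an address whose upper bits are not a sign extension of the highest implemented virtual-address bit raises a general protection fault rather than a page fault. The sketch below is a minimal check under that assumption. The helper name is_canonical is hypothetical, and the default of 48 virtual-address bits assumes 4-level paging.

```python
def is_canonical(addr, va_bits=48):
    """Return True if bits 63..(va_bits-1) of addr are all 0 or all 1."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    for addr in (0x0, 0xffff88035daef4e0, 0x0000800000000000):
        print(hex(addr), "canonical" if is_canonical(addr) else "non-canonical")
```

A NULL pointer and an unmapped kernel address such as those in the dump are both canonical, so both would be expected to arrive through page_fault().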
If you do a "log" command, you will see a string that precedes the
final blurb containing the register dump and backtrace that will
also confirm what kind of exception occurred. Yours probably
says:
  BUG: unable to handle kernel NULL pointer dereference at (null)
which gets generated here in the kernel's show_fault_oops() function:
        printk(KERN_ALERT "BUG: unable to handle kernel ");
        if (address < PAGE_SIZE)
                printk(KERN_CONT "NULL pointer dereference");
        else
                printk(KERN_CONT "paging request");
        printk(KERN_CONT " at %p\n", (void *) address);
        printk(KERN_ALERT "IP:");
        printk_address(regs->ip, 1);
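
The classification in that excerpt can be mirrored in a few lines of Python. This is only a sketch of the address test, not of the kernel's message formatting. The name fault_kind is hypothetical, and PAGE_SIZE is assumed to be 4096 bytes.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def fault_kind(address):
    """Mirror the address < PAGE_SIZE test from the excerpt above."""
    if address < PAGE_SIZE:
        return "NULL pointer dereference"
    return "paging request"


if __name__ == "__main__":
    for address in (0x0, 0x18, 0x1000, 0xffff88035daef4e0):
        print(hex(address), "->", fault_kind(address))
```

Note that any faulting address in the first page is reported as a NULL pointer dereference, which is why accessing a structure member at a small offset from a NULL pointer produces the same message.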
>
>
> 3. Expanding on the meaning of the address in [] at the beginning of each line of the bt
>
>
> [addr0] function0 at addr2
> [addr1] function1 at addr2
>
> addr1 - 8 : starting address of the stack frame from function1 up to
> addr0. I can use this info to peek into the values of function
> local variables pushed onto the stack (specifically the function's
> stack frame).
Exactly -- you can use "bt -f" or "bt -F" to do just that, where -f
just dumps the raw stack frame data, whereas -F also translates the
stack contents into known variable names/offsets, or into the slab cache
that it came from if either case is applicable.
For example:
  crash> bt
  ...
  #12 [ffff880037cb9ef0] vfs_write at ffffffff81172718
  #13 [ffff880037cb9f30] sys_write at ffffffff81173151
  ...
  crash> bt -f
  ...
  #12 [ffff880037cb9ef0] vfs_write at ffffffff81172718
      ffff880037cb9ef8: ffff880037cb9f78 ffffffff810d1b62
      ffff880037cb9f08: ffff880078056260 ffff8800781248c0
      ffff880037cb9f18: 00007f9b6f177000 0000000000000002
      ffff880037cb9f28: ffff880037cb9f78 ffffffff81173151
  #13 [ffff880037cb9f30] sys_write at ffffffff81173151
  ...
  crash> bt -F
  ...
  #12 [ffff880037cb9ef0] vfs_write at ffffffff81172718
      ffff880037cb9ef8: ffff880037cb9f78 audit_syscall_entry+626
      ffff880037cb9f08: [size-1024]      [filp]
      ffff880037cb9f18: 00007f9b6f177000 0000000000000002
      ffff880037cb9f28: ffff880037cb9f78 sys_write+81
  #13 [ffff880037cb9f30] sys_write at ffffffff81173151
  ...
Oftentimes the [slab-cache] or symbol+offset references can
help pinpoint a local variable.
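
To make the frame layout concrete, the sketch below parses the four raw "bt -f" lines for frame #12 shown above into a map from stack address to 64-bit word. It is an illustration only: the helper name parse_stack_dump and the regular expression are assumptions about this particular two-words-per-line output, not part of the crash utility.

```python
import re

# Raw stack lines for frame #12, copied from the "bt -f" output above.
BT_F_FRAME_12 = """\
ffff880037cb9ef8: ffff880037cb9f78 ffffffff810d1b62
ffff880037cb9f08: ffff880078056260 ffff8800781248c0
ffff880037cb9f18: 00007f9b6f177000 0000000000000002
ffff880037cb9f28: ffff880037cb9f78 ffffffff81173151
"""

LINE_RE = re.compile(r"\s*([0-9a-f]{16}):\s+([0-9a-f]{16})\s+([0-9a-f]{16})\s*$")


def parse_stack_dump(text):
    """Return {stack_address: word} for each 8-byte slot in the dump."""
    words = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip frame headers and "..." lines
        base = int(match.group(1), 16)
        words[base] = int(match.group(2), 16)
        words[base + 8] = int(match.group(3), 16)
    return words


if __name__ == "__main__":
    words = parse_stack_dump(BT_F_FRAME_12)
    frame_13 = 0xffff880037cb9f30  # bracketed address of frame #13
    # The slot at the next frame's bracketed address holds the return address.
    print(hex(words[frame_13]))
    # "bt -F" shows this word as sys_write+81, with the offset in decimal.
    print(hex(words[frame_13] - 81))
```

In this dump, the word stored at the bracketed address of frame #13 is the same value that "bt" prints after "sys_write at", so the words between the two bracketed addresses are the ones to examine for vfs_write's saved registers and locals.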
Dave