[Crash-utility] Re:[RFC] Crash patch for DWARF CFI based unwind support

Mon Oct 23 18:02:10 UTC 2006

On Mon, Oct 23, 2006 at 09:31:11AM -0400, Dave Anderson wrote:
> 
> Hmmm, yeah, good catch...
> 
> But what happens the second time around, anyway?  Are the RSP/RIP
> starting points so different such that the low_budget tracer's output
> is so drastically different?  Or does it go off into the weeds because
> the other user_regs_struct register offsets (that don't get initialized)
> cause an OFFSET() failure?
>

Hi Dave

This is what I get when I try it out on one my dumps:

crash> bt      // ----------------------------------------------> 1
PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
 #0 [ffff810223d73d78] crash_kexec at ffffffff801521d1
 #1 [ffff810223d73dc0] machine_kexec at ffffffff8011a739
 #2 [ffff810223d73e00] crash_kexec at ffffffff801521ed
 #3 [ffff810223d73e88] crash_kexec at ffffffff801521d1
 #4 [ffff810223d73eb8] __sysrq_get_key_op at ffffffff80288b29
 #5 [ffff810223d73ec0] __handle_sysrq at ffffffff80288d37
 #6 [ffff810223d73f00] write_sysrq_trigger at ffffffff801adf95
 #7 [ffff810223d73f10] vfs_write at ffffffff80179bf0
 #8 [ffff810223d73f40] sys_write at ffffffff8017a187
 #9 [ffff810223d73f80] system_call at ffffffff801096da
    RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
    RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
    RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
    RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
    R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
    R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> set unwind on // ------------------------------------------> 2
unwind: on
crash> bt
PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
 #0 [ffff810223d73e08] crash_kexec at ffffffff801521d1
 #1 [ffff810223d73ec8] __handle_sysrq at ffffffff80288d37
 #2 [ffff810223d73f08] write_sysrq_trigger at ffffffff801adf95
 #3 [ffff810223d73f18] vfs_write at ffffffff80179bf0
 #4 [ffff810223d73f48] sys_write at ffffffff8017a187
 #5 [ffff810223d73f88] system_call at ffffffff801096da
    RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
    RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
    RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
    RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
    R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
    R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> set unwind off //-------------------------------------> 3
unwind: off
crash> bt
PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
 #0 [ffff810223d73e88] crash_kexec at ffffffff801521d1
 #1 [ffff810223d73eb8] __sysrq_get_key_op at ffffffff80288b29
 #2 [ffff810223d73ec0] __handle_sysrq at ffffffff80288d37
 #3 [ffff810223d73f00] write_sysrq_trigger at ffffffff801adf95
 #4 [ffff810223d73f10] vfs_write at ffffffff80179bf0
 #5 [ffff810223d73f40] sys_write at ffffffff8017a187
 #6 [ffff810223d73f80] system_call at ffffffff801096da
    RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
    RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
    RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
    RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
    R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
    R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

The register values which the last bt starts working with(marked 3 above),
are rsp=ffff810223d73e08 and rip=ffffffff801521d1 (from NT_PRSTATUS).
So from that point in stack, output 3 and 1 are same.

We also see that stack addresses in 2 and 3 are off by '0x8'.
 eg #1 ffff810223d73ec8 ------------> 2
       ffff810223d73ec0 ------------> 3

This is because what crash is reporting is the stack address at which
the return address was pushed on stack, while what the dwarf based bt is 
reporting is the CFA. In most cases, return address is stored at a location
(CFA - 8). That is why the offset of 0x8.

The low-budget tracer's backtraces are different from the dwarf-tracer
because when the low-budget tracer is unwinding the stack by trying to read
kernel text addresses, it actually comes across many addresses which were
actually not pushed onto stack because of function calls.
Specially for the panic task on kdumps, where after 'crash_kexec' is called, 
the registers are dumped onto stack(for creating NT_PRSTATUS section), this
becomes misleading for the low-budget tracer mechanism. Thats why we see
multiple crash_kexec entries in the backtrace. Static inline functions can
also aggrevate this problem.

In other cases, stale frames on the stack can also mislead the low-budget
tracer. 

AFAICT, user_regs_struct register offsets are not the culprits here.

Thanks
Rachita