[Crash-utility] Re:[RFC] Crash patch for DWARF CFI based unwind support

Mon Oct 23 18:34:32 UTC 2006

Rachita Kothiyal wrote:

> On Mon, Oct 23, 2006 at 09:31:11AM -0400, Dave Anderson wrote:
> >
> > Hmmm, yeah, good catch...
> >
> > But what happens the second time around, anyway?  Are the RSP/RIP
> > starting points so different such that the low_budget tracer's output
> > is so drastically different?  Or does it go off into the weeds because
> > the other user_regs_struct register offsets (that don't get initialized)
> > cause an OFFSET() failure?
> >
>
> Hi Dave
>
> This is what I get when I try it out on one my dumps:
>
> crash> bt      // ----------------------------------------------> 1
> PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
>  #0 [ffff810223d73d78] crash_kexec at ffffffff801521d1
>  #1 [ffff810223d73dc0] machine_kexec at ffffffff8011a739
>  #2 [ffff810223d73e00] crash_kexec at ffffffff801521ed
>  #3 [ffff810223d73e88] crash_kexec at ffffffff801521d1
>  #4 [ffff810223d73eb8] __sysrq_get_key_op at ffffffff80288b29
>  #5 [ffff810223d73ec0] __handle_sysrq at ffffffff80288d37
>  #6 [ffff810223d73f00] write_sysrq_trigger at ffffffff801adf95
>  #7 [ffff810223d73f10] vfs_write at ffffffff80179bf0
>  #8 [ffff810223d73f40] sys_write at ffffffff8017a187
>  #9 [ffff810223d73f80] system_call at ffffffff801096da
>     RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
>     RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
>     RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
>     RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
>     R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
>     R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
>     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
> crash> set unwind on // ------------------------------------------> 2
> unwind: on
> crash> bt
> PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
>  #0 [ffff810223d73e08] crash_kexec at ffffffff801521d1
>  #1 [ffff810223d73ec8] __handle_sysrq at ffffffff80288d37
>  #2 [ffff810223d73f08] write_sysrq_trigger at ffffffff801adf95
>  #3 [ffff810223d73f18] vfs_write at ffffffff80179bf0
>  #4 [ffff810223d73f48] sys_write at ffffffff8017a187
>  #5 [ffff810223d73f88] system_call at ffffffff801096da
>     RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
>     RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
>     RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
>     RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
>     R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
>     R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
>     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
> crash> set unwind off //-------------------------------------> 3
> unwind: off
> crash> bt
> PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
>  #0 [ffff810223d73e88] crash_kexec at ffffffff801521d1
>  #1 [ffff810223d73eb8] __sysrq_get_key_op at ffffffff80288b29
>  #2 [ffff810223d73ec0] __handle_sysrq at ffffffff80288d37
>  #3 [ffff810223d73f00] write_sysrq_trigger at ffffffff801adf95
>  #4 [ffff810223d73f10] vfs_write at ffffffff80179bf0
>  #5 [ffff810223d73f40] sys_write at ffffffff8017a187
>  #6 [ffff810223d73f80] system_call at ffffffff801096da
>     RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
>     RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
>     RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
>     RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
>     R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
>     R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
>     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>
> The register values which the last bt starts working with(marked 3 above),
> are rsp=ffff810223d73e08 and rip=ffffffff801521d1 (from NT_PRSTATUS).
> So from that point in stack, output 3 and 1 are same.
>
> We also see that stack addresses in 2 and 3 are off by '0x8'.
>  eg #1 ffff810223d73ec8 ------------> 2
>        ffff810223d73ec0 ------------> 3
>
> This is because what crash is reporting is the stack address at which
> the return address was pushed on stack, while what the dwarf based bt is
> reporting is the CFA. In most cases, return address is stored at a location
> (CFA - 8). That is why the offset of 0x8.
>
> The low-budget tracer's backtraces are different from the dwarf-tracer
> because when the low-budget tracer is unwinding the stack by trying to read
> kernel text addresses, it actually comes across many addresses which were
> actually not pushed onto stack because of function calls.
> Specially for the panic task on kdumps, where after 'crash_kexec' is called,
> the registers are dumped onto stack(for creating NT_PRSTATUS section), this
> becomes misleading for the low-budget tracer mechanism. Thats why we see
> multiple crash_kexec entries in the backtrace. Static inline functions can
> also aggrevate this problem.
>
> In other cases, stale frames on the stack can also mislead the low-budget
> tracer.
>
> AFAICT, user_regs_struct register offsets are not the culprits here.
>
> Thanks
> Rachita

So, in other words, if we hardwire the user_regs_struct so that
it uses the NT_PRSTATUS registers all the time, then we get
the second (preferred/better) budget back trace when unwind
is off.

That being the case, I argue for hardwiring them all the time.

Dave