[Crash-utility] Re:[RFC] Crash patch for DWARF CFI based unwind support

Fri Oct 27 16:16:32 UTC 2006

Hi Rachita,

I'm looking at an alt-sysrq-c generated crash on an x86_64 kernel,
and I am seeing something similar to what you are with respect
to the transition from the interrupt stack back to the process
stack.

Here's a "bt -a", where cpu 0 shows that it received a shutdown
NMI while in the idle loop, and the transition from the NMI
exception stack back to the process stack was clean.  But on
the cpu which took the alt-sysrq-c keyboard interrupt, the
transition from the per-cpu interrupt stack back to the
process stack is similar to what you're seeing:

crash> bt -a
PID: 0      TASK: ffffffff8034ce60  CPU: 0   COMMAND: "swapper"
 #0 [ffffffff80481f30] crash_nmi_callback at ffffffff8007742f
 #1 [ffffffff80481f40] do_nmi at ffffffff80063c2c
 #2 [ffffffff80481f50] nmi at ffffffff8006312f
    [exception RIP: mwait_idle+54]
    RIP: ffffffff800553b7  RSP: ffffffff80437f90  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffffff80055381  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: ffffffff8034e838
    RBP: 0000000000099000   R8: ffffffff80436000   R9: 000000000000003e
    R10: ffff810037d0c038  R11: 0000000000000048  R12: 0000000000090000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <exception stack> ---
 #3 [ffffffff80437f90] mwait_idle at ffffffff800553b7
 #4 [ffffffff80437f90] cpu_idle at ffffffff800473be

PID: 0      TASK: ffff81003fe48100  CPU: 1   COMMAND: "swapper"
 #0 [ffff81003fe6bb40] crash_kexec at ffffffff800ab7c4
 #1 [ffff81003fe6bbc8] mwait_idle at ffffffff800553b7
 #2 [ffff81003fe6bc00] sysrq_handle_crashdump at ffffffff8019301f
 #3 [ffff81003fe6bc10] __handle_sysrq at ffffffff80192e1c
 #4 [ffff81003fe6bc50] kbd_event at ffffffff8018dbc1
 #5 [ffff81003fe6bca0] input_event at ffffffff801e9b9f
 #6 [ffff81003fe6bcd0] hidinput_hid_event at ffffffff801e42cb
 #7 [ffff81003fe6bd00] hid_process_event at ffffffff801df66b
 #8 [ffff81003fe6bd40] hid_input_report at ffffffff801df9d9
 #9 [ffff81003fe6bdc0] hid_irq_in at ffffffff801e0dc0
#10 [ffff81003fe6bde0] usb_hcd_giveback_urb at ffffffff801d33d8
#11 [ffff81003fe6be10] uhci_giveback_urb at ffffffff88168724
#12 [ffff81003fe6be50] uhci_scan_schedule at ffffffff88168e07
#13 [ffff81003fe6bed0] uhci_irq at ffffffff8816ac08
#14 [ffff81003fe6bf10] usb_hcd_irq at ffffffff801d3dc7
#15 [ffff81003fe6bf20] handle_IRQ_event at ffffffff80010704
#16 [ffff81003fe6bf50] __do_IRQ at ffffffff800b5238
#17 [ffff81003fe6bf58] __do_softirq at ffffffff80011c0b
#18 [ffff81003fe6bf90] do_IRQ at ffffffff8006a762
--- <IRQ stack> ---
#19 [ffff81003fe65e70] ret_from_intr at ffffffff8005bac9
    [exception RIP: cpu_idle+149]
    RIP: ffffffff800473be  RSP: ffffffff8042e220  RFLAGS: ffffffff80074188
    RAX: ffffffffffffff16  RBX: 0000000000000000  RCX: ffffffff800553b7
    RDX: 0000000000000010  RSI: 0000000000000246  RDI: ffff81003fe65ef0
    RBP: ffff81003fe64000   R8: ffffffff8034e838   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003f
    R13: ffff810037d0c008  R14: 0000000000000246  R15: 0000000000000001
    ORIG_RAX: 0000000000000018  CS: 0020  SS: 0000
bt: WARNING: possibly bogus exception frame
crash>

crash dutifully reports that the exception frame looks bogus
because of the CS value.

The end of the per-cpu interrupt stack looks like this:

crash> rd -s ffff81003fe68000 2048
...
ffff81003fe6bf30:  000000000000e900 00000000000000e9
ffff81003fe6bf40:  ffff810037ca4bc0 irq_desc+59708
ffff81003fe6bf50:  __do_IRQ+164     __do_softirq+94
ffff81003fe6bf60:  00000000000000e9 ffff81003fe65e48
ffff81003fe6bf70:  00000000000000ff cpu_data+256
ffff81003fe6bf80:  0000000000000100 cpu_core_map+32
ffff81003fe6bf90:  do_IRQ+231       ffff81003fe65e70
ffff81003fe6bfa0:  mwait_idle       ffff81003fe65e70
ffff81003fe6bfb0:  ret_from_intr    ffff81003fe65e70
ffff81003fe6bfc0:  0000000000000000 0000000000000000
ffff81003fe6bfd0:  0000000000000000 0000000000000000
ffff81003fe6bfe0:  0000000000000000 0000000000000000
ffff81003fe6bff0:  0000000000000000 0000000000000000
crash>

...hence the supposed pointer to the generating exception frame
is presumed to be ffff81003fe65e70 (which is bogus).

Interestingly, though, is if I do an exception frame search
for that task, I do find the "real" exception frame:

crash> bt -e
PID: 0      TASK: ffff81003fe48100  CPU: 1   COMMAND: "swapper"

  KERNEL-MODE EXCEPTION FRAME AT: ffff81003fe65e48
    RIP: ffffffff800553b7  RSP: ffff81003fe65ef0  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000000001  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: ffffffff8034e838
    RBP: 0000000000000000   R8: ffff81003fe64000   R9: 000000000000003f
    R10: ffff810037d0c008  R11: 0000000000000246  R12: ffff810037fef040
    R13: 0000000000000001  R14: ffff81003fe482f0  R15: 0000000002245f5e
    ORIG_RAX: ffffffffffffff16  CS: 0010  SS: 0018
crash> sym ffffffff800553b7
ffffffff800553b7 (t) mwait_idle+0x36  include/asm/thread_info.h: 63
crash>

and if I just grep for that address within the per-cpu interrupt
stack, I see several refernces to it:

crash> rd -s ffff81003fe68000 2048 | grep ffff81003fe65e48
ffff81003fe6bb30:  ffff81003ac7c000 ffff81003fe65e48
ffff81003fe6bc60:  00ffffff8001c1fd ffff81003fe65e48
ffff81003fe6bce0:  ffff81003ec68000 ffff81003fe65e48
ffff81003fe6bd50:  ffff81003fe65e48 0000000180087720
ffff81003fe6bda0:  ffff810037cb7400 ffff81003fe65e48
ffff81003fe6bdb0:  ffff810037cb7550 ffff81003fe65e48
ffff81003fe6be60:  ffff81003cf37488 ffff81003fe65e48
ffff81003fe6bec0:  ffff81003fe65e48 ffff81003fe65e48
ffff81003fe6bed0:  uhci_irq+0x13f   ffff81003fe65e48
ffff81003fe6bf00:  ffff81003fe65e48 ffff81003fe65e48
ffff81003fe6bf60:  00000000000000e9 ffff81003fe65e48
crash>

But none of them are located at the "fixed" location of one
word below the 64-byte block at the top of the interrupt
stack.

So I don't know what's going on in this case...

I don't ever recall seeing such a bogus interrupt-to-process
stack transition on any netdump or diskdump generated vmcores.
And all the "test" kdump kernel dumpfiles I've used have only
been generated by using "echo c > /proc/sysrq-trigger", so the
crash path never went off the process stack.

So without blatantly pointing the finger, I wonder whether there's
something that the kexec/kdump code path does that could possibly
be tinkering with the contents of the interrupt stack?

I also want to try to force another crash but with all cpus
forcibly running something other than the idle task, in case
there's something strange about the interrupt bringing the
cpu out of that "mwait" instruction?  Grasping at straws a
bit here...

Also, that's why I'm always asking for back-trace tests that
do something "real" -- instead of just having the kernel
call panic(), or that do a sys_write() to /proc/sysrq-trigger
to force an oops on the process stack.  At least an alt-sysrq-c
on the console keyboard generates an interrupt, as does your
forced jprobes deal...

BTW, I also note the the reading of the module unwind tables
is reading invalid data, because it's being done before the
non-unity-mapped address translation can even work!  In other
words, vmalloc addresses can only be read after vm_init() is
complete.  So I've added another machdep_init() argument
(POST_VM) that is called just after vm_init(), and in the
case of x86_64 (and x86), can call init_unwind_table().

Thanks,
  Dave

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20061027/672279d9/attachment.htm>