<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<tt></tt>
<br><tt>Hi Rachita,</tt><tt></tt>
<p><tt>I'm looking at an alt-sysrq-c generated crash on an x86_64 kernel,</tt>
<br><tt>and I am seeing something similar to what you are with respect</tt>
<br><tt>to the transition from the interrupt stack back to the process</tt>
<br><tt>stack.</tt><tt></tt>
<p><tt>Here's a "bt -a", where cpu 0 shows that it received a shutdown</tt>
<br><tt>NMI while in the idle loop, and the transition from the NMI</tt>
<br><tt>exception stack back to the process stack was clean. But
on</tt>
<br><tt>the cpu which took the alt-sysrq-c keyboard interrupt, the</tt>
<br><tt>transition from the per-cpu interrupt stack back to the</tt>
<br><tt>process stack is similar to what you're seeing:</tt><tt></tt>
<p><tt>crash> bt -a</tt>
<br><tt>PID: 0 TASK: ffffffff8034ce60
CPU: 0 COMMAND: "swapper"</tt>
<br><tt> #0 [ffffffff80481f30] crash_nmi_callback at ffffffff8007742f</tt>
<br><tt> #1 [ffffffff80481f40] do_nmi at ffffffff80063c2c</tt>
<br><tt> #2 [ffffffff80481f50] nmi at ffffffff8006312f</tt>
<br><tt> [exception RIP: mwait_idle+54]</tt>
<br><tt> RIP: ffffffff800553b7 RSP: ffffffff80437f90
RFLAGS: 00000246</tt>
<br><tt> RAX: 0000000000000000 RBX: ffffffff80055381
RCX: 0000000000000000</tt>
<br><tt> RDX: 0000000000000000 RSI: 0000000000000001
RDI: ffffffff8034e838</tt>
<br><tt> RBP: 0000000000099000 R8: ffffffff80436000
R9: 000000000000003e</tt>
<br><tt> R10: ffff810037d0c038 R11: 0000000000000048
R12: 0000000000090000</tt>
<br><tt> R13: 0000000000000000 R14: 0000000000000000
R15: 0000000000000000</tt>
<br><tt> ORIG_RAX: ffffffffffffffff CS: 0010
SS: 0018</tt>
<br><tt>--- <exception stack> ---</tt>
<br><tt> #3 [ffffffff80437f90] mwait_idle at ffffffff800553b7</tt>
<br><tt> #4 [ffffffff80437f90] cpu_idle at ffffffff800473be</tt><tt></tt>
<p><tt>PID: 0 TASK: ffff81003fe48100
CPU: 1 COMMAND: "swapper"</tt>
<br><tt> #0 [ffff81003fe6bb40] crash_kexec at ffffffff800ab7c4</tt>
<br><tt> #1 [ffff81003fe6bbc8] mwait_idle at ffffffff800553b7</tt>
<br><tt> #2 [ffff81003fe6bc00] sysrq_handle_crashdump at ffffffff8019301f</tt>
<br><tt> #3 [ffff81003fe6bc10] __handle_sysrq at ffffffff80192e1c</tt>
<br><tt> #4 [ffff81003fe6bc50] kbd_event at ffffffff8018dbc1</tt>
<br><tt> #5 [ffff81003fe6bca0] input_event at ffffffff801e9b9f</tt>
<br><tt> #6 [ffff81003fe6bcd0] hidinput_hid_event at ffffffff801e42cb</tt>
<br><tt> #7 [ffff81003fe6bd00] hid_process_event at ffffffff801df66b</tt>
<br><tt> #8 [ffff81003fe6bd40] hid_input_report at ffffffff801df9d9</tt>
<br><tt> #9 [ffff81003fe6bdc0] hid_irq_in at ffffffff801e0dc0</tt>
<br><tt>#10 [ffff81003fe6bde0] usb_hcd_giveback_urb at ffffffff801d33d8</tt>
<br><tt>#11 [ffff81003fe6be10] uhci_giveback_urb at ffffffff88168724</tt>
<br><tt>#12 [ffff81003fe6be50] uhci_scan_schedule at ffffffff88168e07</tt>
<br><tt>#13 [ffff81003fe6bed0] uhci_irq at ffffffff8816ac08</tt>
<br><tt>#14 [ffff81003fe6bf10] usb_hcd_irq at ffffffff801d3dc7</tt>
<br><tt>#15 [ffff81003fe6bf20] handle_IRQ_event at ffffffff80010704</tt>
<br><tt>#16 [ffff81003fe6bf50] __do_IRQ at ffffffff800b5238</tt>
<br><tt>#17 [ffff81003fe6bf58] __do_softirq at ffffffff80011c0b</tt>
<br><tt>#18 [ffff81003fe6bf90] do_IRQ at ffffffff8006a762</tt>
<br><tt>--- <IRQ stack> ---</tt>
<br><tt>#19 [ffff81003fe65e70] ret_from_intr at ffffffff8005bac9</tt>
<br><tt> [exception RIP: cpu_idle+149]</tt>
<br><tt> RIP: ffffffff800473be RSP: ffffffff8042e220
RFLAGS: ffffffff80074188</tt>
<br><tt> RAX: ffffffffffffff16 RBX: 0000000000000000
RCX: ffffffff800553b7</tt>
<br><tt> RDX: 0000000000000010 RSI: 0000000000000246
RDI: ffff81003fe65ef0</tt>
<br><tt> RBP: ffff81003fe64000 R8: ffffffff8034e838
R9: 0000000000000001</tt>
<br><tt> R10: 0000000000000000 R11: 0000000000000000
R12: 000000000000003f</tt>
<br><tt> R13: ffff810037d0c008 R14: 0000000000000246
R15: 0000000000000001</tt>
<br><tt> ORIG_RAX: 0000000000000018 CS: 0020
SS: 0000</tt>
<br><tt>bt: WARNING: possibly bogus exception frame</tt>
<br><tt>crash></tt><tt></tt>
<p><tt>crash dutifully reports that the exception frame looks bogus</tt>
<br><tt>because of the CS value.</tt><tt></tt>
<p><tt>The end of the per-cpu interrupt stack looks like this:</tt><tt></tt>
<p><tt>crash> rd -s ffff81003fe68000 2048</tt>
<br><tt>...</tt>
<br><tt>ffff81003fe6bf30: 000000000000e900 00000000000000e9</tt>
<br><tt>ffff81003fe6bf40: ffff810037ca4bc0 irq_desc+59708</tt>
<br><tt>ffff81003fe6bf50: __do_IRQ+164 __do_softirq+94</tt>
<br><tt>ffff81003fe6bf60: 00000000000000e9 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bf70: 00000000000000ff cpu_data+256</tt>
<br><tt>ffff81003fe6bf80: 0000000000000100 cpu_core_map+32</tt>
<br><tt>ffff81003fe6bf90: do_IRQ+231
ffff81003fe65e70</tt>
<br><tt>ffff81003fe6bfa0: mwait_idle
ffff81003fe65e70</tt>
<br><tt>ffff81003fe6bfb0: ret_from_intr ffff81003fe65e70</tt>
<br><tt>ffff81003fe6bfc0: 0000000000000000 0000000000000000</tt>
<br><tt>ffff81003fe6bfd0: 0000000000000000 0000000000000000</tt>
<br><tt>ffff81003fe6bfe0: 0000000000000000 0000000000000000</tt>
<br><tt>ffff81003fe6bff0: 0000000000000000 0000000000000000</tt>
<br><tt>crash></tt><tt></tt>
<p><tt>...hence the supposed pointer to the generating exception frame</tt>
<br><tt>is presumed to be ffff81003fe65e70 (which is bogus).</tt><tt></tt>
<p><tt>Interestingly, though, is if I do an exception frame search</tt>
<br><tt>for that task, I do find the "real" exception frame:</tt><tt></tt>
<p><tt>crash> bt -e</tt>
<br><tt>PID: 0 TASK: ffff81003fe48100
CPU: 1 COMMAND: "swapper"</tt><tt></tt>
<p><tt> KERNEL-MODE EXCEPTION FRAME AT: ffff81003fe65e48</tt>
<br><tt> RIP: ffffffff800553b7 RSP: ffff81003fe65ef0
RFLAGS: 00000246</tt>
<br><tt> RAX: 0000000000000000 RBX: 0000000000000001
RCX: 0000000000000000</tt>
<br><tt> RDX: 0000000000000000 RSI: 0000000000000001
RDI: ffffffff8034e838</tt>
<br><tt> RBP: 0000000000000000 R8: ffff81003fe64000
R9: 000000000000003f</tt>
<br><tt> R10: ffff810037d0c008 R11: 0000000000000246
R12: ffff810037fef040</tt>
<br><tt> R13: 0000000000000001 R14: ffff81003fe482f0
R15: 0000000002245f5e</tt>
<br><tt> ORIG_RAX: ffffffffffffff16 CS: 0010
SS: 0018</tt>
<br><tt>crash> sym ffffffff800553b7</tt>
<br><tt>ffffffff800553b7 (t) mwait_idle+0x36 include/asm/thread_info.h:
63</tt>
<br><tt>crash></tt><tt></tt>
<p><tt>and if I just grep for that address within the per-cpu interrupt</tt>
<br><tt>stack, I see several refernces to it:</tt><tt></tt>
<p><tt>crash> rd -s ffff81003fe68000 2048 | grep ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bb30: ffff81003ac7c000 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bc60: 00ffffff8001c1fd ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bce0: ffff81003ec68000 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bd50: ffff81003fe65e48 0000000180087720</tt>
<br><tt>ffff81003fe6bda0: ffff810037cb7400 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bdb0: ffff810037cb7550 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6be60: ffff81003cf37488 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bec0: ffff81003fe65e48 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bed0: uhci_irq+0x13f ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bf00: ffff81003fe65e48 ffff81003fe65e48</tt>
<br><tt>ffff81003fe6bf60: 00000000000000e9 ffff81003fe65e48</tt>
<br><tt>crash></tt><tt></tt>
<p><tt>But none of them are located at the "fixed" location of one</tt>
<br><tt>word below the 64-byte block at the top of the interrupt</tt>
<br><tt>stack.</tt><tt></tt>
<p><tt>So I don't know what's going on in this case...</tt><tt></tt>
<p><tt>I don't ever recall seeing such a bogus interrupt-to-process</tt>
<br><tt>stack transition on any netdump or diskdump generated vmcores.</tt>
<br><tt>And all the "test" kdump kernel dumpfiles I've used have only</tt>
<br><tt>been generated by using "echo c > /proc/sysrq-trigger", so the</tt>
<br><tt>crash path never went off the process stack.</tt><tt></tt>
<p><tt>So without blatantly pointing the finger, I wonder whether there's</tt>
<br><tt>something that the kexec/kdump code path does that could possibly</tt>
<br><tt>be tinkering with the contents of the interrupt stack?</tt><tt></tt>
<p><tt>I also want to try to force another crash but with all cpus</tt>
<br><tt>forcibly running something other than the idle task, in case</tt>
<br><tt>there's something strange about the interrupt bringing the</tt>
<br><tt>cpu out of that "mwait" instruction? Grasping at straws a</tt>
<br><tt>bit here...</tt><tt></tt>
<p><tt>Also, that's why I'm always asking for back-trace tests that</tt>
<br><tt>do something "real" -- instead of just having the kernel</tt>
<br><tt>call panic(), or that do a sys_write() to /proc/sysrq-trigger</tt>
<br><tt>to force an oops on the process stack. At least an alt-sysrq-c</tt>
<br><tt>on the console keyboard generates an interrupt, as does your</tt>
<br><tt>forced jprobes deal...</tt><tt></tt>
<p><tt>BTW, I also note the the reading of the module unwind tables</tt>
<br><tt>is reading invalid data, because it's being done before the</tt>
<br><tt>non-unity-mapped address translation can even work! In other</tt>
<br><tt>words, vmalloc addresses can only be read after vm_init() is</tt>
<br><tt>complete. So I've added another machdep_init() argument</tt>
<br><tt>(POST_VM) that is called just after vm_init(), and in the</tt>
<br><tt>case of x86_64 (and x86), can call init_unwind_table().</tt><tt></tt>
<p><tt>Thanks,</tt>
<br><tt> Dave</tt>
<br><tt></tt> </html>