[Crash-utility] Re:[RFC] Crash patch for DWARF CFI based unwind support

Tue Oct 31 22:21:39 UTC 2006

Hi Rachita,

I've figured out why the x86_64 interrupt-stack-to-process-stack
transition is showing a bogus exception frame.  It's not kdump
or jprobes -- I think it may have been introduced with the DWARF
CFI changes.

Anyway, in older x86_64 kernels, when an interrupt was taken,
the pt_regs exception frame would be laid down on the current stack,
and the rdi register would contain a pointer to it.  Then the stack
pointer would be switched to the per-cpu interrupt stack.  (Actually
it is switched to a point 64 bytes below the top of the interrupt
stack, presumably for cache line purposes.)  The first thing
done after switching to the interrupt stack is to push
the rdi register, which, again, contains a pointer to the exception
frame on the other stack.  Then the interrupt handler is called.

Here's the "old" code, where the last 4 instructions in the macro
shown below perform the steps outlined above:

1. get the per-cpu interrupt stack address,
2. move it into rsp -- which effectively switches stacks,
3. then the rdi register is pushed,
4. and the interrupt handler called:

        .macro interrupt func
        CFI_STARTPROC   simple
        CFI_DEF_CFA     rsp,(SS-RDI)
        CFI_REL_OFFSET  rsp,(RSP-ORIG_RAX)
        CFI_REL_OFFSET  rip,(RIP-ORIG_RAX)
        cld
#ifdef CONFIG_DEBUG_INFO
        SAVE_ALL
        movq %rsp,%rdi
        /*
         * Setup a stack frame pointer.  This allows gdb to trace
         * back to the original stack.
         */
        movq %rsp,%rbp
        CFI_DEF_CFA_REGISTER    rbp
#else
        SAVE_ARGS
        leaq -ARGOFFSET(%rsp),%rdi      # arg1 for handler
#endif
        testl $3,CS(%rdi)
        je 1f
        swapgs
1:      addl $1,%gs:pda_irqcount        # RED-PEN should check preempt count
        movq %gs:pda_irqstackptr,%rax
        cmoveq %rax,%rsp
        pushq %rdi                      # save old stack
        call \func
        .endm
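
For illustration, the tail of that macro can be modelled in C.  This
is only a sketch: struct pda_model and enter_interrupt() are
hypothetical names, not kernel or crash code, and it assumes that
pda_irqcount rests at -1 when no interrupt is active, so the add
leaves the zero flag set (and the cmoveq switches stacks) only for a
first-level interrupt.

```c
#include <stdint.h>

/* Hypothetical model of the two PDA fields the macro touches. */
struct pda_model {
    int32_t  irqcount;     /* assumed to rest at -1 when idle */
    uint64_t irqstackptr;  /* 64 bytes below the top of the IRQ stack */
};

/*
 * Model of "addl $1 / cmoveq / pushq %rdi".  rsp is the stack pointer
 * on entry and link is the value being pushed (the pt_regs pointer in
 * the old macro).  Returns the address of the slot that receives the
 * link and reports the stored value through *slot_value.
 */
static uint64_t enter_interrupt(struct pda_model *pda, uint64_t rsp,
                                uint64_t link, uint64_t *slot_value)
{
    pda->irqcount += 1;
    if (pda->irqcount == 0)        /* cmoveq: first-level interrupt only */
        rsp = pda->irqstackptr;    /* switch to the per-cpu IRQ stack */
    rsp -= 8;                      /* pushq */
    *slot_value = link;
    return rsp;
}
```

Under those assumptions, the link word for a first-level interrupt
always lands at pda_irqstackptr - 8, which is where a backtrace has
to look when it crosses from the IRQ stack to the process stack.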

However, in current x86_64 kernels, the interrupt macro has changed
to look like this:

        .macro interrupt func
        cld
        SAVE_ARGS
        leaq -ARGOFFSET(%rsp),%rdi      # arg1 for handler
        pushq %rbp
        CFI_ADJUST_CFA_OFFSET   8
        CFI_REL_OFFSET          rbp, 0
        movq %rsp,%rbp
        CFI_DEF_CFA_REGISTER    rbp
        testl $3,CS(%rdi)
        je 1f
        swapgs
1:      incl    %gs:pda_irqcount        # RED-PEN should check preempt count
        cmoveq %gs:pda_irqstackptr,%rsp
        push    %rbp                    # backlink for old unwinder
        /*
         * We entered an interrupt context - irqs are off:
         */
        TRACE_IRQS_OFF
        call \func
        .endm

Note that rdi still contains the pt_regs pointer, as evidenced by
the "testl $3,CS(%rdi)" instruction, which checks the saved CS
value in the pt_regs to see whether the CPU was running in
user-space when the interrupt occurred.  But more importantly, note
that just prior to calling the handler, it does a "push %rbp"
instead of a "pushq %rdi" like it used to.
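
The CS test itself is simple enough to restate in C.  This is a
minimal sketch with a hypothetical function name; it relies only on
the low two bits of a segment selector holding the privilege level.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Equivalent of "testl $3,CS(%rdi)": a non-zero privilege level in
 * the saved CS means the interrupt arrived while in user mode, which
 * is the case where the macro executes swapgs.
 */
static bool interrupted_user_mode(uint64_t saved_cs)
{
    return (saved_cs & 3) != 0;
}
```

A saved CS of 0x10 reports kernel mode, while a selector with the low
bits set (0x33 is assumed here as a typical user-space value) reports
user mode.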

I'm pretty sure it's being done on purpose: instead of having the
"old unwinder" dump kernel text addresses starting inside the
pt_regs exception frame, it bumps the starting point up to
whatever's contained in $rbp, which is above the exception frame
on the old stack.  That avoids dumping text return addresses
that happen to be sitting in the pt_regs register dump.

Just to verify, I patched the current kernel to push rdi instead
of rbp.  Again, here's what the unpatched alt-sysrq-c backtrace
looks like:

crash> bt
PID: 0      TASK: ffff81003fe48100  CPU: 1   COMMAND: "swapper"
 #0 [ffff81003fe6bb40] crash_kexec at ffffffff800ab798
 #1 [ffff81003fe6bbc8] mwait_idle at ffffffff80055375
 #2 [ffff81003fe6bc00] sysrq_handle_crashdump at ffffffff80192fdc
 #3 [ffff81003fe6bc10] __handle_sysrq at ffffffff80192dae
 #4 [ffff81003fe6bc50] kbd_event at ffffffff8018db52
 #5 [ffff81003fe6bca0] input_event at ffffffff801e9b6d
 #6 [ffff81003fe6bcd0] hidinput_hid_event at ffffffff801e4299
 #7 [ffff81003fe6bd00] hid_process_event at ffffffff801df639
 #8 [ffff81003fe6bd40] hid_input_report at ffffffff801df9a7
 #9 [ffff81003fe6bdc0] hid_irq_in at ffffffff801e0d8e
#10 [ffff81003fe6bde0] usb_hcd_giveback_urb at ffffffff801d33a2
#11 [ffff81003fe6be10] uhci_giveback_urb at ffffffff8817b724
#12 [ffff81003fe6be50] uhci_scan_schedule at ffffffff8817be07
#13 [ffff81003fe6bed0] uhci_irq at ffffffff8817dc08
#14 [ffff81003fe6bf10] usb_hcd_irq at ffffffff801d3d91
#15 [ffff81003fe6bf20] handle_IRQ_event at ffffffff800106fd
#16 [ffff81003fe6bf50] __do_IRQ at ffffffff800b520c
#17 [ffff81003fe6bf58] __do_softirq at ffffffff80011bfa
#18 [ffff81003fe6bf90] do_IRQ at ffffffff8006a729
--- <IRQ stack> ---
#19 [ffff81003fe65e70] ret_from_intr at ffffffff8005ba89
    [exception RIP: cpu_idle+149]
    RIP: ffffffff800473a7  RSP: ffffffff8042e220  RFLAGS: ffffffff80074153
    RAX: ffffffffffffff16  RBX: 0000000000000000  RCX: ffffffff80055375
    RDX: 0000000000000010  RSI: 0000000000000246  RDI: ffff81003fe65ef0
    RBP: ffff81003fe64000   R8: ffffffff8034e818   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003f
    R13: ffff810037d0c008  R14: 0000000000000246  R15: 0000000000000001
    ORIG_RAX: 0000000000000018  CS: 0020  SS: 0000
bt: WARNING: possibly bogus exception frame
crash>

And when the kernel is patched to push rdi instead, the
"old" behavior is emulated:

crash> bt
PID: 0      TASK: ffffffff8034ce60  CPU: 0   COMMAND: "swapper"
 #0 [ffffffff8047eb40] crash_kexec at ffffffff800ab798
 #1 [ffffffff8047ebc8] mwait_idle at ffffffff80055375
 #2 [ffffffff8047ec00] sysrq_handle_crashdump at ffffffff80192fdc
 #3 [ffffffff8047ec10] __handle_sysrq at ffffffff80192dae
 #4 [ffffffff8047ec50] kbd_event at ffffffff8018db52
 #5 [ffffffff8047eca0] input_event at ffffffff801e9b6d
 #6 [ffffffff8047ecd0] hidinput_hid_event at ffffffff801e4299
 #7 [ffffffff8047ecd8] ip_route_input at ffffffff8003662f
 #8 [ffffffff8047ed00] hid_process_event at ffffffff801df639
 #9 [ffffffff8047ed40] hid_input_report at ffffffff801df9a7
#10 [ffffffff8047edc0] hid_irq_in at ffffffff801e0d8e
#11 [ffffffff8047ede0] usb_hcd_giveback_urb at ffffffff801d33a2
#12 [ffffffff8047ee10] uhci_giveback_urb at ffffffff88126724
#13 [ffffffff8047ee50] uhci_scan_schedule at ffffffff88126e07
#14 [ffffffff8047eed0] uhci_irq at ffffffff88128c08
#15 [ffffffff8047ef10] usb_hcd_irq at ffffffff801d3d91
#16 [ffffffff8047ef20] handle_IRQ_event at ffffffff800106fd
#17 [ffffffff8047ef50] __do_IRQ at ffffffff800b520c
#18 [ffffffff8047ef58] __do_softirq at ffffffff80011bfa
#19 [ffffffff8047ef90] do_IRQ at ffffffff8006a729
--- <IRQ stack> ---
#20 [ffffffff80437ee8] ret_from_intr at ffffffff8005ba89
    [exception RIP: mwait_idle+54]
    RIP: ffffffff80055375  RSP: ffffffff80437f90  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000099000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: ffffffff8034e818
    RBP: 0000000000099000   R8: ffffffff80436000   R9: 000000000000003e
    R10: ffff810037d0c038  R11: ffff81003f48e580  R12: ffff810037fef7a0
    R13: 0000000000000000  R14: ffffffff8034d050  R15: 0000000002246128
    ORIG_RAX: ffffffffffffff16  CS: 0010  SS: 0018
#21 [ffffffff80437f90] cpu_idle at ffffffff800473a7
crash>

Anyway, we'll have to come up with a differentiator so that
both types of interrupt-stack linkage are handled.  It looks
like the rbp value sits at a fixed offset from the exception
frame, so something can be done.
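
One possible shape for such a differentiator, purely as a sketch and
not a proposed patch: treat the link word as a pt_regs pointer first,
and if that frame looks implausible, retry a fixed number of slots
lower.  The function names are hypothetical.  The 5-slot (40-byte)
offset is an assumption read off the dumps above, where the bogus
frame's RAX, RCX, RDX and RSI hold what look like ORIG_RAX, RIP, CS
and RFLAGS values.  The accepted CS selectors (0x10, 0x23, 0x33) are
likewise assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Slot indices in the x86_64 pt_regs layout, in 8-byte words. */
enum { PT_CS = 17, PT_EFLAGS = 18, PT_NSLOTS = 21 };

/* Assumed distance from the pt_regs base up to the pushed rbp link. */
#define RBP_LINK_SLOTS 5

/* Cheap sanity check: known CS selector, no high bits set in RFLAGS. */
static bool eframe_plausible(const uint64_t *regs)
{
    uint64_t cs = regs[PT_CS];
    bool cs_ok = (cs == 0x10 || cs == 0x23 || cs == 0x33);
    bool flags_ok = (regs[PT_EFLAGS] >> 32) == 0;

    return cs_ok && flags_ok;
}

/*
 * link points at the stack words the saved link value refers to.
 * The caller must make sure link - RBP_LINK_SLOTS is readable.
 * Returns the start of the exception frame, or NULL if neither
 * interpretation passes the sanity check.
 */
static const uint64_t *locate_eframe(const uint64_t *link)
{
    if (eframe_plausible(link))                   /* old: pushq %rdi */
        return link;
    if (eframe_plausible(link - RBP_LINK_SLOTS))  /* new: push %rbp */
        return link - RBP_LINK_SLOTS;
    return NULL;
}
```

Trying the unshifted interpretation first keeps dumps from older
kernels working unchanged; the shifted retry only comes into play
when the frame would otherwise be flagged as bogus.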

Just FYI,
  Dave
