[Crash-utility] arm64: odd backtrace?
AKASHI Takahiro
takahiro.akashi at linaro.org
Fri Jun 3 12:22:47 UTC 2016
On Thu, Jun 02, 2016 at 10:52:28AM -0400, Dave Anderson wrote:
>
> ----- Original Message -----
> > Dave,
> >
> > When I ran "bt" against a process running in user mode, I got
> > an odd backtrace result:
> > ===8<===
> > crash> ps
> > ...
> > > 1324 1223 2 ffff80002018be80 RU 0.0 960 468 dhry
> > 1325 2 1 ffff800021089900 IN 0.0 0 0 [kworker/u16:0]
> > crash> bt 1324
> > PID: 1324 TASK: ffff80002018be80 CPU: 2 COMMAND: "dhry"
> > ffff800022f6ae08: ffff00000812ae44 (crash_save_cpu on IRQ stack)
> > #0 [ffff800022f6ae10] crash_save_cpu at ffff00000812ae44
> > #1 [ffff800022f6ae60] handle_IPI at ffff00000808e718
> > #2 [ffff800022f6b020] gic_handle_irq at ffff0000080815f8
> > #3 [ffff800022f6b050] el0_irq_naked at ffff000008084c4c
> > pt_regs: ffff800022f6af60
> > PC: ffffffffffffffff [unknown or invalid address]
> > LR: ffff800020107ed0 [unknown or invalid address]
> > SP: 0000000000000000 PSTATE: 004016a4
> > X29: ffff000008084c4c X28: ffff800022f6b080 X27: ffff000008e60c54
> > X26: ffff800020107ed0 X25: 0000000000001fff X24: 0000000000000003
> > X23: ffff0000080815f8 X22: ffff800022f6b040 X21: 0000000000000000
> > X20: ffff000008bce000 X19: ffff00000808e758 X18: ffff800022f6b010
> > X17: ffff00000808a820 X16: ffff800022f6aff0 X15: 0000000000000000
> > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000402138
> > X11: ffff000008675850 X10: ffff800022f6afe0 X9: 0000000000000000
> > X8: ffff800022f6afc0 X7: 0000000000000000 X6: 0000000000000000
> > X5: 0000000000000000 X4: 0000000000000001 X3: 0000000000000000
> > X2: 0000000000493000 X1: 0000000000498000 X0: ffffffffffffffff
> > ORIG_X0: 0000000020000000 SYSCALLNO: 4021f0
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff800020107ed0 fp: 0 (?)
> > pt_regs: ffff800020107ed0
> > PC: 00000000004016a4 LR: 00000000004016a4 SP: 0000ffffc10c40a0
> > X29: 0000ffffc10c40a0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000402138 X24: 00000000004021f0
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004001a0
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000000000000000
> > X17: 0000000000000001 X16: 0000000000000000 X15: 0000000000493000
> > X14: 0000000000498000 X13: ffffffffffffffff X12: 0000000000000005
> > X11: 000000000000001e X10: 0101010101010101 X9: fffffffff59a9190
> > X8: 7f7f7f7f7f7f7f7f X7: 1f535226301f2b4c X6: 00000003001d1000
> > X5: 00101d0003000000 X4: 0000000000000000 X3: 4952545320454d4f
> > X2: 0000000010c35b40 X1: 0000000000000011 X0: 0000000010c35b40
> > ORIG_X0: 0000000000498700 SYSCALLNO: ffffffffffffffff PSTATE: 20000000
> > ===>8===
> >
> > * PC, LR and SP look wrong.
> > I don't know how those pt_regs values were derived.
> > * The message, "WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp:
> > ffff800020107ed0 fp: 0 (?)" should be refined.
> > Apparently, in this case, the process is running in user mode,
> > and so there is no normal kernel stack.
>
> Support for IRQ stacks was only recently put in place in crash-7.1.5,
> and obviously backtraces for crash-while-in-user-space tasks are not working
> correctly. Unfortunately the only test kdump I have on hand only has IRQ
> stack transitions from kernel space. I tried to create a kdump from a system
> running user-space commands on our 4.5.0-based kernel, but as luck would
> have it, kdump fails to work. (It never even reaches the secondary kernel
> for some reason, even though the kdump facility says it's functional.)
>
> Obviously there's a problem in arm64_unwind_frame() trying to make the transition,
> and it returns FALSE because of the NULL fp and therefore INSTACK(frame->fp, bt)
> fails. The function is trying to emulate the kernel's unwind_frame() function,
> which also would return -EINVAL because of the fp. But I'm not sure whether that
> fp value has been set correctly because of the first, seemingly bogus, exception
> frame that it's showing.
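
To make the fp check described above concrete, here is a minimal sketch of a frame-pointer unwind step that refuses a frame record lying outside the current stack. It only mirrors the shape of the check; it is not the code in crash's arm64_unwind_frame() or the kernel's unwind_frame(). The names sketch_stack, sketch_frame and sketch_unwind_frame are hypothetical, and the stack is modelled as a plain array rather than memory read from a dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one stack (illustration only). */
struct sketch_stack {
    uint64_t base;          /* lowest valid address */
    uint64_t top;           /* one past the highest valid address */
    const uint64_t *words;  /* backing storage; words[0] lives at base */
};

/* Hypothetical unwind state: frame pointer, stack pointer, return pc. */
struct sketch_frame {
    uint64_t fp, sp, pc;
};

/*
 * Advance *f by one AArch64 frame record (saved fp at [fp], saved lr at
 * [fp + 8]).  Returns 1 on success, 0 if fp cannot be a frame record on
 * this stack.  A NULL fp fails the first comparison, which is the case
 * reported in the WARNING message.
 */
int sketch_unwind_frame(const struct sketch_stack *s, struct sketch_frame *f)
{
    uint64_t fp = f->fp;
    size_t idx;

    if (fp < s->base || fp >= s->top ||   /* outside the stack */
        s->top - fp < 16 ||               /* record would run past the top */
        (fp & 0xf))                       /* records are 16-byte aligned */
        return 0;

    idx = (size_t)((fp - s->base) / 8);
    f->sp = fp + 16;
    f->fp = s->words[idx];
    f->pc = s->words[idx + 1];
    return 1;
}
```

With a check of this shape, an fp of 0 stops the walk immediately, so whether the transition off the IRQ stack succeeds depends entirely on the fp value recovered from the preceding frame.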
>
> As you have seen, kernel space exceptions look like this, where the fp, sp and pc
> values are legitimate, so it prints "-- <IRQ stack> --", and transitions to the
> exception frame on the process stack:
>
> crash> set debug 1
> debug: 1
> crash> bt
> PID: 0 TASK: fffffe035b0aae00 CPU: 3 COMMAND: "swapper/3"
> fffffe03fe183d58: fffffe0000137ee4 (crash_save_cpu on IRQ stack)
> #0 [fffffe03fe183d60] crash_save_cpu at fffffe0000137ee4
> #1 [fffffe03fe183dc0] handle_IPI at fffffe000008e8d4
> #2 [fffffe03fe183f80] gic_handle_irq at fffffe00000824c8
> #3 [fffffe03fe183fd0] el1_irq at fffffe0000083520
> bt: arm64_unwind_frame: switch stacks: fp: fffffe035b0f3f30 sp: fffffe035b0f3e10 pc: fffffe000008611c
> --- <IRQ stack> ---
> pt_regs: fffffe035b0f3e10
> PC: fffffe000008611c [arch_cpu_idle+60]
> LR: fffffe0000086118 [arch_cpu_idle+56]
> SP: fffffe035b0f3f30 PSTATE: 60000145
> X29: fffffe035b0f3f30 X28: 0000000000000000 X27: fffffe0000084170
> X26: fffffe0000bf13dc X25: fffffe0000cf4000 X24: fffffe035b0f0000
> X23: 0000000000000001 X22: fffffe0000b94c48 X21: 0000000000000003
> X20: fffffe0000cf6000 X19: fffffe0000cf6028 X18: 000002aabb090050
> X17: 000003ff9131a228 X16: fffffe000026dba4 X15: 00000000000000bf
> X14: 004894597490a924 X13: 0000000000000000 X12: 0000000000000010
> X11: 0000000000000067 X10: 0000000000000ab0 X9: fffffe035b0f0000
> X8: fffffe035b0ab910 X7: 0000000000007b17 X6: 000000000001c690
> X5: 0000001515d0302c X4: 0100000000000000 X3: fffffe03fe184c8c
> X2: fffffe03fe184c80 X1: 0000000000000000 X0: fffffe035b0f0000
> ORIG_X0: fffffe035b0f0000 SYSCALLNO: fffffe0000b94c48
> #4 [fffffe035b0f3e10] arch_cpu_idle at fffffe000008611c
> #5 [fffffe035b0f3f40] default_idle_call at fffffe00000f81cc
> #6 [fffffe035b0f3f70] cpu_startup_entry at fffffe00000f8320
> #7 [fffffe035b0f3f80] secondary_start_kernel at fffffe000008e338
> crash>
>
> In your sample, it certainly doesn't appear that the first exception frame found
> on the IRQ stack is legitimate, and probably should not pass the test in
> arm64_is_kernel_exception_frame(), but it does:
>
> > crash> bt 1324
> > PID: 1324 TASK: ffff80002018be80 CPU: 2 COMMAND: "dhry"
> > ffff800022f6ae08: ffff00000812ae44 (crash_save_cpu on IRQ stack)
> > #0 [ffff800022f6ae10] crash_save_cpu at ffff00000812ae44
> > #1 [ffff800022f6ae60] handle_IPI at ffff00000808e718
> > #2 [ffff800022f6b020] gic_handle_irq at ffff0000080815f8
> > #3 [ffff800022f6b050] el0_irq_naked at ffff000008084c4c
> > pt_regs: ffff800022f6af60
> > PC: ffffffffffffffff [unknown or invalid address]
> > LR: ffff800020107ed0 [unknown or invalid address]
> > SP: 0000000000000000 PSTATE: 004016a4
> > X29: ffff000008084c4c X28: ffff800022f6b080 X27: ffff000008e60c54
> > X26: ffff800020107ed0 X25: 0000000000001fff X24: 0000000000000003
> > X23: ffff0000080815f8 X22: ffff800022f6b040 X21: 0000000000000000
> > X20: ffff000008bce000 X19: ffff00000808e758 X18: ffff800022f6b010
> > X17: ffff00000808a820 X16: ffff800022f6aff0 X15: 0000000000000000
> > X14: 0000000000000000 X13: 0000000000000000 X12: 0000000000402138
> > X11: ffff000008675850 X10: ffff800022f6afe0 X9: 0000000000000000
> > X8: ffff800022f6afc0 X7: 0000000000000000 X6: 0000000000000000
> > X5: 0000000000000000 X4: 0000000000000001 X3: 0000000000000000
> > X2: 0000000000493000 X1: 0000000000498000 X0: ffffffffffffffff
> > ORIG_X0: 0000000020000000 SYSCALLNO: 4021f0
>
> Maybe that is the cause of the bogus "fp"? Anyway, since the orig_sp is
> from a fixed location at the top of the IRQ stack, it then manages to make its
> way back to the "dhry" process stack, where this exception frame "looks" legitimate:
>
> > bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff800020107ed0 fp: 0 (?)
> > pt_regs: ffff800020107ed0
> > PC: 00000000004016a4 LR: 00000000004016a4 SP: 0000ffffc10c40a0
> > X29: 0000ffffc10c40a0 X28: 0000000000000000 X27: 0000000000000000
> > X26: 0000000000000000 X25: 0000000000402138 X24: 00000000004021f0
> > X23: 0000000000000000 X22: 0000000000000000 X21: 00000000004001a0
> > X20: 0000000000000000 X19: 0000000000000000 X18: 0000000000000000
> > X17: 0000000000000001 X16: 0000000000000000 X15: 0000000000493000
> > X14: 0000000000498000 X13: ffffffffffffffff X12: 0000000000000005
> > X11: 000000000000001e X10: 0101010101010101 X9: fffffffff59a9190
> > X8: 7f7f7f7f7f7f7f7f X7: 1f535226301f2b4c X6: 00000003001d1000
> > X5: 00101d0003000000 X4: 0000000000000000 X3: 4952545320454d4f
> > X2: 0000000010c35b40 X1: 0000000000000011 X0: 0000000010c35b40
> > ORIG_X0: 0000000000498700 SYSCALLNO: ffffffffffffffff PSTATE: 20000000
>
> But I'm not sure what happens when an arm64 IRQ exception occurs when
> the task is running in user space. Does it lay an exception frame down on the
> process stack and then make the transition? (and therefore the user-space frame
> above is legitimate?) Or does the user-space frame get laid down directly on the
> IRQ stack? Unfortunately I don't know enough about arm64 exception handling.
Since I reviewed this IRQ stack patch on LAK-ML, I should be able to help you,
but I don't have enough time to explain in detail this week.
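
One check that can be made on the dumped frames without knowing the entry code is the mode field of PSTATE: the kernel's user_mode() macro tests whether the low four bits equal PSR_MODE_EL0t. A minimal sketch of that test is below. The PSR_MODE_* values are the ones defined in the arm64 ptrace header; pstate_mode() and pstate_is_user() are hypothetical helper names used here for illustration, not crash functions.

```c
#include <stdint.h>

/* Mode field of PSTATE/SPSR, as defined in the arm64 ptrace header. */
#define PSR_MODE_EL0t 0x00000000UL
#define PSR_MODE_EL1t 0x00000004UL
#define PSR_MODE_EL1h 0x00000005UL
#define PSR_MODE_MASK 0x0000000fUL

/* Hypothetical helper: extract the mode field from a saved pstate. */
uint64_t pstate_mode(uint64_t pstate)
{
    return pstate & PSR_MODE_MASK;
}

/* Hypothetical helper: same test as the kernel's user_mode(regs). */
int pstate_is_user(uint64_t pstate)
{
    return pstate_mode(pstate) == PSR_MODE_EL0t;
}
```

Applied to the frames quoted above, 20000000 (pt_regs at ffff800020107ed0) decodes as EL0t and 60000145 (the swapper/3 frame) as EL1h. The first frame in the "dhry" backtrace shows 004016a4, which would decode as EL1t, but that value is identical to the user PC in the second frame, so it may not be a saved pstate at all.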
> In any case, the bt should display "-- <IRQ stack> ...", and then dump
> the user-to-kernel-space exception frame, wherever it lies, i.e., either on the
> normal process stack or (maybe?) on the IRQ stack.
>
> Anyway, can you make the vmlinux/vmcore pair available for me to download? You can
> send the details to me offline.
I sent you a message which contains the link to those binaries.
Thanks,
-Takahiro AKASHI
> Thanks,
> Dave
>