[Crash-utility] arm64: odd backtrace?

Dave Anderson anderson at redhat.com
Tue Jun 7 16:10:43 UTC 2016



Hello Takahiro,

I went ahead and checked in a fix for the user-space backtrace issue here:

  https://github.com/crash-utility/crash/commit/2d53b97a476e71bfd5e2054d64aacfc5fd895e30
  
  Fix for the ARM64 "bt" command in Linux 4.5 and later kernels which
  use per-cpu IRQ stacks.  Without the patch, if an active non-crashing
  task was running in user space when it received the shutdown IPI from
  the crashing task, the "-- <IRQ stack> ---" transition marker from
  the IRQ stack to the process stack is not displayed, and a message
  indicating "bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp:
  <address> fp: 0 (?)" gets displayed.
  (anderson at redhat.com)


The "phantom" exception frames in your 4.7 kernel vmcore are seen because 
your kernel doesn't have CONFIG_FUNCTION_GRAPH_TRACER configured, and 
therefore __in_irqentry_text() is a no-op:

  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
  static inline int __in_irqentry_text(unsigned long ptr)
  {
          extern char __irqentry_text_start[];
          extern char __irqentry_text_end[];
  
          return ptr >= (unsigned long)&__irqentry_text_start &&
                 ptr < (unsigned long)&__irqentry_text_end;
  }
  #else
  static inline int __in_irqentry_text(unsigned long ptr)
  {
          return 0;
  }
  #endif
  
  static inline int in_exception_text(unsigned long ptr)
  {
          extern char __exception_text_start[];
          extern char __exception_text_end[];
          int in;
  
          in = ptr >= (unsigned long)&__exception_text_start &&
               ptr < (unsigned long)&__exception_text_end;
  
          return in ? : __in_irqentry_text(ptr);
  }
  
In my Linux 4.5 kernel, CONFIG_FUNCTION_GRAPH_TRACER is configured, and as a
result "gic_handle_irq" lies in the __irqentry_text_start to __irqentry_text_end
range, outside of the range from __exception_text_start to __exception_text_end:
  
  crash> sym -l
  ... [ cut ] ...
  ffff800000090000 (T) __exception_text_start
  ffff800000090000 (T) _stext
  ffff800000090000 (T) do_undefinstr
  ffff8000000901d8 (T) do_debug_exception
  ffff800000090294 (T) do_mem_abort
  ffff800000090348 (T) do_sp_pc_abort
  ffff800000090428 (T) __exception_text_end
  ffff800000090428 (T) __irqentry_text_start
  ffff800000090428 (t) gic_handle_irq
  ffff8000000904e0 (t) gic_handle_irq
  ffff800000090670 (T) __entry_text_start
  ffff800000090670 (T) __irqentry_text_end
  ...
  
In your Linux 4.7 kernel, gic_handle_irq is located within the
__exception_text_start to __exception_text_end range, and as a result the
phantom exception frame gets dumped:
  
  crash> sym -l
  ... [ cut ] ...
  ffff000008081000 (T) __exception_text_start
  ffff000008081000 (T) _stext
  ffff000008081000 (T) do_undefinstr
  ffff000008081000 (t) efi_header_end
  ffff000008081248 (T) do_mem_abort
  ffff0000080812e8 (T) do_sp_pc_abort
  ffff0000080813c0 (T) do_debug_exception
  ffff000008081460 (t) sun4i_handle_irq
  ffff0000080814d0 (t) gic_handle_irq
  ffff000008081580 (t) gic_handle_irq
  ffff0000080816e0 (T) __exception_text_end
  ...
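
The range checks can be verified directly against the two listings. The sketch below copies the addresses shown above (both gic_handle_irq entries from each kernel) and applies the same half-open comparison the kernel uses; struct range and the constant names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end), as used by the kernel checks. */
struct range {
	uint64_t start;
	uint64_t end;
};

static bool in_range(struct range r, uint64_t addr)
{
	assert(r.start <= r.end);
	return addr >= r.start && addr < r.end;
}

/* Values copied from the 4.5 "sym -l" listing. */
static const struct range k45_exception = {
	UINT64_C(0xffff800000090000), UINT64_C(0xffff800000090428) };
static const struct range k45_irqentry = {
	UINT64_C(0xffff800000090428), UINT64_C(0xffff800000090670) };
static const uint64_t k45_gic_handle_irq[] = {
	UINT64_C(0xffff800000090428), UINT64_C(0xffff8000000904e0) };

/* Values copied from the 4.7 "sym -l" listing. */
static const struct range k47_exception = {
	UINT64_C(0xffff000008081000), UINT64_C(0xffff0000080816e0) };
static const uint64_t k47_gic_handle_irq[] = {
	UINT64_C(0xffff0000080814d0), UINT64_C(0xffff000008081580) };
```

For 4.5, both gic_handle_irq addresses fail the __exception_text test (the first one sits exactly at __exception_text_end) but pass the __irqentry_text test; for 4.7, both fall inside __exception_text.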
  
The crash utility's equivalent of in_exception_text() is based upon an older
kernel's version, from before __irqentry_text_start and __irqentry_text_end existed.
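
One possible shape for bringing crash's check up to date is to consult the irqentry range only when the vmlinux actually defines both symbols. A minimal sketch, assuming a toy symbol table: struct sym, lookup(), in_text_range() and symtab_in_exception_text() are hypothetical names, and inside crash the lookups would presumably go through its own symbol-table helpers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol-table entry; crash would read these from vmlinux. */
struct sym {
	const char *name;
	uint64_t value;
};

/* Return the symbol's address, or 0 if this kernel does not define it. */
static uint64_t lookup(const struct sym *tab, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return tab[i].value;
	return 0;
}

/* True if both bounds were found and ptr lies in [start, end). */
static int in_text_range(const struct sym *tab, size_t n, const char *start_sym,
			 const char *end_sym, uint64_t ptr)
{
	uint64_t start = lookup(tab, n, start_sym);
	uint64_t end = lookup(tab, n, end_sym);

	return start && end && ptr >= start && ptr < end;
}

/*
 * Exception-text test that also honours the irqentry range, but only
 * when both __irqentry_text_start and __irqentry_text_end exist.
 */
int symtab_in_exception_text(const struct sym *tab, size_t n, uint64_t ptr)
{
	assert(tab != NULL);
	if (in_text_range(tab, n, "__exception_text_start",
			  "__exception_text_end", ptr))
		return 1;
	return in_text_range(tab, n, "__irqentry_text_start",
			     "__irqentry_text_end", ptr);
}
```

With the 4.5 symbols, an address at __irqentry_text_start is accepted; with a table lacking the irqentry symbols, the same address falls back to the exception-text check alone and is rejected.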

So two things need to be fixed in the crash utility:

 (1) the __irqentry_text_start and __irqentry_text_end range must
     be checked by in_exception_text() if they exist, and
 (2) this IRQ stack kludge that was added to the kernel's dump_backtrace()
     function needs to be handled the same way in the crash utility:

                if (in_exception_text(where)) {
                        /*
                         * If we switched to the irq_stack before calling this
                         * exception handler, then the pt_regs will be on the
                         * task stack. The easiest way to tell is if the large
                         * pt_regs would overlap with the end of the irq_stack.
                         */
                        if (stack < irq_stack_ptr &&
                            (stack + sizeof(struct pt_regs)) > irq_stack_ptr)
                                stack = IRQ_STACK_TO_TASK_STACK(irq_stack_ptr);

                        dump_mem("", "Exception stack", stack,
                                 stack + sizeof(struct pt_regs), false);
                }
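
The dump_backtrace() check in item (2) can be modelled with a plain buffer standing in for kernel memory. A minimal sketch under two assumptions, both labelled in the code: that the word 8 bytes below irq_stack_ptr holds the interrupted task's stack pointer (which is what the 4.5-era IRQ_STACK_TO_TASK_STACK() macro reads), and that struct pt_regs is 36 64-bit words as in 4.5-era arm64 kernels. exception_frame_addr() and PT_REGS_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumption: sizeof(struct pt_regs) on a 4.5-era arm64 kernel, i.e.
 * regs[31], sp, pc, pstate, orig_x0 and syscallno (36 64-bit words).
 * Verify against the kernel being debugged.
 */
#define PT_REGS_SIZE (36 * sizeof(uint64_t))

/*
 * Assumption: the word 8 bytes below the top of the IRQ stack holds the
 * interrupted task's stack pointer, as IRQ_STACK_TO_TASK_STACK() reads it.
 * 'mem' stands in for kernel memory; addresses are offsets into it.
 */
static uint64_t irq_stack_to_task_stack(const unsigned char *mem, uint64_t irq_stack_ptr)
{
	uint64_t sp;

	assert(irq_stack_ptr >= sizeof(sp));
	memcpy(&sp, mem + (irq_stack_ptr - sizeof(sp)), sizeof(sp));
	return sp;
}

/*
 * Model of the dump_backtrace() kludge: a pt_regs that would straddle the
 * top of the IRQ stack means the real exception frame is on the task stack.
 */
uint64_t exception_frame_addr(const unsigned char *mem, uint64_t stack, uint64_t irq_stack_ptr)
{
	if (stack < irq_stack_ptr && stack + PT_REGS_SIZE > irq_stack_ptr)
		return irq_stack_to_task_stack(mem, irq_stack_ptr);
	return stack;
}
```

A frame that fits entirely below the IRQ stack top is left where it is; only a would-be pt_regs that runs past the top is redirected to the saved task stack pointer.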

I'm working on a patch for the above as we speak.

Thanks,
  Dave


----- Original Message -----
> 
> 
>   
> ----- Original Message -----
> 
> > > But I'm not sure what happens when an arm64 IRQ exception occurs when
> > > the task is running in user space.  Does it lay an exception frame down
> > > on the process stack and then make the transition?  (and therefore the
> > > user-space frame above is legitimate?)  Or does the user-space frame get
> > > laid down directly on the IRQ stack?  Unfortunately I don't know enough
> > > about arm64 exception handling.
> > 
> > Since I reviewed this IRQ stack patch on LAK-ML, I will be able to help
> > you, but I don't have enough time to explain in detail this week.
> 
> That's good news, your help will be greatly appreciated.
>  
> > > In any case, the bt should display "-- <IRQ stack> ...", and then dump
> > > the user-to-kernel-space exception frame, wherever it lies, i.e., either
> > > on the normal process stack or (maybe?) on the IRQ stack.
> > > 
> > > Anyway, can you make the vmlinux/vmcore pair available for me to
> > > download?  You can send the details to me offline.
> > 
> > I sent you a message which contains the link to those binaries.
> 
> Got them -- thanks!
> 
> Also, I was finally able to generate a vmcore on a RHEL7 4.5.0-based kernel,
> where the crash occurred on cpu 1 and the other 7 cpus were running in user
> space.  I do see the same problem with respect to the IRQ-stack-to-user-space
> transition.
> 
> However, I do not see the "phantom" exception frame dumps on the IRQ
> stacks that your dumpfile displays on the 7 non-crashing cpus, regardless
> of whether they came from kernel or user space.
> 
> Here is the output:
> 
>   crash> sys
>         KERNEL: ../vmlinux
>       DUMPFILE: ../vmcore  [PARTIAL DUMP]
>           CPUS: 8 [OFFLINE: 7]
>           DATE: Thu Jun  2 15:09:34 2016
>         UPTIME: 05:06:18
>   LOAD AVERAGE: 7.56, 3.49, 1.38
>          TASKS: 202
>       NODENAME: apm-mustang-ev3-07.khw.lab.eng.bos.redhat.com
>        RELEASE: 4.5.0-0.38.el7.aarch64
>        VERSION: #1 SMP Thu May 19 15:37:24 EDT 2016
>        MACHINE: aarch64  (unknown Mhz)
>         MEMORY: 16 GB
>          PANIC: "sysrq: SysRq : Trigger a crash"
>   crash> bt -a
>   PID: 2546   TASK: ffff8003d5ab9600  CPU: 0   COMMAND: "spin"
>    #0 [ffff8003ffe33d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003ffe33dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003ffe33f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003ffe33fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003d5b73ed0 fp: 0 (?)
>        PC: 00000000004005b0   LR: 0000ffff911b0c94   SP: 0000fffffee69ca0
>       X29: 0000fffffee69ca0  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 0000000000400450
>       X20: 0000000000000000  X19: 0000000000000000  X18: 0000fffffee69bb0
>       X17: 0000000000420000  X16: 0000ffff911b0ba4  X15: 00000000001815e7
>       X14: 0000ffff9136ffb8  X13: 000000000000000f  X12: 0000000000000090
>       X11: 0000000090000000  X10: 00000000ffffffff   X9: 0000000000000018
>        X8: 2f2f2f2f2f2f2f2f   X7: b0bca0bdbeb3ff91   X6: 0000000000000000
>        X5: da16a3a21e08b5bc   X4: 0000000000000000   X3: 00000000004005b0
>        X2: 0000fffffee69df8   X1: 0000fffffee69de8   X0: 0000000000000001
>       ORIG_X0: 0000ffff91310000  SYSCALLNO: ffffffffffffffff  PSTATE: 60000000
>   
>   PID: 2513   TASK: ffff8003d925d000  CPU: 1   COMMAND: "bash"
>    #0 [ffff8003dbf238d0] crash_kexec at ffff8000001486cc
>    #1 [ffff8003dbf23a20] die at ffff80000009731c
>    #2 [ffff8003dbf23a50] __do_kernel_fault at ffff8000000a7210
>    #3 [ffff8003dbf23a90] do_page_fault at ffff80000077b244
>    #4 [ffff8003dbf23ac0] do_mem_abort at ffff8000000902e8
>    #5 [ffff8003dbf23b30] el1_da at ffff800000091368
>        PC: ffff8000004970e4  [sysrq_handle_crash+36]
>        LR: ffff800000497c5c  [__handle_sysrq+296]
>        SP: ffff8003dbf23cf0  PSTATE: 60000145
>       X29: ffff8003dbf23cf0  X28: ffff8003dbf20000  X27: ffff800000792000
>       X26: 0000000000000040  X25: 000000000000011e  X24: 0000000000000007
>       X23: 0000000000000000  X22: ffff800000ce4000  X21: 0000000000000063
>       X20: ffff800000c50000  X19: ffff800000ce4888  X18: 0000000000000000
>       X17: 0000ffff7d780e20  X16: ffff800000237848  X15: 00192ea0bab15d05
>       X14: 0000000000000000  X13: 0000000000000000  X12: ffff800000c50000
>       X11: 0000000000000000  X10: 00000000000001d3   X9: 00000000000001d4
>        X8: ffff80000121ce10   X7: 0000000000008d88   X6: ffff8000012140b8
>        X5: 0000000000000000   X4: 0000000000000000   X3: 0000000000000000
>        X2: ffff8003ffe76448   X1: 0000000000000000   X0: 0000000000000001
>       ORIG_X0: 00000000000001d3  SYSCALLNO: 0
>    #6 [ffff8003dbf23d00] __handle_sysrq at ffff800000497c5c
>    #7 [ffff8003dbf23d10] write_sysrq_trigger at ffff8000004980d4
>    #8 [ffff8003dbf23d50] proc_reg_write at ffff80000029b934
>    #9 [ffff8003dbf23d70] __vfs_write at ffff800000235fd0
>   #10 [ffff8003dbf23db0] vfs_write at ffff800000236d54
>   #11 [ffff8003dbf23e40] sys_write at ffff80000023789c
>   #12 [ffff8003dbf23e90] __sys_trace_return at ffff800000091a8c
>        PC: 0000ffff7d7dbda8   LR: 0000ffff7d7835d4   SP: 0000fffff90fe1b0
>       X29: 0000fffff90fe1b0  X28: 0000000000000000  X27: 00000000004fb000
>       X26: 00000000004bb420  X25: 0000000000000001  X24: 00000000004f8000
>       X23: 0000000000000000  X22: 0000000000000002  X21: 0000ffff7d881168
>       X20: 0000ffff76e30000  X19: 0000000000000002  X18: 0000000000000000
>       X17: 0000ffff7d780e20  X16: 0000000000000000  X15: 00192ea0bab15d05
>       X14: 0000000000000000  X13: 0000000000000000  X12: 0000000000000001
>       X11: 000000001c1fc6a0  X10: 00000000004fd000   X9: 0000fffff90fe130
>        X8: 0000000000000040   X7: 0000000000000001   X6: 0000ffff7d759a98
>        X5: 0000000000000001   X4: 00000000fbad2a84   X3: 0000000000000000
>        X2: 0000000000000002   X1: 0000ffff76e30000   X0: 0000000000000001
>       ORIG_X0: 0000000000000001  SYSCALLNO: 40  PSTATE: 20000000
>   
>   PID: 2545   TASK: ffff8003d5901d00  CPU: 2   COMMAND: "spin"
>    #0 [ffff8003ffe93d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003ffe93dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003ffe93f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003ffe93fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003db4f3ed0 fp: 0 (?)
>        PC: 00000000004005b0   LR: 0000ffffb50f0c94   SP: 0000ffffe48b4910
>       X29: 0000ffffe48b4910  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 0000000000400450
>       X20: 0000000000000000  X19: 0000000000000000  X18: 0000ffffe48b4820
>       X17: 0000000000420000  X16: 0000ffffb50f0ba4  X15: 00000000001815e7
>       X14: 0000ffffb52affb8  X13: 000000000000000f  X12: 0000000000000090
>       X11: 0000000090000000  X10: 00000000ffffffff   X9: 0000000000000018
>        X8: 2f2f2f2f2f2f2f2f   X7: b0bca0bdbeb3ff91   X6: 0000000000000000
>        X5: 46c7b691c219cb7a   X4: 0000000000000000   X3: 00000000004005b0
>        X2: 0000ffffe48b4a68   X1: 0000ffffe48b4a58   X0: 0000000000000001
>       ORIG_X0: 0000ffffb5250000  SYSCALLNO: ffffffffffffffff  PSTATE: 60000000
>   
>   PID: 2541   TASK: ffff8003d917b300  CPU: 3   COMMAND: "usex"
>    #0 [ffff8003ffec3d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003ffec3dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003ffec3f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003ffec3fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003dbecbed0 fp: 0 (?)
>        PC: 00000000004361e0   LR: 0000000000435be0   SP: 0000ffffcee64ac0
>       X29: 0000ffffcee64af0  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 00000000004037b0
>       X20: 0000ffff8c790000  X19: 0000000000062a44  X18: 0000ffffcee64980
>       X17: 0000ffff8c891b9c  X16: 00000000004602b8  X15: 002c4612d8986fa7
>       X14: 0000000000000000  X13: 00000003e8000000  X12: 0000000000000018
>       X11: 00000000000b5585  X10: 000000005750846e   X9: 00000000001ecba2
>        X8: 0000000000000099   X7: 0000000000000000   X6: 0000ffff8c8946ec
>        X5: 0000ffff8c894768   X4: 0000000000000032   X3: 0000000000000007
>        X2: 0000000000000007   X1: 0000000000000005   X0: 00000000004b21cc
>       ORIG_X0: 0000ffffcee64b18  SYSCALLNO: ffffffffffffffff  PSTATE: 80000000
>   
>   PID: 2544   TASK: ffff8003d9176a80  CPU: 4   COMMAND: "usex"
>    #0 [ffff8003ffef3d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003ffef3dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003ffef3f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003ffef3fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003dbea7ed0 fp: 0 (?)
>        PC: 0000000000435e38   LR: 0000000000435c94   SP: 0000ffffcee64af0
>       X29: 0000ffffcee64af0  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 00000000004037b0
>       X20: 0000ffff8c790000  X19: 0000000000041b82  X18: 0000ffffcee64980
>       X17: 0000ffff8c894590  X16: 0000000000460008  X15: 0034caab9974abe0
>       X14: 0000000000000000  X13: 00000003e8000000  X12: 0000000000000018
>       X11: 00000000000d83c1  X10: 000000005750846e   X9: 00000000001eccc0
>        X8: 0000000000000099   X7: 0000000000000000   X6: 0000ffff8c8946ec
>        X5: 0000ffff8c894768   X4: 000000000000474e   X3: 0000000000435ff8
>        X2: 0000000000000042   X1: 000000000000002a   X0: 0000ffffcee64b84
>       ORIG_X0: 0000ffffcee64b18  SYSCALLNO: ffffffffffffffff  PSTATE: 20000000
>   
>   PID: 2547   TASK: ffff8003d5906580  CPU: 5   COMMAND: "spin"
>    #0 [ffff8003fff23d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003fff23dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003fff23f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003fff23fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003db4efed0 fp: 0 (?)
>        PC: 00000000004005b0   LR: 0000ffffb33d0c94   SP: 0000ffffe5813a70
>       X29: 0000ffffe5813a70  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 0000000000400450
>       X20: 0000000000000000  X19: 0000000000000000  X18: 0000ffffe5813980
>       X17: 0000000000420000  X16: 0000ffffb33d0ba4  X15: 00000000001815e7
>       X14: 0000ffffb358ffb8  X13: 000000000000000f  X12: 0000000000000090
>       X11: 0000000090000000  X10: 00000000ffffffff   X9: 0000000000000018
>        X8: 2f2f2f2f2f2f2f2f   X7: b0bca0bdbeb3ff91   X6: 0000000000000000
>        X5: f72609a0900e9af5   X4: 0000000000000000   X3: 00000000004005b0
>        X2: 0000ffffe5813bc8   X1: 0000ffffe5813bb8   X0: 0000000000000001
>       ORIG_X0: 0000ffffb3530000  SYSCALLNO: ffffffffffffffff  PSTATE: 60000000
>   
>   PID: 2542   TASK: ffff8003d9178780  CPU: 6   COMMAND: "usex"
>    #0 [ffff8003fff53d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003fff53dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003fff53f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003fff53fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003dbebbed0 fp: 0 (?)
>        PC: 0000000000435e10   LR: 0000000000435ddc   SP: 0000ffffcee64ad0
>       X29: 0000ffffcee64ad0  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 00000000004037b0
>       X20: 0000ffff8c790000  X19: 0000000000054c83  X18: 0000ffffcee64980
>       X17: 0000ffff8c894590  X16: 0000000000460008  X15: 0033b1eeb687fcdc
>       X14: 0000000000000000  X13: 00000003e8000000  X12: 0000000000000018
>       X11: 00000000000d3be2  X10: 000000005750846e   X9: 00000000001ecc9a
>        X8: 0000000000000099   X7: 0000000000000000   X6: 0000ffff8c8946ec
>        X5: 0000ffff8c894768   X4: 000000000000474e   X3: 0000000000435ff8
>        X2: 000000003693b600   X1: 000000003693b5f0   X0: 0000000000000006
>       ORIG_X0: 0000ffffcee64b18  SYSCALLNO: ffffffffffffffff  PSTATE: 80000000
>   
>   PID: 2548   TASK: ffff8003d5ab4d80  CPU: 7   COMMAND: "spin"
>    #0 [ffff8003fff83d60] crash_save_cpu at ffff800000148444
>    #1 [ffff8003fff83dc0] handle_IPI at ffff80000009c8d0
>    #2 [ffff8003fff83f80] gic_handle_irq at ffff8000000904c8
>    #3 [ffff8003fff83fd0] el0_irq_naked at ffff80000009180c
>   bt: WARNING: arm64_unwind_frame: on IRQ stack: oriq_sp: ffff8003d5b63ed0 fp: 0 (?)
>        PC: 00000000004005b0   LR: 0000ffffae060c94   SP: 0000ffffcf219e20
>       X29: 0000ffffcf219e20  X28: 0000000000000000  X27: 0000000000000000
>       X26: 0000000000000000  X25: 0000000000000000  X24: 0000000000000000
>       X23: 0000000000000000  X22: 0000000000000000  X21: 0000000000400450
>       X20: 0000000000000000  X19: 0000000000000000  X18: 0000ffffcf219d30
>       X17: 0000000000420000  X16: 0000ffffae060ba4  X15: 00000000001815e7
>       X14: 0000ffffae21ffb8  X13: 000000000000000f  X12: 0000000000000090
>       X11: 0000000090000000  X10: 00000000ffffffff   X9: 0000000000000018
>        X8: 2f2f2f2f2f2f2f2f   X7: b0bca0bdbeb3ff91   X6: 0000000000000000
>        X5: aa704cb48aa4536a   X4: 0000000000000000   X3: 00000000004005b0
>        X2: 0000ffffcf219f78   X1: 0000ffffcf219f68   X0: 0000000000000001
>       ORIG_X0: 0000ffffae1c0000  SYSCALLNO: ffffffffffffffff  PSTATE: 60000000
>   crash>
> 
> Given that the links at the top of each of the IRQ stacks back to the
> kernel-entry-from-user-space exception frames look legitimate, perhaps
> the "fp: 0" could be used as the key to recognizing the IRQ-while-in-user-space
> scenario?  Also, it doesn't appear that the phantom exception frames
> dumped in your vmcore are mistakenly generating an fp of 0.
> 
> Thanks,
>   Dave
> 
> 



