[Crash-utility] [PATCH] ppc64: fix 'bt' command for vmcore captured with fadump.

Dave Anderson anderson at redhat.com
Wed Jan 18 20:35:31 UTC 2017



----- Original Message -----
> Without this patch, backtraces of active tasks may be of the form
> "#0 [c0000000700b3a90] (null) at c0000000700b3b50  (unreliable)" for
> kernel dumps captured with fadump.  Trying to use ptregs saved for
> active tasks before falling back to stack-search method. Also, getting
> rid of warnings like "‘is_hugepage’ declared inline after being called".
> 
> Signed-off-by: Hari Bathini <hbathini at linux.vnet.ibm.com>

Hari,

I only have 1 sample vmcore generated by FADUMP, and I see that
the backtraces of the non-panicking active tasks are an improvement 
given that they show the exception frame register set.  However, I also
note that the panic task backtrace has changed from this, using the
current method:

  PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
   #0 [c000000255933620] .crash_fadump at c00000000002cbb8
   #1 [c0000002559336c0] .die at c000000000030dc8
   #2 [c000000255933770] .bad_page_fault at c000000000043748
   #3 [c0000002559337f0] handle_page_fault at c000000000005228
   Data Access [300] exception frame:
   R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628   
   R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff   
   R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8   
   R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590   
   R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788   
   R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414   
   R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063   
   R30: c000000000ec9208    R31: c000000001423aac   
   NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
   CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
   CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
   [Link Register] [c000000255933ae0] .__handle_sysrq at c00000000039c8c8
   #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
   #6 [c000000255933c30] .proc_reg_write at c000000000244874
   #7 [c000000255933ce0] .vfs_write at c0000000001c9dac
   #8 [c000000255933d80] .sys_write at c0000000001c9fd8
   #9 [c000000255933e30] syscall_exit at c000000000008564
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268   
   R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002   
   R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000   
   R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c   
   R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0   
   R30: 00000fffa55a0000    R31: 0000000000000002   
   NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
   CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
   CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
  
to this with your patch, where the exception backtrace is missing:

  PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
   R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628   
   R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff   
   R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8   
   R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590   
   R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788   
   R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414   
   R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063   
   R30: c000000000ec9208    R31: c000000001423aac   
   NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
   CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
   CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   NIP [c00000000039c57c] .sysrq_handle_crash
   LR  [c00000000039c8c8] .__handle_sysrq
   #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
   #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
   #2 [c000000255933c30] .proc_reg_write at c000000000244874
   #3 [c000000255933ce0] .vfs_write at c0000000001c9dac
   #4 [c000000255933d80] .sys_write at c0000000001c9fd8
   #5 [c000000255933e30] syscall_exit at c000000000008564
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268   
   R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002   
   R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc   
   R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58   
   R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000   
   R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000   
   R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c   
   R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0   
   R30: 00000fffa55a0000    R31: 0000000000000002   
   NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
   CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
   CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
   DSISR: 0000000042000000     Syscall Result: 0000000000000000


  
And then on a rhel7 traditional KDUMP dumpfile, both the panic task and the 
non-panicking active tasks are missing the exception trace.  Here's a sample
panic task backtrace using the current manner:

  PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
   #0 [c000000019823610] .crash_kexec at c0000000001725e0
   #1 [c000000019823810] .die at c000000000020a48
   #2 [c0000000198238c0] .bad_page_fault at c0000000000530d8
   #3 [c000000019823940] handle_page_fault at c000000000009584
   Data Access [300] exception frame:
   R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780   
   R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8   
   R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780   
   R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f   
   R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170   
   R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530   
   R30: c0000000011d8fa0    R31: 0000000000000002   
   NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
   CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
   CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   #4 [c000000019823c30] .sysrq_handle_crash at c00000000055c3f8
   [Link Register] [c000000019823c30] .write_sysrq_trigger at c00000000055cfac
   #5 [c000000019823cf0] .proc_reg_write at c00000000037d120
   #6 [c000000019823d80] .sys_write at c0000000002d68e4
   #7 [c000000019823e30] syscall_exit at c00000000000a17c
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0   
   R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002   
   R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20   
   R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108   
   R30: 00003fffad680000    R31: 0000000000000002   
   NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
   CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
   CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000

And here it is with your patch:

  PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
   R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780   
   R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8   
   R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780   
   R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f   
   R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170   
   R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530   
   R30: c0000000011d8fa0    R31: 0000000000000002   
   NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
   CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
   CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000
   NIP [c00000000055c3f8] .sysrq_handle_crash
   LR  [c00000000055cfac] .write_sysrq_trigger
   #0 [c000000019823c30] .write_sysrq_trigger at c00000000055cf88
   #1 [c000000019823cf0] .proc_reg_write at c00000000037d120
   #2 [c000000019823d80] .sys_write at c0000000002d68e4
   #3 [c000000019823e30] syscall_exit at c00000000000a17c
   System Call [c00] exception frame:
   R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0   
   R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002   
   R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000   
   R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550   
   R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000   
   R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0   
   R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000   
   R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20   
   R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108   
   R30: 00003fffad680000    R31: 0000000000000002   
   NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
   CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
   CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
   DSISR: 0000000042000000     Syscall Result: 0000000000000000

And from the same kdump, here's a non-panicking active task with the current 
way of doing things:

  PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
   #0 [c0000001dffdfb90] .crash_ipi_callback at c00000000004fd44
   #1 [c0000001dffdfc20] .smp_ipi_demux at c000000000046bf8
   #2 [c0000001dffdfcb0] .icp_hv_ipi_action at c000000000073454
   #3 [c0000001dffdfd30] .handle_irq_event_percpu at c0000000001afaa4
   #4 [c0000001dffdfe10] .handle_percpu_irq at c0000000001b526c
   #5 [c0000001dffdfe90] .generic_handle_irq at c0000000001aed1c
   #6 [c0000001dffdff10] .__do_irq at c000000000010d44
   #7 [c0000001dffdff90] .call_do_irq at c000000000023f60
   #8 [c00000000130b7e0] .do_IRQ at c000000000010eec
   #9 [c00000000130b880] hardware_interrupt_common at c000000000002614
   Hardware Interrupt [501] exception frame:
   R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780   
   R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120   
   R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a   
   R12: 0000000000000000    R13: c000000007b80000   
   NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
   CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
   CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
   DSISR: c000000000157684     Syscall Result: 0000000000000000
  #10 [c00000000130bb70] .plpar_hcall_norets at c0000000000849b4
  [Link Register] [c00000000130bb70] .shared_cede_loop at c000000000710070
  #11 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
  #12 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
  #13 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
  #14 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
  #15 [c00000000130be80] .rest_init at c00000000000c5f4
  #16 [c00000000130bef0] .start_kernel at c000000000c34258
  #17 [c00000000130bf90] start_here_common at c000000000009b6c

and here with your patch applied:

  PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
   R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780   
   R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120   
   R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000   
   R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a   
   R12: 0000000000000000    R13: c000000007b80000   
   NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
   CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
   CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
   DSISR: c000000000157684     Syscall Result: 0000000000000000
   NIP [c0000000000849b4] .plpar_hcall_norets
   LR  [c000000000710070] .shared_cede_loop
   #0 [c00000000130bb70] (null) at 3  (unreliable)
   #1 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
   #2 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
   #3 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
   #4 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
   #5 [c00000000130be80] .rest_init at c00000000000c5f4
   #6 [c00000000130bef0] .start_kernel at c000000000c34258
   #7 [c00000000130bf90] start_here_common at c000000000009b6c

Is that what you really want?

It would be unfortunate to lose all of that exception information, both
for the panic and for all of the non-panicking active tasks. 

Would it be possible to only apply your changes to FADUMP dumpfiles?  
(and to possibly resurrect the missing exception backtrace for the FADUMP
panic task?)
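
To make the suggestion concrete, here is a minimal sketch of the kind of
gating I have in mind.  Every name in it (dump_kind, bt_ctx,
choose_bt_method, and so on) is hypothetical and made up for
illustration; none of it is taken from the crash sources or from your
patch.  It also assumes that keeping the stack walk for the FADUMP panic
task is what preserves its exception frames, which would need to be
verified:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dump types; the real crash flags are named differently. */
enum dump_kind { DUMP_KDUMP, DUMP_FADUMP };

/* Hypothetical backtrace strategies. */
enum bt_method { BT_STACK_WALK, BT_SAVED_PTREGS };

/* Hypothetical per-task context for picking a strategy. */
struct bt_ctx {
    enum dump_kind kind;
    bool is_active;      /* task was running on a CPU at dump time */
    bool is_panic_task;  /* task that triggered the crash */
    bool have_ptregs;    /* a saved register set was found for the task */
};

/*
 * Use the saved ptregs only for non-panicking active tasks in FADUMP
 * dumpfiles.  Everything else, including all KDUMP tasks and the FADUMP
 * panic task, keeps the existing stack walk.
 */
enum bt_method choose_bt_method(const struct bt_ctx *ctx)
{
    assert(ctx != NULL);

    if (ctx->kind == DUMP_FADUMP && ctx->is_active &&
        !ctx->is_panic_task && ctx->have_ptregs)
        return BT_SAVED_PTREGS;

    return BT_STACK_WALK;
}
```

With something along those lines, the KDUMP backtraces shown above would
be left unchanged, and only the non-panicking active tasks in a FADUMP
vmcore would take the new path.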

Dave




