[Crash-utility] [PATCH] ppc64: fix 'bt' command for vmcore captured with fadump.

Hari Bathini hbathini at linux.vnet.ibm.com
Thu Jan 19 12:18:53 UTC 2017



On Thursday 19 January 2017 02:05 AM, Dave Anderson wrote:
>
> ----- Original Message -----
>> Without this patch, backtraces of active tasks may be of the form
>> "#0 [c0000000700b3a90] (null) at c0000000700b3b50  (unreliable)" for
>> kernel dumps captured with fadump.  This patch tries the ptregs saved for
>> active tasks before falling back to the stack-search method. It also gets
>> rid of warnings like "‘is_hugepage’ declared inline after being called".
>>
>> Signed-off-by: Hari Bathini <hbathini at linux.vnet.ibm.com>
> Hari,
>
> I only have 1 sample vmcore generated by FADUMP, and I see that
> the backtraces of the non-panicking active tasks are an improvement
> given that they show the exception frame register set.  However, I also
> note that the panic task backtrace has changed, from this using the
> current method:
>
>    PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
>     #0 [c000000255933620] .crash_fadump at c00000000002cbb8
>     #1 [c0000002559336c0] .die at c000000000030dc8
>     #2 [c000000255933770] .bad_page_fault at c000000000043748
>     #3 [c0000002559337f0] handle_page_fault at c000000000005228
>     Data Access [300] exception frame:
>     R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628
>     R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff
>     R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8
>     R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590
>     R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc
>     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
>     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
>     R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788
>     R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414
>     R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063
>     R30: c000000000ec9208    R31: c000000001423aac
>     NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
>     CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
>     CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>     #4 [c000000255933ae0] .sysrq_handle_crash at c00000000039c57c
>     [Link Register] [c000000255933ae0] .__handle_sysrq at c00000000039c8c8
>     #5 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
>     #6 [c000000255933c30] .proc_reg_write at c000000000244874
>     #7 [c000000255933ce0] .vfs_write at c0000000001c9dac
>     #8 [c000000255933d80] .sys_write at c0000000001c9fd8
>     #9 [c000000255933e30] syscall_exit at c000000000008564
>     System Call [c00] exception frame:
>     R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268
>     R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002
>     R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001
>     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
>     R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc
>     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
>     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
>     R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000
>     R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c
>     R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0
>     R30: 00000fffa55a0000    R31: 0000000000000002
>     NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
>     CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
>     CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>    
> to this with your patch, where the exception backtrace is missing:
>
>    PID: 1913   TASK: c000000250472120  CPU: 5   COMMAND: "bash"
>     R0:  0000000000000001    R1:  c000000255933ae0    R2:  c000000000f27628
>     R3:  0000000000000063    R4:  0000000000000000    R5:  ffffffffffffffff
>     R6:  0000000000000070    R7:  00000000000020b8    R8:  000000001cbbfaa8
>     R9:  0000000000000000    R10: 0000000000000002    R11: c00000000039c590
>     R12: 0000000028242482    R13: c000000000ff3180    R14: 000000001012b3dc
>     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
>     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
>     R21: 000000001012b3e4    R22: 0000000000000000    R23: c000000000e57788
>     R24: 0000000000000004    R25: c000000000e57928    R26: c000000000e37414
>     R27: 0000000000000000    R28: 0000000000000001    R29: 0000000000000063
>     R30: c000000000ec9208    R31: c000000001423aac
>     NIP: c00000000039c57c    MSR: 8000000000009032    OR3: c000000255933a20
>     CTR: c00000000039c560    LR:  c00000000039c8c8    XER: 0000000000000001
>     CCR: 0000000028242482    MQ:  0000000000000000    DAR: 0000000000000000
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>     NIP [c00000000039c57c] .sysrq_handle_crash
>     LR  [c00000000039c8c8] .__handle_sysrq
>     #0 [c000000255933ae0] .__handle_sysrq at c00000000039c89c
>     #1 [c000000255933ba0] .write_sysrq_trigger at c00000000039ca70
>     #2 [c000000255933c30] .proc_reg_write at c000000000244874
>     #3 [c000000255933ce0] .vfs_write at c0000000001c9dac
>     #4 [c000000255933d80] .sys_write at c0000000001c9fd8
>     #5 [c000000255933e30] syscall_exit at c000000000008564
>     System Call [c00] exception frame:
>     R0:  0000000000000004    R1:  00000fffec87b540    R2:  00000080cec13268
>     R3:  0000000000000001    R4:  00000fffa55a0000    R5:  0000000000000002
>     R6:  000000007fffffff    R7:  0000000000000000    R8:  0000000000000001
>     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
>     R12: 0000000000000000    R13: 00000080cea0ce10    R14: 000000001012b3dc
>     R15: 0000000000000000    R16: 0000000000000000    R17: 0000000010129c58
>     R18: 0000000010129bf8    R19: 000000001012b948    R20: 0000000000000000
>     R21: 000000001012b3e4    R22: 000001003391c720    R23: 0000000000000000
>     R24: 0000000000000001    R25: 000000001012b3e0    R26: 00000fffec87b86c
>     R27: 00000fffec87b868    R28: 0000000000000002    R29: 00000080cec006a0
>     R30: 00000fffa55a0000    R31: 0000000000000002
>     NIP: 00000080ceb49548    MSR: 800000000000d032    OR3: 0000000000000001
>     CTR: 00000080cead9d50    LR:  00000080cead9db8    XER: 0000000000000000
>     CCR: 0000000044242424    MQ:  0000000000000001    DAR: 00000100339436b8
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>
> And then on a RHEL7 traditional KDUMP dumpfile, with your patch both the
> panic task and the non-panicking active tasks are missing the exception
> trace.  Here's a sample panic task backtrace using the current method:
>
>    PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
>     #0 [c000000019823610] .crash_kexec at c0000000001725e0
>     #1 [c000000019823810] .die at c000000000020a48
>     #2 [c0000000198238c0] .bad_page_fault at c0000000000530d8
>     #3 [c000000019823940] handle_page_fault at c000000000009584
>     Data Access [300] exception frame:
>     R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780
>     R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8
>     R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780
>     R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f
>     R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550
>     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
>     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
>     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
>     R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170
>     R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530
>     R30: c0000000011d8fa0    R31: 0000000000000002
>     NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
>     CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
>     CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>     #4 [c000000019823c30] .sysrq_handle_crash at c00000000055c3f8
>     [Link Register] [c000000019823c30] .write_sysrq_trigger at c00000000055cfac
>     #5 [c000000019823cf0] .proc_reg_write at c00000000037d120
>     #6 [c000000019823d80] .sys_write at c0000000002d68e4
>     #7 [c000000019823e30] syscall_exit at c00000000000a17c
>     System Call [c00] exception frame:
>     R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0
>     R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002
>     R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000
>     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
>     R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550
>     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
>     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
>     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
>     R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20
>     R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108
>     R30: 00003fffad680000    R31: 0000000000000002
>     NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
>     CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
>     CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>
> And here it is with your patch:
>
>    PID: 32696  TASK: c0000001922ed5d0  CPU: 1   COMMAND: "runtest.sh"
>     R0:  c00000000055cf88    R1:  c000000019823c30    R2:  c00000000130a780
>     R3:  0000000000000063    R4:  c000000001845888    R5:  c0000000018564f8
>     R6:  0000000000005194    R7:  c0000000014b99a0    R8:  c000000000cca780
>     R9:  0000000000000001    R10: 0000000000000000    R11: 000000000000012f
>     R12: 0000000048222842    R13: c000000007b80900    R14: 0000000010142550
>     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
>     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
>     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
>     R24: 0000000000000001    R25: 0000000000000007    R26: c00000000120b170
>     R27: 0000000000000063    R28: c000000001709c98    R29: c00000000120b530
>     R30: c0000000011d8fa0    R31: 0000000000000002
>     NIP: c00000000055c3f8    MSR: 8000000000009032    OR3: c000000000009358
>     CTR: c00000000055c3e0    LR:  c00000000055cfac    XER: 0000000000000001
>     CCR: 0000000048222822    MQ:  0000000000000000    DAR: 0000000000000000
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>     NIP [c00000000055c3f8] .sysrq_handle_crash
>     LR  [c00000000055cfac] .write_sysrq_trigger
>     #0 [c000000019823c30] .write_sysrq_trigger at c00000000055cf88
>     #1 [c000000019823cf0] .proc_reg_write at c00000000037d120
>     #2 [c000000019823d80] .sys_write at c0000000002d68e4
>     #3 [c000000019823e30] syscall_exit at c00000000000a17c
>     System Call [c00] exception frame:
>     R0:  0000000000000004    R1:  00003fffc7738e00    R2:  00003fffb4163cc0
>     R3:  0000000000000001    R4:  00003fffad680000    R5:  0000000000000002
>     R6:  0000000000000010    R7:  0000000000000000    R8:  0000000000000000
>     R9:  0000000000000000    R10: 0000000000000000    R11: 0000000000000000
>     R12: 0000000000000000    R13: 00003fffb426c330    R14: 0000000010142550
>     R15: 0000000040000000    R16: 0000000010143cdc    R17: 0000000000000000
>     R18: 00000000101306fc    R19: 00000000101424dc    R20: 00000000101424e0
>     R21: 000000001013c6f0    R22: 000000001013c970    R23: 0000000000000000
>     R24: 0000000010143ce0    R25: 00000000100f65d0    R26: 00000100277ffa20
>     R27: 0000000000000001    R28: 0000000000000002    R29: 00003fffb4151108
>     R30: 00003fffad680000    R31: 0000000000000002
>     NIP: 00003fffb408a120    MSR: 800000000280f032    OR3: 0000000000000001
>     CTR: 0000000000000000    LR:  00003fffb4015704    XER: 0000000000000000
>     CCR: 0000000048222882    MQ:  0000000000000001    DAR: 00003fffad680000
>     DSISR: 0000000042000000     Syscall Result: 0000000000000000
>
> And from the same kdump, here's a non-panicking active task with the current
> way of doing things:
>
>    PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
>     #0 [c0000001dffdfb90] .crash_ipi_callback at c00000000004fd44
>     #1 [c0000001dffdfc20] .smp_ipi_demux at c000000000046bf8
>     #2 [c0000001dffdfcb0] .icp_hv_ipi_action at c000000000073454
>     #3 [c0000001dffdfd30] .handle_irq_event_percpu at c0000000001afaa4
>     #4 [c0000001dffdfe10] .handle_percpu_irq at c0000000001b526c
>     #5 [c0000001dffdfe90] .generic_handle_irq at c0000000001aed1c
>     #6 [c0000001dffdff10] .__do_irq at c000000000010d44
>     #7 [c0000001dffdff90] .call_do_irq at c000000000023f60
>     #8 [c00000000130b7e0] .do_IRQ at c000000000010eec
>     #9 [c00000000130b880] hardware_interrupt_common at c000000000002614
>     Hardware Interrupt [501] exception frame:
>     R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780
>     R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120
>     R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000
>     R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a
>     R12: 0000000000000000    R13: c000000007b80000
>     NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
>     CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
>     CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
>     DSISR: c000000000157684     Syscall Result: 0000000000000000
>    #10 [c00000000130bb70] .plpar_hcall_norets at c0000000000849b4
>    [Link Register] [c00000000130bb70] .shared_cede_loop at c000000000710070
>    #11 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
>    #12 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
>    #13 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
>    #14 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
>    #15 [c00000000130be80] .rest_init at c00000000000c5f4
>    #16 [c00000000130bef0] .start_kernel at c000000000c34258
>    #17 [c00000000130bf90] start_here_common at c000000000009b6c
>
> and here with your patch applied:
>
>    PID: 0      TASK: c000000001241c00  CPU: 0   COMMAND: "swapper/0"
>     R0:  0000000000000000    R1:  c00000000130bb70    R2:  c00000000130a780
>     R3:  0000000000000000    R4:  0000000000000000    R5:  800000000bb71120
>     R6:  800000000bb844f8    R7:  0000000000000000    R8:  0000000000000000
>     R9:  0000000000000040    R10: 0000000000000000    R11: 000000005f9c862a
>     R12: 0000000000000000    R13: c000000007b80000
>     NIP: c0000000000849b4    MSR: 8000000000009032    OR3: 0000000000000c00
>     CTR: 0000000000000000    LR:  c000000000710070    XER: 0000000000000000
>     CCR: 0000000024002084    MQ:  0000000000000001    DAR: c000000001818380
>     DSISR: c000000000157684     Syscall Result: 0000000000000000
>     NIP [c0000000000849b4] .plpar_hcall_norets
>     LR  [c000000000710070] .shared_cede_loop
>     #0 [c00000000130bb70] (null) at 3  (unreliable)
>     #1 [c00000000130bbf0] .cpuidle_idle_call at c00000000070d9b4
>     #2 [c00000000130bcc0] .pseries_lpar_idle at c0000000000872f0
>     #3 [c00000000130bd30] .arch_cpu_idle at c000000000017b44
>     #4 [c00000000130bdb0] .cpu_startup_entry at c000000000149b10
>     #5 [c00000000130be80] .rest_init at c00000000000c5f4
>     #6 [c00000000130bef0] .start_kernel at c000000000c34258
>     #7 [c00000000130bf90] start_here_common at c000000000009b6c
>
> Is that what you really want?
>
> It would be unfortunate to lose all of that exception information, both
> for the panic and for all of the non-panicking active tasks.

Hi Dave,

Unfortunate, yes. But I think the exception information we would lose
relates to crash_ipi_callback, crash_kexec, crash_fadump or the like,
which may not be significant for debugging?
At least, that was the assumption with which I posted this patch.
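
For reference, the ordering the patch description mentions (saved ptregs
first, stack search only as a fallback) can be sketched roughly as below.
This is only an illustration under stated assumptions: sketch_regs,
sketch_source and sketch_pick_start() are hypothetical names made up for
this mail, not the actual structures or functions in crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified register set (not crash's real structure). */
struct sketch_regs {
    unsigned long long nip;  /* NIP: next instruction pointer */
    unsigned long long r1;   /* R1: stack pointer */
};

/* Where the starting point of the backtrace came from. */
enum sketch_source {
    SKETCH_FROM_PTREGS,
    SKETCH_FROM_STACK_SEARCH,
    SKETCH_NONE
};

/*
 * Pick the starting NIP/R1 for an active task's backtrace.
 * 'saved' is the register set saved in the dump for the task (NULL if
 * absent); 'searched' is what a stack search found (NULL if it failed).
 * The saved registers win whenever they look usable.
 */
static enum sketch_source
sketch_pick_start(const struct sketch_regs *saved,
                  const struct sketch_regs *searched,
                  struct sketch_regs *out)
{
    assert(out != NULL);

    if (saved != NULL && saved->nip != 0 && saved->r1 != 0) {
        *out = *saved;
        return SKETCH_FROM_PTREGS;
    }
    if (searched != NULL) {
        *out = *searched;
        return SKETCH_FROM_STACK_SEARCH;
    }
    return SKETCH_NONE;
}
```

With the values from the FADUMP panic task above, the saved registers
(NIP c00000000039c57c, R1 c000000255933ae0) are chosen over the
stack-search start at .crash_fadump (c00000000002cbb8, frame
c000000255933620), which is why the frames above .sysrq_handle_crash no
longer show up.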

Thanks
Hari



