[Crash-utility] [PATCH] Fix for "bt" command incorrectly printing eframe stack with a bogus warning

lijiang lijiang at redhat.com
Thu Feb 16 07:49:25 UTC 2023


On Thu, Feb 16, 2023 at 2:22 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab at nec.com>
wrote:

> On 2023/02/16 13:05, lijiang wrote:
> >>> On Thu, Feb 16, 2023 at 8:48 AM HAGIO KAZUHITO(萩尾 一仁) <
> k-hagio-ab at nec.com <mailto:k-hagio-ab at nec.com>> wrote:
> >>>
> >>>     On 2023/02/15 17:24, Lianbo Jiang wrote:
> >>>     > Kernel commit 7d65f4a65532 ("irq: Consolidate do_softirq() arch
> overriden
> >>>     > implementations") renamed the call_softirq to
> do_softirq_own_stack.
> >>>     > Crash may incorrectly output the eframe stack with a warning as
> below:
> >>>     >
> >>>     >    crash> foreach bt
> >>>     >    ...
> >>>     >    PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND:
> "swapper/25"
> >>>     >     #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
> >>>     >     #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
> >>>     >     #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
> >>>     >     #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
> >>>     >     #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
> >>>     >        [exception RIP: clone_endio+172]
> >>>     >        RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS:
> 00000246
> >>>     >        RAX: 0000000000000000  RBX: ffff915326fba230  RCX:
> 0000000000000018
> >>>     >        RDX: ffffffffc0075400  RSI: 0000000000000000  RDI:
> ffff915326fba230
> >>>     >        RBP: ffff915326fba1c0   R8: 0000000000001000   R9:
> ffff915308d6d2a0
> >>>     >        R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12:
> ffff915302babc40
> >>>     >        R13: ffff914f94360000  R14: 0000000000000000  R15:
> 0000000000000000
> >>>     >        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>>     >    --- <NMI exception stack> ---
> >>>     >     #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec
> [dm_mod]
> >>>     >     #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
> >>>     >     #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
> >>>     >     #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
> >>>     >     #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
> >>>     >     #10 [ffffa1d403d08fa0] __softirqentry_text_start at
> ffffffffa72000dc
> >>>     >     #11 [ffffa1d403d08ff0] do_softirq_own_stack at
> ffffffffa7000f9a
> >>>     >    --- <IRQ stack> ---
> >>>     >     #12 [ffffa1d40038fe70] do_softirq_own_stack at
> ffffffffa7000f9a
> >>>     >        [exception RIP: unknown or invalid address]
> >>>     >        RIP: 0000000000000000  RSP: 0000000000000000  RFLAGS:
> 00000000
> >>>     >        RAX: ffffffffa672eae5  RBX: ffffffffa83b34e0  RCX:
> ffffffffa672eb12
> >>>     >        RDX: 0000000000000010  RSI: 8b7d6c8869010c00  RDI:
> 0000000000000085
> >>>     >        RBP: 0000000000000286   R8: ffff914f820a8000   R9:
> ffffffffa67a94e0
> >>>     >        R10: 0000000000000286  R11: ffffffffa66fb4c5  R12:
> ffffffffa67a898b
> >>>     >        R13: 0000000000000000  R14: fffffffffffffff8  R15:
> ffffffffa67a1e68
> >>>     >        ORIG_RAX: 0000000000000000  CS: 0000  SS: ffffffffa672edff
> >>>     >     bt: WARNING: possibly bogus exception frame
> >>>     >     #13 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
> >>>     >     #14 [ffffa1d40038ff50] secondary_startup_64_no_verify at
> ffffffffa6600116
> >>>     >     ...
> >>>
> >>>     Thank you for the patch.  Could I have "bt -c 25" output with the
> patch
> >>>     and "bt -r -c 25 | tail -n 40" output for checking?
> >>>
> >>> Sure.
>
> Thank you, looks better.  Applied with Georges' suggestion and a few
> commit log changes.
>
> https://github.com/crash-utility/crash/commit/e0e6e4a7ee03b3d00b50a9e4db2f2ea6f7da0da3
>
>
Looks good.


> The irq_eframe_link calculated on RHEL8 also may be wrong, but
>

Those kernel commits are not introduced:
    [1] v5.8: 931b94145981 ("x86/entry: Provide helpers for executing on
the irqstack")
    [2] v5.8: fa5e5c409213 ("x86/entry: Use idtentry for interrupts")
    [3] v5.12: 52d743f3b712 ("x86/softirq: Remove indirection in
do_softirq_own_stack()")

I tested it(based on 452) and didn't get any errors:

crash> bt
PID: 0        TASK: ffff9ed24128a800  CPU: 3    COMMAND: "swapper/3"
 #0 [ffffb7c300db8b80] machine_kexec at ffffffff8f46b063
 #1 [ffffb7c300db8bd8] __crash_kexec at ffffffff8f5b357a
 #2 [ffffb7c300db8c98] panic at ffffffff8f4f50ff
 #3 [ffffb7c300db8d18] sysrq_handle_crash at ffffffff8f9f75e1
 #4 [ffffb7c300db8d20] __handle_sysrq.cold.13 at ffffffff8f9f7f04
 #5 [ffffb7c300db8d50] sysrq_filter at ffffffff8f9f7cbd
 #6 [ffffb7c300db8d88] input_to_handler at ffffffff8fb3780f
 #7 [ffffb7c300db8dc0] input_pass_values at ffffffff8fb388ab
 #8 [ffffb7c300db8de8] input_handle_event at ffffffff8fb3a9e8
 #9 [ffffb7c300db8e18] input_event at ffffffff8fb3ae9f
#10 [ffffb7c300db8e50] atkbd_interrupt at ffffffff8fb413a3
#11 [ffffb7c300db8ea8] serio_interrupt at ffffffff8fb33a86
#12 [ffffb7c300db8ed8] i8042_interrupt at ffffffff8fb35134
#13 [ffffb7c300db8f18] __handle_irq_event_percpu at ffffffff8f569d70
#14 [ffffb7c300db8f58] handle_irq_event_percpu at ffffffff8f569ef0
#15 [ffffb7c300db8f80] handle_irq_event at ffffffff8f569f76
#16 [ffffb7c300db8fa0] handle_edge_irq at ffffffff8f56e3d2
#17 [ffffb7c300db8fb8] handle_irq at ffffffff8f42924c
#18 [ffffb7c300db8fc0] do_IRQ at ffffffff8fe01e89
--- <IRQ stack> ---
#19 [ffffb7c300cf3de8] ret_from_intr at ffffffff8fe00a8f
    [exception RIP: native_safe_halt+14]
    RIP: ffffffff8fdf7eae  RSP: ffffb7c300cf3e98  RFLAGS: 00000206
    RAX: ffffffff8fdf7d40  RBX: 0000000000000003  RCX: 0000000000000001
    RDX: 0000000000929b82  RSI: 0000000000000083  RDI: 0000000000000003
    RBP: 0000000000000003   R8: 0000b7ac3793def2   R9: 000000000004f515
    R10: 0000000000000000  R11: 0000000000000019  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffffffffffff  R15: ffff9ed24128a800
    ORIG_RAX: ffffffffffffffdb  CS: 0010  SS: 0018
#20 [ffffb7c300cf3e98] default_idle at ffffffff8fdf7d4a
#21 [ffffb7c300cf3ea0] default_idle_call at ffffffff8fdf8034
#22 [ffffb7c300cf3ec0] do_idle at ffffffff8f52eb63
#23 [ffffb7c300cf3f10] cpu_startup_entry at ffffffff8f52edff
#24 [ffffb7c300cf3f30] start_secondary at ffffffff8f45fa2c
#25 [ffffb7c300cf3f50] secondary_startup_64_no_verify at ffffffff8f400116

crash> log
...
[161537.153436] Call Trace:
[161537.161011]  <IRQ>
[161537.162223]  dump_stack+0x41/0x60
[161537.166883]  panic+0xe7/0x2ac
[161537.168950]  ? printk+0x58/0x73
[161537.170718]  sysrq_handle_crash+0x11/0x20
[161537.177654]  __handle_sysrq.cold.13+0x48/0xff
[161537.180096]  sysrq_filter+0x32d/0x3f0
[161537.182079]  input_to_handler+0x4f/0xf0
[161537.185589]  input_pass_values.part.7+0x11b/0x140
[161537.188160]  input_handle_event+0x138/0x5a0
[161537.190729]  input_event+0x4f/0x80
[161537.192653]  atkbd_interrupt+0x623/0x6b0
[161537.195040]  serio_interrupt+0x46/0x90
[161537.201579]  i8042_interrupt+0x154/0x240
[161537.203750]  __handle_irq_event_percpu+0x40/0x190
[161537.208421]  handle_irq_event_percpu+0x30/0x80
[161537.210869]  handle_irq_event+0x36/0x57
[161537.212946]  handle_edge_irq+0x82/0x190
[161537.215345]  handle_irq+0x1c/0x30
[161537.217128]  do_IRQ+0x49/0xd0
[161537.219876]  common_interrupt+0xf/0xf
[161537.221828]  </IRQ>
[161537.222944] RIP: 0010:native_safe_halt+0xe/0x20
[161537.225239] Code: 00 f0 80 48 02 20 48 8b 00 a8 08 75 c0 e9 79 ff ff ff
90 9
0 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 96 22 41 00 fb f4 <e9> 8d
95 2
0 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 e9 07 00 00
[161537.234417] RSP: 0018:ffffb7c300cf3e98 EFLAGS: 00000206 ORIG_RAX:
ffffffffffffffdb
[161537.238201] RAX: ffffffff8fdf7d40 RBX: 0000000000000003 RCX:
0000000000000001
[161537.241756] RDX: 0000000000929b82 RSI: 0000000000000083 RDI:
0000000000000003
[161537.245308] RBP: 0000000000000003 R08: 0000b7ac3793def2 R09:
000000000004f515
[161537.248868] R10: 0000000000000000 R11: 0000000000000019 R12:
0000000000000000
[161537.252420] R13: 0000000000000000 R14: ffffffffffffffff R15:
ffff9ed24128a800
[161537.255983]  ? tdx_safe_halt+0x80/0x80
[161537.259858]  default_idle+0xa/0x20
[161537.261677]  default_idle_call+0x44/0xf0
[161537.263719]  do_idle+0x213/0x2d0
[161537.265694]  cpu_startup_entry+0x6f/0x80
[161537.267750]  start_secondary+0x18c/0x1d0
[161537.270058]  secondary_startup_64_no_verify+0xd1/0xdb
...

crash> help -m |grep link
          irq_eframe_link: 40

I have a plan of changing that part and will revisit with it.
>
>
Sounds good.

Thanks.
Lianbo


> Thanks,
> Kazu
>
> >>>
> >>> crash> bt -c 25
> >>> PID: 0        TASK: ffff914f820a8000  CPU: 25   COMMAND: "swapper/25"
> >>>  #0 [fffffe0000504e48] crash_nmi_callback at ffffffffa665d763
> >>>  #1 [fffffe0000504e50] nmi_handle at ffffffffa662a423
> >>>  #2 [fffffe0000504ea8] default_do_nmi at ffffffffa6fe7dc9
> >>>  #3 [fffffe0000504ec8] do_nmi at ffffffffa662a97f
> >>>  #4 [fffffe0000504ef0] end_repeat_nmi at ffffffffa70015e8
> >>>     [exception RIP: clone_endio+172]
> >>>     RIP: ffffffffc005c1ec  RSP: ffffa1d403d08e98  RFLAGS: 00000246
> >>>     RAX: 0000000000000000  RBX: ffff915326fba230  RCX: 0000000000000018
> >>>     RDX: ffffffffc0075400  RSI: 0000000000000000  RDI: ffff915326fba230
> >>>     RBP: ffff915326fba1c0   R8: 0000000000001000   R9: ffff915308d6d2a0
> >>>     R10: 000000a97dfe5e10  R11: ffffa1d40038fe98  R12: ffff915302babc40
> >>>     R13: ffff914f94360000  R14: 0000000000000000  R15: 0000000000000000
> >>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >>> --- <NMI exception stack> ---
> >>>  #5 [ffffa1d403d08e98] clone_endio at ffffffffc005c1ec [dm_mod]
> >>>  #6 [ffffa1d403d08ed0] blk_update_request at ffffffffa6a96954
> >>>  #7 [ffffa1d403d08f10] scsi_end_request at ffffffffa6c9b968
> >>>  #8 [ffffa1d403d08f48] scsi_io_completion at ffffffffa6c9bb3e
> >>>  #9 [ffffa1d403d08f90] blk_complete_reqs at ffffffffa6aa0e95
> >>> #10 [ffffa1d403d08fa0] __softirqentry_text_start at ffffffffa72000dc
> >>> #11 [ffffa1d403d08ff0] do_softirq_own_stack at ffffffffa7000f9a
> >>> --- <IRQ stack> ---
> >>> #12 [ffffa1d40038fe70] update_ts_time_stats at ffffffffa67a1e68
> >>> #13 [ffffa1d40038fea0] do_softirq at ffffffffa66fb4c5
> >>> #14 [ffffa1d40038feb0] flush_smp_call_function_queue at
> ffffffffa67a94e0
> >>> #15 [ffffa1d40038fec0] do_idle at ffffffffa672eae5
> >>> #16 [ffffa1d40038ff10] cpu_startup_entry at ffffffffa672edff
> >>> #17 [ffffa1d40038ff30] start_secondary at ffffffffa665fa2c
> >>> #18 [ffffa1d40038ff50] secondary_startup_64_no_verify at
> ffffffffa6600116
> >>> crash>
> >>> crash> bt -r -c 25 | tail -n 40
> >>> ffffa1d40038fd80:  ffff9156ff26afc0 0000002027c3e7a1
> >>> ffffa1d40038fd90:  __next_timer_interrupt+166 ffff9156ff25aa68
> >>> ffffa1d40038fda0:  00000000fffd85da 0000000000000082
> >>> ffffa1d40038fdb0:  0000000000000082 hrtimer_get_next_event+78
> >>> ffffa1d40038fdc0:  ffff9156ff25aa40 000000200cd15980
> >>> ffffa1d40038fdd0:  __hrtimer_next_event_base+192 ffff9156ff25cf40
> >>> ffffa1d40038fde0:  7fffffffffffffff ffff9156ff25d580
> >>> ffffa1d40038fdf0:  0000000000000082 ffffffffffffffff
> >>> ffffa1d40038fe00:  0000000000000082 hrtimer_next_event_without+96
> >>> ffffa1d40038fe10:  ffff9156ff25d580 000000200cff2040
> >>> ffffa1d40038fe20:  000000200cdb406e ffff9153052f8c00
> >>> ffffa1d40038fe30:  sched_clock+5    sched_clock_cpu+12
> >>> ffffa1d40038fe40:  acpi_idle_driver+136 ffff9153052f8c00
> >>> ffffa1d40038fe50:  cpuidle_enter_state+181 000000203501afdb
> >>> ffffa1d40038fe60:  00ffffffa82c44c0 read_tsc
> >>> ffffa1d40038fe70:  update_ts_time_stats+88 fffffffffffffff8
> >>> ffffa1d40038fe80:  0000000000000000 __flush_smp_call_function_queue+219
> >>> ffffa1d40038fe90:  0000000000000286 __cpu_online_mask
> >>> ffffa1d40038fea0:  do_softirq+69    0000000000000286
> >>> ffffa1d40038feb0:  flush_smp_call_function_queue+96 ffff914f820a8000
> >>> ffffa1d40038fec0:  do_idle+405      do_idle+450
> >>> ffffa1d40038fed0:  0000000000000010 8b7d6c8869010c00
> >>> ffffa1d40038fee0:  0000000000000085 0000000000000000
> >>> ffffa1d40038fef0:  0000000000000000 0000000000000000
> >>> ffffa1d40038ff00:  0000000000000000 0000000000000000
> >>> ffffa1d40038ff10:  cpu_startup_entry+111 8b7d6c8869010c00
> >>> ffffa1d40038ff20:  534e004fc6e00600 ffffa1d40038ff38
> >>> ffffa1d40038ff30:  start_secondary+396 534e004fc6e00600
> >>> ffffa1d40038ff40:  0000000000000000 0000000040000000
> >>> ffffa1d40038ff50:  secondary_startup_64_no_verify+209 0000000000000000
> >>> ffffa1d40038ff60:  0000000000000000 0000000000000000
> >>> ffffa1d40038ff70:  0000000000000000 0000000000000000
> >>> ffffa1d40038ff80:  0000000000000000 0000000000000000
> >>> ffffa1d40038ff90:  0000000000000000 0000000000000000
> >>> ffffa1d40038ffa0:  0000000000000000 0000000000000000
> >>> ffffa1d40038ffb0:  0000000000000000 0000000000000000
> >>> ffffa1d40038ffc0:  0000000000000000 0000000000000000
> >>> ffffa1d40038ffd0:  0000000000000000 0000000000000000
> >>> ffffa1d40038ffe0:  0000000000000000 0000000000000000
> >>> ffffa1d40038fff0:  0000000000000000 0000000000000000
> >>> crash>
> >>> Thanks.
> >>> Lianbo
> >>>
> >>>     Thanks,
> >>>     Kazu
> >>>
> >>>     >
> >>>     > Also no exception frame when coming from do_softirq_own_stack.
> >>>     >
> >>>     > Reported-by: Marco Patalano <mpatalan at redhat.com <mailto:
> mpatalan at redhat.com>>
> >>>     > Signed-off-by: Lianbo Jiang <lijiang at redhat.com <mailto:
> lijiang at redhat.com>>
> >>>     > ---
> >>>     >   x86_64.c | 5 +++--
> >>>     >   1 file changed, 3 insertions(+), 2 deletions(-)
> >>>     >
> >>>     > diff --git a/x86_64.c b/x86_64.c
> >>>     > index 5b671bd97775..3428bed417df 100644
> >>>     > --- a/x86_64.c
> >>>     > +++ b/x86_64.c
> >>>     > @@ -3825,10 +3825,11 @@ in_exception_stack:
> >>>     >               up -= 1;
> >>>     >                   bt->instptr = *up;
> >>>     >               /*
> >>>     > -              *  No exception frame when coming from
> call_softirq.
> >>>     > +              *  No exception frame when coming from
> call_softirq
> >>>     > +              *  or do_softirq_own_stack.
> >>>     >                */
> >>>     >               if ((sp = value_search(bt->instptr, &offset)) &&
> >>>     > -                 STREQ(sp->name, "call_softirq"))
> >>>     > +                 (STREQ(sp->name, "call_softirq") ||
> STREQ(sp->name, "do_softirq_own_stack")))
> >>>     >                       irq_eframe = 0;
> >>>     >                   bt->frameptr = 0;
> >>>     >                   done = FALSE;
> >>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20230216/6e6b7a54/attachment-0001.htm>


More information about the Crash-utility mailing list