[Crash-utility] [PATCH v2 1/2] bt: x86_64: filter out idle task stack

Qi Zheng zhengqi.arch at bytedance.com
Tue May 24 12:03:41 UTC 2022



On 2022/5/24 4:53 PM, lijiang wrote:
> Thank you for the patchset, Qi.
> 
> On Fri, May 20, 2022 at 10:26 AM HAGIO KAZUHITO(萩尾 一仁)
> <k-hagio-ab at nec.com> wrote:
> 
>     Hi Qi,
> 
>     thanks for the update.
> 
>     On 2022/05/19 22:48, Qi Zheng wrote:
>      > When we use crash to troubleshoot softlockup and other problems,
>      > we often use the 'bt -a' command to print the stacks of running
>      > processes on all CPUs. But now some servers have hundreds of CPUs
>      > (such as AMD machines), which causes the 'bt -a' command to output
>      > a lot of process stacks. And many of these stacks are the stacks
>      > of the idle process, which are not needed by us.
>      >
>      > Therefore, in order to reduce this part of the interference information,
>      > this patch adds the -n option to the bt command. When we specify
>      > '-n idle' (meaning no idle), the stack of the idle process will be
>      > filtered out, thus speeding up our troubleshooting.
>      >
> 
>      > And like option '-a' of bt, option '-n idle' works only for dumpfiles
>      > captured by kdump.
> 
>     It's a bit different from '-a', will change to:
> 
>         And the option works only for crash dumps captured by kdump.
> 
>     because '-a' works for live dumps but '-n idle' doesn't work like this:
> 
>     crash> bt -a -n idle
>     PID: 0      TASK: ffffffff82212780  CPU: 0   COMMAND: "swapper/0"
>           [exception RIP: native_safe_halt+2]
>           RIP: ffffffff81820c92  RSP: ffffffff82203e90  RFLAGS: 00000246
>           RAX: ffffffff818209c0  RBX: 0000000000000000  RCX: 0000000000000001
>           RDX: 0000000000000001  RSI: 0000000000000087  RDI: 0000000000000000
>           RBP: 0000000000000000   R8: 00007584cc726bcd   R9: 0000000000000000
>           R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>           R13: 0000000000000000  R14: 000000007f46e168  R15: 000000007ff922a0
>           CS: 0010  SS: 0018
>        #0 [ffffffff82203e90] default_idle at ffffffff818209da
>        #1 [ffffffff82203eb0] do_idle at ffffffff810de691
>        #2 [ffffffff82203ef0] cpu_startup_entry at ffffffff810de8ff
>        #3 [ffffffff82203f10] start_kernel at ffffffff827821eb
>        #4 [ffffffff82203f50] secondary_startup_64 at ffffffff810000e7
>     ...
> 
>     And with the following patch,
> 
>     Acked-by: Kazuhito Hagio <k-hagio-ab at nec.com>
> 
>     --- a/help.c
>     +++ b/help.c
>     @@ -1909,7 +1909,7 @@ char *help_bt[] = {
>        "bt",
>        "backtrace",
>        "[-a|-c cpu(s)|-g|-r|-t|-T|-l|-e|-E|-f|-F|-o|-O|-v|-p] [-R ref] [-s [-x|d]]"
>     -"\n     [-I ip] [-S sp] [pid | task]",
>     +"\n     [-I ip] [-S sp] [-n idle] [pid | task]",
> 
> 
> In addition to the above issue, another issue may occur on other 
> unsupported architectures such as ppc64le:
> ...
>        KERNEL: vmlinux
>      DUMPFILE: /var/crash/127.0.0.1-2022-05-24-03:55:17/vmcore  [PARTIAL DUMP]
>          CPUS: 8
>          DATE: Tue May 24 03:54:59 EDT 2022
>        UPTIME: 00:35:41
> LOAD AVERAGE: 0.54, 0.95, 1.28
>         TASKS: 175
> ...
>       MACHINE: ppc64le  (2900 Mhz)
>        MEMORY: 8 GB
>         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>           PID: 16173
>       COMMAND: "bash"
>          TASK: c0000000e9579000  [THREAD_INFO: c0000000e9579000]
>           CPU: 0
>         STATE: TASK_RUNNING (PANIC)
> 
> crash> bt -n idle
> PID: 16173  TASK: c0000000e9579000  CPU: 0   COMMAND: "bash"
>   #0 [c000000031e33b00] __crash_kexec at c00000000026f1a8
>   #1 [c000000031e33ba0] sysrq_handle_crash at c00000000090af78
>   #2 [c000000031e33c00] __handle_sysrq at c00000000090bbcc
>   #3 [c000000031e33ca0] write_sysrq_trigger at c00000000090c458
>   #4 [c000000031e33ce0] proc_reg_write at c00000000060bb2c
>   #5 [c000000031e33d10] vfs_write at c0000000005398fc
>   #6 [c000000031e33d60] ksys_write at c000000000539e84
>   #7 [c000000031e33db0] system_call_exception at c0000000000303c0
>   #8 [c000000031e33e10] system_call_vectored_common at c00000000000bfe8
> crash>
> 
> It might be confusing, because the "bt -n idle" is not supported on 
> ppc64le, but still can print call traces.
> 
>        "  Display a kernel stack backtrace.  If no arguments are given, the stack",
>        "  trace of the current context will be displayed.\n",
>        "       -a  displays the stack traces of the active task on each CPU.",
> 
>     No need to repost, we can fix these when merging.
> 
>     Thanks,
>     Kazu
> 
>      >
>      > The command output is as follows:
>      > crash> bt -a -n idle
>      > [...]
>      > PID: 0      TASK: ffff889ff93f5a00  CPU: 5   COMMAND: "swapper/5"
>      >
>      > PID: 0      TASK: ffff889ff93f0000  CPU: 6   COMMAND: "swapper/6"
>      >
>      > PID: 0      TASK: ffff889ff8c30000  CPU: 7   COMMAND: "swapper/7"
>      >
>      > PID: 0      TASK: ffff889ff8c34380  CPU: 8   COMMAND: "swapper/8"
>      >
>      > PID: 0      TASK: ffff889ff8c32d00  CPU: 9   COMMAND: "swapper/9"
>      >
>      > PID: 0      TASK: ffff889ff8c31680  CPU: 10  COMMAND: "swapper/10"
>      >
>      > PID: 0      TASK: ffff889ff8c35a00  CPU: 11  COMMAND: "swapper/11"
>      >
>      > PID: 0      TASK: ffff889ff8c3c380  CPU: 12  COMMAND: "swapper/12"
>      >
>      > PID: 150773  TASK: ffff889fe85a1680  CPU: 13  COMMAND: "bash"
>      >   #0 [ffffc9000d35bcd0] machine_kexec at ffffffff8105a407
>      >   #1 [ffffc9000d35bd28] __crash_kexec at ffffffff8113033d
>      >   #2 [ffffc9000d35bdf0] panic at ffffffff81081930
>      >   #3 [ffffc9000d35be70] sysrq_handle_crash at ffffffff814e38d1
>      >   #4 [ffffc9000d35be78] __handle_sysrq.cold.12 at ffffffff814e4175
>      >   #5 [ffffc9000d35bea8] write_sysrq_trigger at ffffffff814e404b
>      >   #6 [ffffc9000d35beb8] proc_reg_write at ffffffff81330d86
>      >   #7 [ffffc9000d35bed0] vfs_write at ffffffff812a72d5
>      >   #8 [ffffc9000d35bf00] ksys_write at ffffffff812a7579
>      >   #9 [ffffc9000d35bf38] do_syscall_64 at ffffffff81004259
>      >      RIP: 00007fa7abcdc274  RSP: 00007fffa731f678  RFLAGS: 00000246
>      >      RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007fa7abcdc274
>      >      RDX: 0000000000000002  RSI: 0000563ca51ee6d0  RDI: 0000000000000001
>      >      RBP: 0000563ca51ee6d0   R8: 000000000000000a   R9: 00007fa7abd6be80
>      >      R10: 000000000000000a  R11: 0000000000000246  R12: 00007fa7abdad760
>      >      R13: 0000000000000002  R14: 00007fa7abda8760  R15: 0000000000000002
>      >      ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>      > [...]
>      >
>      > Signed-off-by: Qi Zheng <zhengqi.arch at bytedance.com>
>      > ---
>      >   defs.h   |  1 +
>      >   help.c   | 30 ++++++++++++++++++++++++++++++
>      >   kernel.c | 11 ++++++++++-
>      >   x86_64.c |  8 ++++++++
>      >   4 files changed, 49 insertions(+), 1 deletion(-)
>      >
>      > diff --git a/defs.h b/defs.h
>      > index a6735d0..96a7429 100644
>      > --- a/defs.h
>      > +++ b/defs.h
>      > @@ -5830,6 +5830,7 @@ ulong cpu_map_addr(const char *type);
>      >   #define BT_SHOW_ALL_REGS  (0x2000000000000ULL)
>      >   #define BT_REGS_NOT_FOUND (0x4000000000000ULL)
>      >   #define BT_OVERFLOW_STACK (0x8000000000000ULL)
>      > +#define BT_SKIP_IDLE     (0x10000000000000ULL)
>      >   #define BT_SYMBOL_OFFSET   (BT_SYMBOLIC_ARGS)
>      >
>      >   #define BT_REF_HEXVAL         (0x1)
>      > diff --git a/help.c b/help.c
>      > index 51a0fe3..6d8dc4f 100644
>      > --- a/help.c
>      > +++ b/help.c
>      > @@ -1915,6 +1915,8 @@ char *help_bt[] = {
>      >   "       -a  displays the stack traces of the active task on each CPU.",
>      >   "           (only applicable to crash dumps)",
>      >   "       -A  same as -a, but also displays vector registers (S390X only).",
>      > +"  -n idle  filter the stack of idle tasks (x86_64).",
>      > +"           (only applicable to crash dumps)",
>      >   "       -p  display the stack trace of the panic task only.",
>      >   "           (only applicable to crash dumps)",
>      >   "   -c cpu  display the stack trace of the active task on one or more CPUs,",
>      > @@ -2004,6 +2006,34 @@ char *help_bt[] = {
>      >   "       DS:  002b      ESI: bfffc8a0  ES:  002b      EDI: 00000000 ",
>      >   "       SS:  002b      ESP: bfffc82c  EBP: bfffd224 ",
>      >   "       CS:  0023      EIP: 400d032e  ERR: 0000008e  EFLAGS: 00000246  ",
>      > +" ",
>      > +"  Display the stack trace of the active task(s) when the kernel panicked,",
>      > +"  and filter out the stack of the idle tasks:",
>      > +"    %s> bt -a -n idle",
>      > +"    [...]",
>      > +"    PID: 0      TASK: ffff889ff8c35a00  CPU: 11  COMMAND: \"swapper/11\"",
>      > +" ",
>      > +"    PID: 0      TASK: ffff889ff8c3c380  CPU: 12  COMMAND: \"swapper/12\"",
>      > +" ",
>      > +"    PID: 150773  TASK: ffff889fe85a1680  CPU: 13  COMMAND: \"bash\"",
>      > +"    #0 [ffffc9000d35bcd0] machine_kexec at ffffffff8105a407",
>      > +"    #1 [ffffc9000d35bd28] __crash_kexec at ffffffff8113033d",
>      > +"    #2 [ffffc9000d35bdf0] panic at ffffffff81081930",
>      > +"    #3 [ffffc9000d35be70] sysrq_handle_crash at ffffffff814e38d1",
>      > +"    #4 [ffffc9000d35be78] __handle_sysrq.cold.12 at ffffffff814e4175",
>      > +"    #5 [ffffc9000d35bea8] write_sysrq_trigger at ffffffff814e404b",
>      > +"    #6 [ffffc9000d35beb8] proc_reg_write at ffffffff81330d86",
>      > +"    #7 [ffffc9000d35bed0] vfs_write at ffffffff812a72d5",
>      > +"    #8 [ffffc9000d35bf00] ksys_write at ffffffff812a7579",
>      > +"    #9 [ffffc9000d35bf38] do_syscall_64 at ffffffff81004259",
>      > +"       RIP: 00007fa7abcdc274  RSP: 00007fffa731f678  RFLAGS: 00000246",
>      > +"       RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007fa7abcdc274",
>      > +"       RDX: 0000000000000002  RSI: 0000563ca51ee6d0  RDI: 0000000000000001",
>      > +"       RBP: 0000563ca51ee6d0   R8: 000000000000000a   R9: 00007fa7abd6be80",
>      > +"       R10: 000000000000000a  R11: 0000000000000246  R12: 00007fa7abdad760",
>      > +"       R13: 0000000000000002  R14: 00007fa7abda8760  R15: 0000000000000002",
>      > +"       ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b",
>      > +"    [...]",
>      >   "\n  Display the stack trace of the active task on CPU 0 and 1:\n",
>      >   "    %s> bt -c 0,1",
>      >   "    PID: 0      TASK: ffffffff81a8d020  CPU: 0   COMMAND: \"swapper\"",
>      > diff --git a/kernel.c b/kernel.c
>      > index d0921cf..acfacaf 100644
>      > --- a/kernel.c
>      > +++ b/kernel.c
>      > @@ -2503,7 +2503,7 @@ cmd_bt(void)
>      >       if (kt->flags & USE_OPT_BT)
>      >               bt->flags |= BT_OPT_BACK_TRACE;
>      >
>      > -     while ((c = getopt(argcnt, args, "D:fFI:S:c:aAloreEgstTdxR:Ovp")) != EOF) {
>      > +     while ((c = getopt(argcnt, args, "D:fFI:S:c:n:aAloreEgstTdxR:Ovp")) != EOF) {
>      >                   switch (c)
>      >               {
>      >               case 'f':
>      > @@ -2672,6 +2672,11 @@ cmd_bt(void)
>      >                       active++;
>      >                       break;
>      >
>      > +             case 'n':
>      > +                     if (machine_type("X86_64") && STREQ(optarg, "idle"))
>      > +                             bt->flags |= BT_SKIP_IDLE;
> 
> 
> Can you help add the following changes? That helps to print the correct
> information when using the "bt -n idle" command on unsupported
> architectures.

OK, I will add this in patch v3.

> 
> +                       else
> +                               option_not_supported(c);
> 
> With the above changes:
> 
>        KERNEL: vmlinux
>      DUMPFILE: /var/crash/127.0.0.1-2022-05-24-03:55:17/vmcore  [PARTIAL DUMP]
>          CPUS: 8
>          DATE: Tue May 24 03:54:59 EDT 2022
>        UPTIME: 00:35:41
> LOAD AVERAGE: 0.54, 0.95, 1.28
>         TASKS: 175
> ...
>       MACHINE: ppc64le  (2900 Mhz)
>        MEMORY: 8 GB
>         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>           PID: 16173
>       COMMAND: "bash"
>          TASK: c0000000e9579000  [THREAD_INFO: c0000000e9579000]
>           CPU: 0
>         STATE: TASK_RUNNING (PANIC)
> 
> crash> bt -n idle
> bt: -n option not supported or applicable on this architecture or kernel
> crash>
> 
> Otherwise, for the v2 patchset:
> Acked-by: Lianbo Jiang <lijiang at redhat.com>

Thanks,
Qi
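
For reference, here is a minimal standalone sketch of how the 'n' case
behaves once the suggested else branch is added. It is not the crash
code itself: machine_is() and current_machine are hypothetical stand-ins
for crash's machine_type() check, and the fprintf() only imitates the
message shown in the ppc64le output above. Anything option_not_supported()
does beyond printing that message is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BT_SKIP_IDLE (0x10000000000000ULL)

/* Hypothetical stand-in for the machine type of the current session. */
static const char *current_machine = "X86_64";

/* Hypothetical stand-in for crash's machine_type() check. */
static int machine_is(const char *type)
{
	return strcmp(current_machine, type) == 0;
}

/*
 * Handle "-n <arg>": set BT_SKIP_IDLE only for "idle" on x86_64.
 * Returns 0 on success, -1 after reporting an unsupported option.
 */
static int handle_n_option(const char *arg, unsigned long long *flags)
{
	assert(arg != NULL && flags != NULL);

	if (machine_is("X86_64") && strcmp(arg, "idle") == 0) {
		*flags |= BT_SKIP_IDLE;
		return 0;
	}

	/* Stand-in for option_not_supported('n'). */
	fprintf(stderr, "bt: -n option not supported or applicable "
		"on this architecture or kernel\n");
	return -1;
}
```

Because the else is attached to the combined condition, an argument
other than "idle" on x86_64 takes the same "not supported" path as
"-n idle" on another architecture.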

> 
> Thanks.
> Lianbo
> 
>      > +                     break;
>      > +
>      >               case 'r':
>      >                       bt->flags |= BT_RAW;
>      >                       break;
>      > @@ -3092,6 +3097,10 @@ back_trace(struct bt_info *bt)
>      >       } else
>      >                   machdep->get_stack_frame(bt, &eip, &esp);
>      >
>      > +     /* skip idle task stack */
>      > +     if (bt->flags & BT_SKIP_IDLE)
>      > +             return;
>      > +
>      >       if (bt->flags & BT_KSTACKP) {
>      >               bt->stkptr = esp;
>      >               return;
>      > diff --git a/x86_64.c b/x86_64.c
>      > index ecaefd2..cfafbcc 100644
>      > --- a/x86_64.c
>      > +++ b/x86_64.c
>      > @@ -4918,6 +4918,9 @@ x86_64_get_stack_frame(struct bt_info *bt, ulong *pcp, ulong *spp)
>      >       if (bt->flags & BT_DUMPFILE_SEARCH)
>      >               return x86_64_get_dumpfile_stack_frame(bt, pcp, spp);
>      >
>      > +     if (bt->flags & BT_SKIP_IDLE)
>      > +             bt->flags &= ~BT_SKIP_IDLE;
>      > +
>      >           if (pcp)
>      >                   *pcp = x86_64_get_pc(bt);
>      >           if (spp)
>      > @@ -4960,6 +4963,9 @@ x86_64_get_dumpfile_stack_frame(struct bt_info *bt_in, ulong *rip, ulong *rsp)
>      >       estack = -1;
>      >       panic = FALSE;
>      >
>      > +     if (bt_in->flags & BT_SKIP_IDLE)
>      > +             bt_in->flags &= ~BT_SKIP_IDLE;
>      > +
>      >       panic_task = tt->panic_task == bt->task ? TRUE : FALSE;
>      >
>      >       if (panic_task && bt->machdep) {
>      > @@ -5098,6 +5104,8 @@ next_sysrq:
>      >                   if (!panic_task && STREQ(sym, "crash_nmi_callback")) {
>      >                           *rip = *up;
>      >                           *rsp = bt->stackbase + ((char *)(up) - bt->stackbuf);
>      > +                     if ((bt->flags & BT_SKIP_IDLE) && is_idle_thread(bt->task))
>      > +                             bt_in->flags |= BT_SKIP_IDLE;
>      >                           return;
>      >                   }
>      > 
> 

-- 
Thanks,
Qi


