[Crash-utility] Questions on multi-thread for crash
Tao Liu
ltao at redhat.com
Fri Feb 10 09:04:36 UTC 2023
On Fri, Feb 10, 2023 at 12:42:36AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Tao,
>
> On 2023/02/08 13:34, Tao Liu wrote:
> > Hello,
> >
> > Recently I made an attempt to introduce a thread pool for crash utility, to
> > optimize the performance of crash.
>
> Thank you for the attempt, interesting.
> What data made you try to speed up the collection of member offsets?
> First I'm interested in which routines weigh with crash startup.
>
> To be honest, personally I'm fairly satisfied with the current
> crash-8 startup time :) with commit cd8954023b (thanks to Hatayama-san)
> and maybe the benefit of GDB's parallel loading, which Andrew said.
>
> [root at rhel91u ~]# time echo quit | crash > /dev/null
>
> real 0m2.621s
> user 0m2.574s
> sys 0m0.112s
>
> So I would like to know first whether it's likely to worth looking
> into the slow part, especially if we have such complexity of
> multi-threading or another way.
>
> Thanks,
> Kazu
>
Hi Kazu,
The startup time of crash and drgn comparison:
$ time echo "quit()" | drgn -c /tmp/maple/vmcore -s /tmp/maple/vmlinux > /dev/null
drgn 0.0.22+2.g33c3e36 (using Python 3.10.0, elfutils 0.185, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import NULL, Object, cast, container_of, execscript, offsetof, reinterpret, sizeof
>>> from drgn.helpers.common import *
>>> from drgn.helpers.linux import *
warning: could not get debugging information for:
kernel modules (could not read depmod: open: /lib/modules/5.19.0-uek+/modules.dep.bin:
No such file or directory)
real 0m0.222s
user 0m1.032s
sys 0m0.041s
$ time echo quit | crash /tmp/maple/vmcore /tmp/maple/vmlinux > /dev/null
real 0m8.266s
user 0m7.853s
sys 0m0.248s
As you see, there is a startup time difference between crash and drgn. I haven't
look into drgn's source code to see if drgn have the same amount of startup work
as crash. Then I measured the time consuming status of xx_init() functions of
crash, and gprof the time status:
$ crash /tmp/maple/vmcore /tmp/maple/vmlinux
gdb_session_init 23.761000
machdep_init POST_RELOC 0.060000
show_untrusted_files 0.000000
kdump_backup_region_init 0.001000
read_in_kernel_config 1.287000
kernel_init 1724.044000 <<--- a lot of symbols resolving
machdep_init POST_GDB 851.425000
vm_init 1993.168000 <<--- a lot of symbols resolving
machdep_init POST_VM 414.680000
module_init 287.185000
help_init 0.005000
task_init 459.189000
vfs_init 301.908000
net_init 116.945000
dev_init 6.494000
machdep_init POST_INIT 9.176000
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
14.47 1.12 1.12 skip_ws(char const*&, char const*&, char const*)
14.08 2.21 1.09 lookup_partial_symbol(objfile*, partial_symtab*, lookup_name_info const&, int, domain_enum_tag)
13.44 3.25 1.04 strncmp_iw_with_mode(char const*, char const*, unsigned long, strncmp_iw_mode, language, completion_match_for_lcd*)
5.81 3.70 0.45 symbol_matches_search_name(general_symbol_info const*, lookup_name_info const&)
4.52 4.05 0.35 gdb::bcache::insert(void const*, int, bool*)
4.26 4.38 0.33 default_symbol_name_matcher(char const*, lookup_name_info const&, completion_match_result*)
4.26 4.71 0.33 language_defn::get_symbol_name_matcher(lookup_name_info const&) const
3.23 4.96 0.25 symbol_matches_domain(language, domain_enum_tag, domain_enum_tag)
2.84 5.18 0.22 1 220.00 220.00 symval_hash_init
2.20 5.35 0.17 read_attribute_value(die_reader_specs const*, attribute*, unsigned int, long, unsigned char const*, bool*)
2.20 5.52 0.17 iterative_hash
1.81 5.66 0.14 strcmp_iw_ordered(char const*, char const*)
1.81 5.80 0.14 psym_lookup_symbol(objfile*, block_enum, char const*, domain_enum_tag)
1.68 5.93 0.13 dwarf2_psymtab::readin_p(objfile*) const
1.55 6.05 0.12 133543 0.00 0.00 symname_hash_install
...
It looks to me that crash takes a lot of time on symbol resolving, so I thought
maybe I can parallel some of those to shorten the startup time.
Thanks,
Tao Liu
> >
> > One obvious point which can benefit from multi-threading is memory.c:vm_init().
> > There are hundreds of MEMBER_OFFSET_INIT() related symbol resolving functions,
> > and most of the symbols are independent from each other, by careful arrangement,
> > they can be invoked parallelly. By doing so, we can shorten the waiting time of
> > crash starting.
> >
> > The implementation is abstracted as the following:
> >
> > Before multi-threading:
> > MEMBER_OFFSET_INIT(task_struct_mm, "task_struct", "mm");
> > MEMBER_OFFSET_INIT(mm_struct_mmap, "mm_struct", "mmap");
> >
> > After multi-threading:
> > create_threadpool(&pool, 3);
> > ...
> > MEMBER_OFFSET_INIT_PARA(pool, task_struct_mm, "task_struct", "mm");
> > MEMBER_OFFSET_INIT_PARA(pool, mm_struct_mmap, "mm_struct", "mmap");
> > ...
> > wait_and_destroy_threadpool(pool);
> >
> > MEMBER_OFFSET_INIT_PARA just append the task to the work queue of thread pool
> > and continues, it's up to the pool to schedule the worker thread to do the
> > symbol resolving work.
> >
> > However, after enable multi-threading, I noticed there are always random errors
> > from gdb. From segfault to broken stack, it seems gdb is not thread safe at
> > all...
> >
> > For example one error listed as follows:
> >
> > Thread 10 "crash" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7fffc4f00640 (LWP 72950)]
> > c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
> > 3250 if (pstate->language ()->la_language != language_cplus
> > (gdb) bt
> > #0 c_yylex () at /sources/up-crash/gdb-10.2/gdb/c-exp.y:3250
> > #1 c_yyparse () at /sources/up-crash/gdb-10.2/gdb/c-exp.c.tmp:2092
> > #2 0x00000000006f62d7 in c_parse (par_state=<optimized out>) at /sources/
> > up-crash/gdb-10.2/gdb/c-exp.y:3414
> > #3 0x0000000000894eac in parse_exp_in_context (stringptr=0x7fffc4efeff8,
> > pc=<optimized out>, block=<optimized out>, comma=0, out_subexp=0x0,
> > tracker=0x7fffc4efef10, cstate=0x0, void_context_p=0) at parse.c:1122
> > #4 0x00000000008951d6 in parse_exp_1 (tracker=0x0, comma=0, block=0x0,
> > pc=0, stringptr=0x7fffc4efeff8) at parse.c:1031
> > #5 parse_expression (string=<optimized out>, string at entry=0x7fffc4eff140
> > "slab_s", tracker=tracker at entry=0x0) at parse.c:1166
> > #6 0x000000000092039a in gdb_get_datatype (req=0x7fffc4eff720) at symtab.c:7239
> > #7 gdb_command_funnel_1 (req=0x7fffc4eff720) at symtab.c:7018
> > #8 0x00000000009206de in gdb_command_funnel (req=0x7fffc4eff720) at symtab.c:6956
> > #9 0x00000000005ad137 in gdb_interface (req=0x7fffc4eff720) at gdb_interface.c:409
> > #10 0x00000000005fe76c in datatype_info (name=0xab9700 "slab_s",
> > member=0xaba8d8 "list", dm=0x0) at symbols.c:5708
> > #11 0x0000000000517a85 in member_offset_init_slab_s_list_slab_s_list ()
> > at memory.c:659
> > #12 0x000000000068168f in group_routine (args=<optimized out>) at thpool.c:81
> > #13 0x00007ffff7a48b17 in start_thread () from /lib64/libc.so.6
> > #14 0x00007ffff7acd6c0 in clone3 () from /lib64/libc.so.6
> > (gdb) p pstate
> > $1 = (parser_state *) 0x0
> >
> > $ cat -n /sources/up-crash/gdb-10.2/gdb/c-exp.y
> > 66 /* The state of the parser, used internally when we are parsing the
> > 67 expression. */
> > 68
> > 69 static struct parser_state *pstate = NULL;
> >
> > pstate is a global variable and not thread safe, the value must be changed by
> > someone else...
> >
> > Now the project has reached a dead end. Because making gdb thread safe is an
> > impossible mission to me. Is there any advice or suggestions? Thanks in advance!
> >
> > Thanks!
> > Tao Liu
> >
> > --
> > Crash-utility mailing list
> > Crash-utility at redhat.com
> > https://listman.redhat.com/mailman/listinfo/crash-utility
> > Contribution Guidelines: https://github.com/crash-utility/crash/wiki
More information about the Crash-utility
mailing list