[Crash-utility] Crash in crash

Dave Anderson anderson at redhat.com
Thu Oct 23 13:01:26 UTC 2014



----- Original Message -----
> Fine. If you remove that message then I see no problems with your patch.
> 
> Jan
> 

OK good -- queued for crash-7.0.9:
 
  https://github.com/crash-utility/crash/commit/187cb0c09a854eee8f05e91e131490150c442a7a

Thanks,
  Dave

> Jan Karlsson
> Senior Software Engineer
> System Assurance
> 
> Sony Mobile Communications
> Tel: +46 703 062 174
> jan.karlsson at sonymobile.com
> 
> sonymobile.com
> 
> 
> 
> -----Original Message-----
> From: crash-utility-bounces at redhat.com
> [mailto:crash-utility-bounces at redhat.com] On Behalf Of Dave Anderson
> Sent: 22 October 2014 15:02
> To: Discussion list for crash utility usage, maintenance and development
> Subject: Re: [Crash-utility] Crash in crash
> 
> 
> 
> ----- Original Message -----
> > Hi
> > 
> > Your patch works but I get a "strange" error message:
> > 
> > please wait... (determining panic task)
> > bt: bsearch for tgid failed: task: ffffffc01cfed400 tgid: 5040
> > 
> >       KERNEL: vmlinux
> >     DUMPFILE: vmcore
> >     ....
> > 
> > This message does not occur with my patch.
> > 
> > Jan
> 
> Yeah, that message will be removed in crash-7.0.9:
> 
>   https://github.com/crash-utility/crash/commit/a3a441aeabd6c5c3c86b4793a283927507a5cc10
> 
> The point is to avoid the initial sort entirely, along with the RSS gathering
> and associated readmem() calls for all tasks, during the last-ditch
> panic-task search that your dumpfile requires.
> 
> Dave
> 
> > -----Original Message-----
> > From: crash-utility-bounces at redhat.com
> > [mailto:crash-utility-bounces at redhat.com] On Behalf Of Dave Anderson
> > Sent: 21 October 2014 16:32
> > To: Discussion list for crash utility usage, maintenance and
> > development
> > Subject: Re: [Crash-utility] Crash in crash
> > 
> > 
> > Hi Jan,
> > 
> > Good catch.  As far as a fix goes, it would be more efficient if
> > tgid_quick_search() just returned NULL in that case.  Try the
> > attached patch.
> > 
> > Thanks,
> >   Dave
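A minimal sketch of the NULL guard suggested above, for readers without the attached patch. It is not the queued commit: the struct layouts are reduced to the fields quoted in this thread, `task_table_sketch` is a hypothetical stand-in for crash's real task table, and the search that follows the cache check (elided in the quoted source) is replaced here by a plain `return NULL`.

```c
#include <stddef.h>

typedef unsigned long ulong;

/* Reduced stand-ins: only the fields that appear in this thread. */
struct tgid_context {
	ulong tgid;
	ulong task;
};

struct task_table {
	struct tgid_context *last_tgid;	/* NULL until the tgid array is sorted */
	ulong tgid_searches;
	ulong tgid_cache_hits;
};

/* Hypothetical instance standing in for crash's global task table. */
static struct task_table task_table_sketch = { NULL, 0, 0 };
static struct task_table *tt = &task_table_sketch;

static struct tgid_context *
tgid_quick_search(ulong tgid)
{
	struct tgid_context *last;

	tt->tgid_searches++;

	/* Guard: the cache pointer has not been set up yet, so report
	 * "not found" instead of dereferencing a NULL pointer. */
	last = tt->last_tgid;
	if (last == NULL)
		return NULL;

	if (tgid == last->tgid) {
		tt->tgid_cache_hits++;
		return last;
	}

	/* The real function continues searching here; this sketch stops. */
	return NULL;
}
```

A caller would then treat NULL as "no tgid context available" and skip the lookup, rather than dereferencing the result.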
> > 
> > 
> > 
> > 
> > ----- Original Message -----
> > > 
> > > 
> > > Hi Dave
> > > 
> > > 
> > > 
> > > I have a vmcore file for ARM64 that crashes Crash during startup.
> > > The core file was created by a hardware watchdog (I believe), so there
> > > is no panic message or anything similar in the log.
> > > 
> > > 
> > > 
> > > This is the printout from Crash running under gdb, after the
> > > copyrights and config information:
> > > 
> > > 
> > > 
> > > please wait... (determining panic task)
> > > 
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x000000000047ed40 in tgid_quick_search (tgid=5040) at memory.c:4114
> > > 4114        if (tgid == last->tgid) {
> > > 
> > > (gdb) bt
> > > #0  0x000000000047ed40 in tgid_quick_search (tgid=5040) at memory.c:4114
> > > #1  0x000000000047f046 in get_task_mem_usage (task=18446743799318107136,
> > >     tm=0x7fffffff6f40) at memory.c:4186
> > > #2  0x000000000047c679 in vm_area_dump (task=18446743799318107136,
> > >     flag=10, vaddr=0, ref=0x0) at memory.c:3671
> > > #3  0x000000000047ec08 in in_user_stack (task=18446743799318107136,
> > >     vaddr=0) at memory.c:4063
> > > #4  0x00000000004fd9fe in arm64_get_dumpfile_stackframe
> > >     (frame=<synthetic pointer>, bt=<optimized out>) at arm64.c:1077
> > > #5  arm64_get_stack_frame (bt=0x7fffffffc690, pcp=0x7fffffff9560,
> > >     spp=0x7fffffff9568) at arm64.c:1103
> > > #6  0x00000000004de409 in back_trace (bt=0x7fffffffc690) at kernel.c:2533
> > > #7  0x00000000004d1563 in foreach (fd=0x7fffffffc7c0) at task.c:6161
> > > #8  0x00000000004d2bbd in panic_search () at task.c:6425
> > > #9  0x00000000004d4454 in get_panic_context () at task.c:5364
> > > #10 task_init () at task.c:491
> > > #11 0x000000000046146e in main_loop () at main.c:801
> > > #12 0x00000000006467a3 in captured_command_loop (data=<optimized out>)
> > >     at main.c:258
> > > #13 0x000000000064535b in catch_errors (func=0x646790 <captured_command_loop>,
> > >     func_args=0x0, errstring=0x873235 "", mask=6) at exceptions.c:557
> > > #14 0x0000000000647726 in captured_main (data=<optimized out>)
> > >     at main.c:1064
> > > #15 0x000000000064535b in catch_errors (func=0x646aa0 <captured_main>,
> > >     func_args=0x7fffffffe030, errstring=0x873235 "", mask=6)
> > >     at exceptions.c:557
> > > #16 0x0000000000647a84 in gdb_main (args=<optimized out>) at main.c:1079
> > > #17 0x0000000000647abe in gdb_main_entry (argc=<optimized out>,
> > >     argv=<optimized out>) at main.c:1099
> > > #18 0x000000000045f61f in main (argc=3, argv=0x7fffffffe188) at main.c:758
> > > 
> > > (gdb) p tt->last_tgid
> > > $1 = (struct tgid_context *) 0x0
> > > 
> > > 
> > > 
> > > Source code for tgid_quick_search:
> > > 
> > > static struct tgid_context *
> > > tgid_quick_search(ulong tgid)
> > > {
> > >         struct tgid_context *last, *next;
> > > 
> > >         tt->tgid_searches++;
> > > 
> > >         last = tt->last_tgid;
> > >         if (tgid == last->tgid) {
> > >                 tt->tgid_cache_hits++;
> > >                 return last;
> > >         }
> > >         ....
> > > }
> > > 
> > > 
> > > 
> > > So 'last' is NULL, and dereferencing it in 'last->tgid' causes the crash.
> > > 
> > > 
> > > 
> > > After some more investigation I have seen that "tt->last_tgid" is
> > > initialized in function sort_tgid_array in task.c, but that function
> > > seems to be called at a later stage.
> > > 
> > > 
> > > 
> > > By adding a line in tgid_quick_search:
> > > 
> > > 
> > > 
> > > static struct tgid_context *
> > > tgid_quick_search(ulong tgid)
> > > {
> > >         struct tgid_context *last, *next;
> > > 
> > >         tt->tgid_searches++;
> > > 
> > >         if (tt->last_tgid == 0) sort_tgid_array(); // added line
> > >         last = tt->last_tgid;
> > >         if (tgid == last->tgid) {
> > >                 tt->tgid_cache_hits++;
> > >                 return last;
> > >         }
> > >         ...
> > > 
> > > 
> > > 
> > > I can run Crash on this core file. However I do not know if this is
> > > the best way to fix the problem.
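The one-line change above amounts to lazy initialization: if the cache pointer is still unset, sort on first use. The sketch below illustrates that pattern in isolation and under assumptions: `tgid_array`, `sort_tgid_array_sketch()`, `tgid_search_lazy()` and `sorts_done` are hypothetical stand-ins rather than crash's code, and the use of `qsort()`/`bsearch()` and pointing the cache at the first element after sorting are choices made for the example.

```c
#include <stdlib.h>

typedef unsigned long ulong;

struct tgid_context {
	ulong tgid;
	ulong task;
};

/* Hypothetical, unsorted sample data. */
static struct tgid_context tgid_array[] = {
	{ 5040, 3 }, { 1, 1 }, { 212, 2 },
};
#define NTGIDS (sizeof(tgid_array) / sizeof(tgid_array[0]))

static struct tgid_context *last_tgid;	/* NULL until the first sort */
static int sorts_done;			/* counts how often the sort ran */

static int
compare_tgid(const void *a, const void *b)
{
	const struct tgid_context *x = a, *y = b;

	return (x->tgid > y->tgid) - (x->tgid < y->tgid);
}

static void
sort_tgid_array_sketch(void)
{
	qsort(tgid_array, NTGIDS, sizeof(tgid_array[0]), compare_tgid);
	last_tgid = &tgid_array[0];	/* assumption: cache starts at entry 0 */
	sorts_done++;
}

static struct tgid_context *
tgid_search_lazy(ulong tgid)
{
	struct tgid_context key = { tgid, 0 };
	struct tgid_context *found;

	/* Lazy initialization: sort on first use, so last_tgid is never
	 * NULL when it is dereferenced below. */
	if (last_tgid == NULL)
		sort_tgid_array_sketch();

	if (tgid == last_tgid->tgid)
		return last_tgid;	/* cache hit */

	found = bsearch(&key, tgid_array, NTGIDS, sizeof(tgid_array[0]),
			compare_tgid);
	if (found != NULL)
		last_tgid = found;	/* remember the most recent hit */
	return found;
}
```

The first lookup pays for sorting the whole array, which is the cost the later reply in this thread wants to avoid during the panic-task search.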
> > > 
> > > 
> > > 
> > > Jan
> > > 
> > > --
> > > Crash-utility mailing list
> > > Crash-utility at redhat.com
> > > https://www.redhat.com/mailman/listinfo/crash-utility



