[RFC] [Crash-utility] Patch to use gdb's bt in crash - works great with kgdb!

Wed Aug 30 19:35:21 UTC 2006

On Thu, 2006-08-24 at 09:15 -0400, Dave Anderson wrote:

Morning Dave:

> Rachita Kothiyal wrote:
> 
> > Hi Dave
> >
> > I was trying to implement better backtrace mechanism for crash using
> > dwarf info. And was trying to use the embedded gdb itself as gdb
> > already uses dwarf information for unwinding stack. I could get
> > "gdb bt" command working in "crash" after making one minor bug
> > fix in gdb_interface.c (Patch appended). Now one can get cleaner
> > backtrace particularly in x86_64 case using "gdb bt" command.
> >
> 
> Wow -- your definition of "cleaner" apparently is different than mine...  ;-)
> 
> >
> > crash> bt
> > PID: 4146   TASK: ffff81022e848af0  CPU: 0   COMMAND: "insmod"
> >  #0 [ffff81021efadbf8] crash_kexec at ffffffff801521d1
> >  #1 [ffff81021efadc40] machine_kexec at ffffffff8011a739
> >  #2 [ffff81021efadc80] crash_kexec at ffffffff801521ed
> >  #3 [ffff81021efadd08] crash_kexec at ffffffff801521d1
> >  #4 [ffff81021efadd30] bust_spinlocks at ffffffff8011fd6d
> >  #5 [ffff81021efadd40] panic at ffffffff80131422
> >  #6 [ffff81021efadda0] cond_resched at ffffffff804176c3
> >  #7 [ffff81021efaddb0] wait_for_completion at ffffffff80417701
> >  #8 [ffff81021efade00] __down_read at ffffffff80418d07
> >  #9 [ffff81021efade30] fun2 at ffffffff80107017
> > #10 [ffff81021efade40] fun1 at ffffffff801311b6
> > #11 [ffff81021efade50] init_module at ffffffff8800200f
> > #12 [ffff81021efade60] sys_init_module at ffffffff8014c664
> > #13 [ffff81021efadf00] init_module at ffffffff88002068
> > #14 [ffff81021efadf80] system_call at ffffffff801096da
> >     RIP: 00002b2153382d4a  RSP: 00007fff57900a28  RFLAGS: 00010246
> >     RAX: 00000000000000af  RBX: ffffffff801096da  RCX: 0000000000000000
> >     RDX: 0000000000512010  RSI: 0000000000016d26  RDI: 00002b21531e5010
> >     RBP: 00007fff57900c58   R8: 00002b21534f46d0   R9: 00002b21531fbd36
> >     R10: 0000000000516040  R11: 0000000000000206  R12: 0000000000512010
> >     R13: 00007fff579015c5  R14: 0000000000000000  R15: 00002b21531e5010
> >     ORIG_RAX: 00000000000000af  CS: 0033  SS: 002b
> > crash> gdb bt 15
> > [Switching to thread 1 (process 4146)]#0  0xffffffff801521d1 in crash_kexec (regs=0x0) at kexec.h:64
> > 64      in kexec.h
> > #0  0xffffffff801521d1 in crash_kexec (regs=0x0) at kexec.h:64
> > #1  0xffffffff80131422 in panic (fmt=0xffffffff8044832c "Rachita triggering panic\n") at kernel/panic.c:87
> > #2  0xffffffff80107017 in fun2 (i=0) at init/main.c:608
> > #3  0xffffffff801311b6 in fun1 (j=Variable "j" is not available.
> > ) at kernel/panic.c:278
> > #4  0xffffffff8800200f in ?? ()
> > #5  0xffffc2000023d9d0 in ?? ()
> > #6  0xffffffff8014c664 in sys_init_module (umod=0xffff81022ef6c400, len=18446604445110683424,
> >     uargs=0xffff81022ef6c6e8 "\020304366.\002\201377377x304366.\002\201377377340304366.\002\201377377H305366.\002\201377377260305366.\002\201377377\030306366.\002\201377377\200306366.\002\201377377")
> >     at kernel/module.c:1911
> > #7  0xffffffff801096da in system_call () at bitops.h:230
> > #8  0x00002b2153382d4a in ?? ()
> > #9  0xffff81022e8516d0 in ?? ()
> > #10 0xffffffff8055c7c0 in migration_notifier ()
> > #11 0x0000000000000000 in ?? ()
> > #12 0x0000000000000001 in ?? ()
> > #13 0xffffffffffffffff in ?? ()
> > #14 0xffffffff8013ae2a in recalc_sigpending () at kernel/signal.c:227
> > (More stack frames follow...)
> > crash>
> >
> > ===============================================================================
> >
> > But as of now there are a few issues with "gdb bt":
> >
> > 1) Sometimes the list of stack frames displayed doesn't end for a long time,
> >    and the "q" command doesn't work as desired once the screen is full.
> >    The workaround is to give a limiting count, as in "gdb bt 10".
> >    I also tried gdb ver 6.1 externally (outside crash) and saw the same
> >    seemingly endless stack frames, whereas the latest gdb (ver 6.4) works
> >    fine. So just wondering if you are planning to upgrade the embedded gdb
> >    to ver 6.4?
> >
> 
> Not really.  That's a major undertaking with unpredictable results
> until it's attempted.  Every time I do that, nightmares follow, so only
> if we get to the point where gdb-6.1 doesn't work at all, or cripples
> crash's use of it with a new vmlinux, should we even think of doing that.
> 
> 
> >
> > 2) Unlike crash, gdb has no concept of tasks, so we can only see the
> >    backtraces for tasks that were active at the time of the crash.
> >
> >
> > Apart from "bt" this change also allows to get some other related commands
> > like "gdb info registers", "gdb info frame" and "gdb info threads" working.
> >
> 
> Well, right off the bat, I'm not too keen on passing the vmcore to gdb,
> because I don't know what the unseen ramifications of that would be.
> Even so, you can't just do an "argc++" in gdb_main_loop() because
> that apparently presumes that crash is receiving *only* two arguments,
> in the "vmlinux vmcore" order.  That cannot be presumed obviously,
> as the possible combinations of crash command line options/ordering
> are endless.
> 
> Secondly, until I see something useful in the case where the kernel
> takes an in-kernel exception that in turn causes the crash, I'm
> unconvinced.  What does the trace look like if you take an
> oops or BUG() while running in kernel mode?  Does gdb step
> past that point?  (i.e., to the part of the backtrace we'd actually
> want to see)  Certainly we won't see a register dump at the exact
> point of the exception.  Would it make the jump from the x86_64
> interrupt stack (or any of the exception stacks) back to the
> process stack?
> 
> Given that it only gives backtraces of the active tasks, we're
> still left with a half-baked implementation.
> 
> And now, with the introduction of the new CONFIG_UNWIND_INFO
> and CONFIG_STACK_UNWIND configurations in the x86 and x86_64
> kernels, wouldn't it make more sense to utilize the approach taken by
> the crash-utility/ia64 unwind facility?  Although the x86/x86_64
> implementation still appears to be a work in progress in the kernel,
> backporting that capability from the kernel to user-space would seem
> to be more useful.  That's what was done for ia64, and for that reason
> it's the only architecture where we get dependable backtraces for
> all tasks, active or not.
> 
> Simple question -- and to be quite honest with you -- I don't
> understand why you wouldn't want to simply use gdb alone
> in this case?
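
On the argc point above: a minimal sketch of one alternative to incrementing argc, which is to build a separate argument vector for the embedded gdb from the files crash has already identified, so that nothing depends on the order of crash's own command line. The helper name build_gdb_argv and its parameters are hypothetical and are not part of crash or of the patch; how crash would obtain the namelist and dumpfile names is assumed, not shown.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not crash code): fill `out` with an argument
 * vector for the embedded gdb, built from files that have already been
 * identified, instead of relying on where they appeared on the crash
 * command line.
 *
 * `dumpfile` may be NULL when no core file is to be handed to gdb.
 * Returns the argument count, or -1 if `out` has fewer than `count + 1`
 * slots. out[count] is set to NULL, as a main()-style argv expects.
 */
static int build_gdb_argv(const char *prog, const char *namelist,
                          const char *dumpfile,
                          const char **out, size_t cap)
{
    size_t n = 0;
    size_t needed = 2 + (dumpfile ? 1 : 0) + 1;   /* args + NULL slot */

    assert(prog != NULL && namelist != NULL && out != NULL);

    if (cap < needed)
        return -1;

    out[n++] = prog;
    out[n++] = namelist;
    if (dumpfile)
        out[n++] = dumpfile;
    out[n] = NULL;

    return (int)n;
}
```

With this shape the count handed to gdb always matches the vector actually built, whether or not a dumpfile is present.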

I don't see any reason for the core file not to be read correctly by
gdb. It's sometimes convenient to use gdb directly, for example
when using the ddd GUI.

kgdb has no problems with backtraces of kernel threads.
The kernel objects are tweaked with dwarf code, but I see no
problem with using the same paradigm with crash. It works great.

I'd prefer to have crash and ddd+gdb operate on kernel core files.

Even better, it would be nice to be able to simulate execution on
the stack of a core file, so as to re-execute the code that caused
the crash. I have frequently found it convenient after a panic to move
the pc to the end of panic() and continue back up the stack to a
breakpoint at the system call. Then I'd use the GUI to move the
pc to just before the system call, execute it again, and watch how
the return value that caused the panic was derived.

I expect that if you run a kgdb kernel, including the dwarf code,
gdb will have no problem with its core dumps. It's convenient to
have kgdb configured in the kernel and still have the option to
continue the analysis later with gdb/crash.

-piet

> 
> Dave
> 
> 
> >
> > Please let me know your comments about the approach.
> >
> > Thanks
> > Rachita
> >
> >   o This patch fixes the broken crash-gdb interface for running gdb commands
> >     in crash. Earlier the "argc" was not right and the commands to gdb used to
> >     fail. The patch increments the argc correctly.
> >
> >   o This will particularly help in using the gdb backtrace mechanism for
> >     active tasks at the point of crash. We now implicitly change the "thread"
> >     id for gdb to the current context cpu before issuing the "bt" command to
> >     embedded gdb.
> >
> > Signed-off-by: Rachita Kothiyal <rachita at in.ibm.com>
> > ---
> >
> >  gdb_interface.c |   16 ++++++++++++++--
> >  1 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff -puN gdb_interface.c~crash_use_gdb_bt gdb_interface.c
> > --- crash-4.0-3.1/gdb_interface.c~crash_use_gdb_bt      2006-08-24 15:21:22.527601072 +0530
> > +++ crash-4.0-3.1-rachita/gdb_interface.c       2006-08-24 15:51:51.314583192 +0530
> > @@ -58,6 +58,7 @@ gdb_main_loop(int argc, char **argv)
> >
> >          optind = 0;
> >          command_loop_hook = main_loop;
> > +       argc ++;
> >
> >  #if defined(GDB_5_3) || defined(GDB_6_0) || defined(GDB_6_1)
> >          gdb_main_entry(argc, argv);
> > @@ -670,6 +671,7 @@ void
> >  cmd_gdb(void)
> >  {
> >          int c;
> > +       char *thread_cmd;
> >
> >          while ((c = getopt(argcnt, args, "")) != EOF) {
> >                  switch(c)
> > @@ -703,6 +705,18 @@ cmd_gdb(void)
> >                     whitespace(pc->orig_line[3]))
> >                         shift_string_left(pc->orig_line, strlen("gdb")+1);
> >
> > +               /*
> > +                * If the request is a backtrace then switch context to
> > +                *  current context in gdb too
> > +                */
> > +               if ((strncmp(clean_line(pc->orig_line), "backtrace", 9) == 0) ||
> > +                   (strncmp(clean_line(pc->orig_line), "bt", 2) == 0)) {
> > +                       thread_cmd = GETBUF(BUFSIZE);
> > +                       sprintf(thread_cmd, "thread %d",
> > +                               CURRENT_CONTEXT()->processor + 1);
> > +                       gdb_pass_through(thread_cmd, NULL, 0);
> > +                       FREEBUF(thread_cmd);
> > +               }
> >                 gdb_pass_through(clean_line(pc->orig_line), NULL, 0);
> >         }
> >  }
> > @@ -836,5 +850,3 @@ get_frame_offset(ulong pc)
> >             "get_frame_offset: invalid request for non-alpha systems!\n"));
> >  }
> >  #endif /* !ALPHA */
> > -
> > -
> > _
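
A side note on the "bt" matching in the patch above: strncmp(line, "bt", 2) accepts any line that merely begins with those two letters. Below is a minimal sketch of a stricter check, plus a bounded version of the "thread N" formatting. Both helper names are hypothetical and not part of crash; the cpu + 1 mapping to gdb thread ids is simply the assumption the patch itself makes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not crash code): return 1 if `line` starts with
 * the whole word "backtrace" or "bt", i.e. the keyword is followed by
 * whitespace or by the end of the string; otherwise return 0.
 */
static int is_backtrace_request(const char *line)
{
    static const char *const keywords[] = { "backtrace", "bt" };
    size_t i;

    assert(line != NULL);

    for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
        size_t len = strlen(keywords[i]);

        if (strncmp(line, keywords[i], len) == 0 &&
            (line[len] == '\0' || isspace((unsigned char)line[len])))
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: format the gdb "thread N" command for a cpu.
 * Assumes, as the patch does, that the gdb thread id is cpu + 1.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_thread_cmd(char *buf, size_t size, int cpu)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, size, "thread %d", cpu + 1);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

In cmd_gdb() the first helper would stand in for the two strncmp() calls and the second for the sprintf(); whether the word-boundary check is wanted depends on which gdb abbreviations should trigger the thread switch.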
> 
-- 
Piet Delaney
BlueLane Teck
W: (408) 200-5256; piet at bluelane.com
H: (408) 243-8872; piet at piet.net