[Crash-utility] Re: crash enhancements proposal

Fri May 5 14:03:54 UTC 2006

Maneesh Soni wrote:

> Hi Dave,
>
> Following is a list of a few proposed improvements to the crash utility,
> though for most of the items there are no names associated yet.
>
> Please let us know whether these look useful. And if they are found appropriate,
> would it be possible for you to merge these into the crash todo list?
>
> Thanks to Badari Pulavarty, Richard Moore and Vara Prasad for the inputs.
>
> Regards
> Maneesh
>
> --------------------------------------------------------------------------------
> DESCRIPTION:
>    Clean & correct stack back traces on ALL platforms, ALL the time.
>        - x86_64 (currently wrong and needs fixing)
>        - frame pointers off? (on x86 we still don't have frame pointers turned on)
>
> RESOLUTION STATUS: Work-in-progress by Rachita Kothiyal <rachita at in.ibm.com>

Certainly a welcome task.  I suggest segregating the code into a separate
file (as done with lkcd_x86_trace.c), and the new entry point can simply
be plugged into machdep->back_trace function pointer at init time.
There should also be an "out" to allow it to be set back to use the
current x86_64_low_budget_back_trace_cmd().  Also, if it doesn't
support -fomit-frame-pointer, it's not worth doing.
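As a rough sketch of the "plug it into a function pointer, with an out" idea: the structs and functions below are hypothetical stand-ins, not crash's actual machdep_table or bt_info definitions. The new unwinder and the existing one share a signature, and a single selector installs either of them, both at init time and later if the new code misbehaves.

```c
/* Hypothetical stand-in for crash's bt_info; the real struct has many more fields. */
struct bt_info_sketch {
    unsigned long task;
    int frames_printed;
};

/* Hypothetical subset of a machine-dependent dispatch table. */
struct machdep_sketch {
    void (*back_trace)(struct bt_info_sketch *);
};

static struct machdep_sketch machdep_table_sketch;
static struct machdep_sketch *md = &machdep_table_sketch;

/* Stand-in for the existing, known-good x86_64_low_budget_back_trace_cmd(). */
static void low_budget_back_trace(struct bt_info_sketch *bt)
{
    bt->frames_printed = 1;   /* placeholder for the real unwinding work */
}

/* Stand-in for the new unwinder, which would live in its own source file. */
static void unwinder_back_trace(struct bt_info_sketch *bt)
{
    bt->frames_printed = 2;   /* placeholder for the real unwinding work */
}

/* Called at init time, and again later as the "out" to revert to the old code. */
static void select_back_trace(int use_new_unwinder)
{
    md->back_trace = use_new_unwinder ? unwinder_back_trace : low_budget_back_trace;
}
```

Because callers only ever go through md->back_trace, switching back is a one-line change and the old unwinder never has to be removed.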

>
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>     Code restructuring:
>     - move as much code for advanced commands to libraries so that
>       crash is at least able to open the dump image and perform a minimal
>       set of commands like bt, dump dmesg log, disassemble etc., irrespective
>       of kernel version.
>     - code is hard to read & understand - need to re-write some of the
>       basic subsystems like memory mapping, pagetable management etc
>
> RESOLUTION STATUS:
>         Work-in-progress by Dave Wilder <dwilder at us.ibm.com> and
>         Maneesh Soni <maneesh at in.ibm.com>
>

I don't quite understand how moving code to libraries is going to
achieve the goal here.  Things in some of the various *_init() functions
could certainly be streamlined (or skipped) so that a session is more
likely to make it to the first prompt.  For example, the task table
initialization could be made to simply fill in the context data for just
the panic task.
(But it almost sounds like you just want to use gdb alone for the minimal
set of commands you've listed?)

As far as "re-writes" are concerned, please keep in mind the
necessity of backwards-compatibility.  I'd much rather keep the current
code -- that's known to work -- in place, and if you come up with
something new, or re-shuffled, make it only callable when the kernel
is of a known kernel version or later.
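A minimal sketch of that kind of version gating, assuming the running kernel's version has already been packed into a single integer. KVER(), NEW_CODE_MIN_VERSION, the 2.6.17 cutoff and both walker functions are hypothetical names chosen for illustration only; the point is that the existing path stays the default and the new one is reachable only at or above a known version.

```c
/* Hypothetical a.b.c -> integer packing so versions compare numerically. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical cutoff: only kernels at or above this version use the new code. */
#define NEW_CODE_MIN_VERSION KVER(2, 6, 17)

/* Existing, known-good implementation (placeholder body). */
static int old_pagetable_walk(unsigned long vaddr)
{
    (void)vaddr;
    return 0;
}

/* New or re-shuffled implementation (placeholder body). */
static int new_pagetable_walk(unsigned long vaddr)
{
    (void)vaddr;
    return 1;
}

/* Dispatches on kernel version; older kernels never reach the new code. */
static int pagetable_walk(unsigned int kernel_version, unsigned long vaddr)
{
    if (kernel_version >= NEW_CODE_MIN_VERSION)
        return new_pagetable_walk(vaddr);
    return old_pagetable_walk(vaddr);
}
```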

The point is, let's not re-invent the wheel just for the purpose of
re-inventing the wheel.

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>     Crash & kernel version independence:
>     kernel headers & code reuse? It would be nice to figure
>     out a way to include kernel headers and sections of kernel code
>     to do hard stuff (like memory mapping functions page_to_pfn,
>     pfn_to_page, pagetable decoding etc..).
>
> RESOLUTION STATUS:
>         Work-in-progress by Dave Wilder <dwilder at us.ibm.com> and
>         Maneesh Soni <maneesh at in.ibm.com>
>

I don't particularly like this suggestion.  (I thought we just went through
a problem where Ubuntu kernels don't even have kernel headers?)

As far as code reuse, we already do that in a number of places, so
I guess that's OK.

And there just never seems to be a "one-size-fits-all" set of
kernel functions/macros that covers all bases over the life of
the kernel and each processor type.
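To illustrate why a single copied-in macro doesn't cover all bases, here is a simplified sketch of page_to_pfn() computed from dump addresses under two memory layouts: a flat mem_map (as with FLATMEM) and per-node mem_map arrays (roughly the DISCONTIGMEM idea). All struct and function names are hypothetical; SPARSEMEM kernels would need yet another variant.

```c
#include <stdint.h>

/* Hypothetical memory-model selector. */
enum mem_model { FLATMEM_SKETCH, NODES_SKETCH };

/* Hypothetical per-node description read from the dump. */
struct node_sketch {
    uint64_t mem_map;    /* kernel vaddr of this node's first struct page */
    uint64_t start_pfn;  /* first pfn covered by this node */
    uint64_t nr_pages;   /* number of struct pages in this node's mem_map */
};

/* Hypothetical summary of what the analyzer learned at init time. */
struct mm_sketch {
    enum mem_model model;
    uint64_t page_struct_size;       /* size of struct page, from debuginfo */
    uint64_t mem_map;                /* flat model: global mem_map address */
    uint64_t pfn_offset;             /* flat model: pfn of mem_map[0] */
    const struct node_sketch *nodes; /* per-node model */
    int nr_nodes;
};

/* Translates a struct page address to a pfn; returns -1 if it isn't one. */
static int page_to_pfn_sketch(const struct mm_sketch *mm, uint64_t page,
                              uint64_t *pfn)
{
    if (mm->model == FLATMEM_SKETCH) {
        if (page < mm->mem_map)
            return -1;
        *pfn = (page - mm->mem_map) / mm->page_struct_size + mm->pfn_offset;
        return 0;
    }
    /* Per-node model: find the node whose mem_map contains the address. */
    for (int i = 0; i < mm->nr_nodes; i++) {
        const struct node_sketch *n = &mm->nodes[i];
        uint64_t end = n->mem_map + n->nr_pages * mm->page_struct_size;
        if (page >= n->mem_map && page < end) {
            *pfn = n->start_pfn + (page - n->mem_map) / mm->page_struct_size;
            return 0;
        }
    }
    return -1;
}
```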

But as always, I'm open to suggestion.

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>    Mini report:
>    The goal of this is to produce a summary report of common information
>    that is used to track problems. The idea here is that for many problems we
>    probably don't need to get the whole dump shipped, and as you have probably
>    figured out by now, it is not easy to ship and store these huge dump
>    files.
>
> RESOLUTION STATUS: TBD
>

Not a bad idea...

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>    Automatic verification of the dump:
>    When you get a dump to look at a problem, there are a few common tasks one
>    performs; the idea here is to automate those tasks and provide a simple
>    interface in the tool. Another possibility is automatic verification of
>    important data structures. For example, if the task list says there are
>    30 tasks, this feature automatically walks the list and counts to verify
>    whether there are 30 in the list or not; if 30 entries are not found, this
>    may give a clue to some kind of corruption.
>
> RESOLUTION STATUS: TBD
>

OK.  Often this gets recognized already, or if things are horribly corrupted,
the session won't even come up.
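For the list-count check described in the proposal, a minimal sketch could look like the following. It assumes a callback that reads the next pointer at a given dump address (in crash that would be a memory read of the list_head's next field; here it is a hypothetical read_next_fn so the walk can be exercised against a toy in-memory "dump"). The walk is bounded so a corrupted, looping list is reported instead of hanging the session.

```c
/* Hypothetical reader: fetch the next pointer stored at addr; nonzero on failure. */
typedef int (*read_next_fn)(void *ctx, unsigned long addr, unsigned long *next);

enum list_check { LIST_OK, LIST_COUNT_MISMATCH, LIST_READ_ERROR, LIST_TOO_LONG };

/* Walks a circular list starting at head and compares the number of
 * entries found with the count the kernel claims to have. */
static enum list_check verify_list_count(void *ctx, read_next_fn read_next,
                                         unsigned long head, long claimed,
                                         long *found)
{
    unsigned long cur;
    long limit = claimed * 2 + 16;   /* arbitrary bound to stop on loops */

    *found = 0;
    if (read_next(ctx, head, &cur) != 0)
        return LIST_READ_ERROR;
    while (cur != head) {
        if (++*found > limit)
            return LIST_TOO_LONG;    /* never got back to head: likely a loop */
        if (read_next(ctx, cur, &cur) != 0)
            return LIST_READ_ERROR;  /* pointer outside readable memory */
    }
    return *found == claimed ? LIST_OK : LIST_COUNT_MISMATCH;
}

/* Toy "dump" for testing: next[i] is the next pointer stored at address i. */
struct toy_dump {
    const unsigned long *next;
    unsigned long size;
};

static int toy_read_next(void *ctx, unsigned long addr, unsigned long *next)
{
    const struct toy_dump *d = ctx;
    if (addr >= d->size)
        return -1;
    *next = d->next[addr];
    return 0;
}
```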

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>    function arguments:
>    Display arguments in the stack trace. At present, we do not have support
>    for PPC64 and x86_64. On PPC64, the user can retrieve arguments only for
>    the top-level frame, from pt_regs. The user can dump the complete stack
>    frame and read the arguments, but that is a manual process that requires
>    some expertise with the stack frame layout.
>
> RESOLUTION STATUS: TBD
>

Have at it...  Given the x86_64 usage of registers for passing args,
good luck.
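For context on why x86_64 is hard: under the System V AMD64 calling convention the first six integer/pointer arguments are passed in registers, not on the stack, and those registers are caller-saved, so a callee is free to overwrite them. The small lookup below just encodes that ordering; x86_64_arg_location() is a hypothetical helper name. An argument's value is only recoverable later if it happens to have been saved somewhere, such as in an exception frame or a stack spill slot.

```c
#include <stddef.h>

/* System V AMD64 ABI: integer/pointer arguments 1..6 go in these registers,
 * in this order; any further arguments are passed on the stack. */
static const char *const x86_64_arg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Returns the register that carried integer argument n (0-based) at
 * function entry, or NULL if that argument was passed on the stack. */
static const char *x86_64_arg_location(size_t n)
{
    if (n < sizeof(x86_64_arg_regs) / sizeof(x86_64_arg_regs[0]))
        return x86_64_arg_regs[n];
    return NULL;
}
```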

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>    local variables:
>    Facilitate possible display of local variables with stack frames
>    Since we are using a debug vmlinux, we can find the locations of local
>    variables from the DWARF2 debug info.
>
> RESOLUTION STATUS: TBD
>

Again, I guess it might be nice.

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>    better assembly & source language, and line# display in disassembly
>
> RESOLUTION STATUS: TBD

Talk to gdb -- that's where it all comes from...  For any text address,
gdb has the associated line number data.  It often looks confusing because
the text comes from a header macro or inline or whatever.  I don't know
what you can do about that.

>
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>    per-cpu info (like stack traces)
>
> RESOLUTION STATUS: TBD

Needs more of a description...

>
> --------------------------------------------------------------------------------
> DESCRIPTION:
>      User space enhancements
>      - show user space stack backtrace, if present in the dump file,
>      - ability to link user space namelist (debug object files),
>
> RESOLUTION STATUS: TBD
>

I thought crash was a kernel [crash/live-system] analyzer?

You currently can add user-space debug data with "add-symbol-file",
which loads the debug data and symbols into gdb.  I have done this
kind of thing, but it's been an "almost-never" kind of situation, where
I've wanted to display a user program's data structure.

But if you want to start throwing in this kind of user-space stuff,
please just keep it segregated.

>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>     Platform specific enhancements
>
>     - Establish CPU registers at the time of exceptions in the current context
>     - Ability to handle CPU registers from current context using symbols in
>       expressions
>     - Ability to format basic processor structures like LDT, GDT, task gates
>       for x86 arch
>

Not clear on what "establishing" CPU registers means.   We already
dump exception frames.

I guess you mean to be able to use a register connotation in certain
commands, as opposed to the address contained in the register?
That's potentially messy, because it puts processor-specific stuff
in processor-neutral code.

As far as formatting the LDT, GDT, and task gates, that's fine.

>
> RESOLUTION STATUS: TBD
>
> --------------------------------------------------------------------------------
>
> DESCRIPTION:
>      cross architecture support for crash
>
> RESOLUTION STATUS: TBD

No way -- we've been through this before.  It is essentially a complete re-write.

If you want this, make a new command entirely.

>
>
> --------------------------------------------------------------------------------

I've made my personal feelings on these kinds of things known before,
which is to take a "minimalist" approach.  Every new bell and whistle
is virtually guaranteed to break as the kernel churns.  And they all
require an additional support burden.  If I had my druthers, crash
would have less rather than more at this point.

But I understand that this has become a community project, and
with the few exceptions above, I'm open to all patch suggestions.

Thanks,
  Dave