[Crash-utility] Re: [PATCH 0/3] Display local variables & function parameters from stack frames

Wed May 27 13:06:23 UTC 2009

----- "Sharyathi Nagesh" <sharyath at in.ibm.com> wrote:

> Dave
>    Excuse me for overlooking this part of the code. I am attaching a fix 
> for it, and I hope it resolves the issue.

That one looks better...

> Dave, I have a few observations regarding the points you have raised.
> mkdumpfile -c stripping the ELF Note for pt_regs: 
> mkdumpfile won't be saving much by stripping ELF Notes of pt_regs
> information. It will be ~256 bytes * number of cpus which is not much.
> We will discuss with mkdumpfile developers to check out the possibility 
> of retaining this ELF Note information.
>
> Regarding CONFIG_FRAME_POINTER
> We understand this is disabled so as to release one more 
> register, bp, for general purpose operations, and that this is the default.
> Ideally this information should have been saved in the DWARF section, so 
> theoretically speaking we should be able to unwind the x86/x86_64 dump
> even without CONFIG_FRAME_POINTER. But somehow the stack unwinding is
> not as direct as it is in ppc64; we are re-looking into this
> implementation.
> 
> Regarding Exception Frame on the top of the stack frame
> As we understand it, if we have the pt_regs of the topmost stack we 
> should be able to unwind to the next stack frame, even if the topmost
> stack frame is an exception frame, at least in ppc64. We are not sure
> about x86 and x86_64; we can relook into that too.

I understand that with the topmost pt_regs you can then start the
backtrace OK.  That's not what I'm referring to.  

What I'm talking about is bumping into another exception frame
while unwinding from the topmost pt_regs.  Or what happens
when the crash occurs while operating on an alternate kernel
stack.  

Just take a simple example -- what happens when you actually enter
"alt-sysrq-c" on an x86_64, generating a keyboard interrupt and therefore
a transition to the per-cpu IRQ stack?  By the time crash_kexec()
is called, you're already operating on the IRQ stack.  Your
code will work its way back to the top of the per-cpu IRQ stack,
but then what does it do?  Or suppose the task takes a page fault,
lays down an exception frame, and then later BUG()'s out while attempting
to handle the fault.  Your code will start from the most-recently
occurring exception frame, but will bump into the page fault exception
during the unwind operation.  Does your code properly recognize the new
exception frame (and the passage through assembly-language code
when that happens)?
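
To make those two cases concrete, here is a toy sketch in C.  It is
not code from crash or from the patch: every name in it (toy_frame,
frame_kind, toy_unwind, walk_stats) is hypothetical, and it walks a
hand-built linked list instead of real stack memory.  It only shows
the three situations an unwinder has to tell apart: an ordinary call
frame, an exception frame holding saved registers, and the top of an
alternate stack that links back to the interrupted stack.

```c
#include <stddef.h>

/* Hypothetical frame kinds, for illustration only. */
enum frame_kind {
    FRAME_NORMAL,     /* ordinary C call frame */
    FRAME_EXCEPTION,  /* saved-register block laid down by an exception */
    FRAME_STACK_END   /* top of an alternate stack (e.g. the IRQ stack) */
};

struct toy_frame {
    enum frame_kind kind;
    const char *label;
    /*
     * NORMAL:    the caller's frame
     * EXCEPTION: the frame described by the saved registers
     * STACK_END: the first frame on the stack that was interrupted
     */
    const struct toy_frame *next;
};

struct walk_stats {
    int frames;          /* ordinary frames seen */
    int exceptions;      /* exception frames crossed */
    int stack_switches;  /* transitions between stacks */
};

/*
 * Walk from 'start' to the outermost frame, counting what was crossed.
 * Returns 0 if the walk reached the end of the chain, or -1 if it
 * gave up after 'max_steps' frames (a guard against corrupt chains).
 */
int toy_unwind(const struct toy_frame *start, int max_steps,
               struct walk_stats *st)
{
    const struct toy_frame *f = start;
    int steps = 0;

    st->frames = st->exceptions = st->stack_switches = 0;

    while (f != NULL) {
        if (steps++ >= max_steps)
            return -1;

        switch (f->kind) {
        case FRAME_NORMAL:
            st->frames++;
            break;
        case FRAME_EXCEPTION:
            /* A real unwinder restarts from the saved registers here. */
            st->exceptions++;
            break;
        case FRAME_STACK_END:
            /* A real unwinder must locate the interrupted stack here. */
            st->stack_switches++;
            break;
        }
        f = f->next;
    }
    return 0;
}
```

In the alt-sysrq-c case the chain would be crash_kexec(), the sysrq
handler, the top of the IRQ stack, the saved registers, and then the
interrupted task's frames.  A walker that only knows FRAME_NORMAL
stops at the third entry, which is the situation I'm asking about.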

All I'm saying is that you're basing your test results simply on
instances where panic() is called or "echo c ..." was entered, which
is the most trivial type of kernel crash -- because there's no kernel
exception frame laid down until crash_kexec() gets called.  That's a
fairly rare occurrence w/respect to typical kernel crashes.

Dave