[Crash-utility] handling missing kdump pages in diskdump format

Mon Apr 2 15:59:47 UTC 2007

Dave Anderson wrote:

> Bob Montgomery wrote:
>
> > On Thu, 2007-03-29 at 08:13 -0500, Dave Anderson wrote:
> > > Ken'ichi Ohmichi wrote:
> >
> > > > I checked whether this change is correct by the following:
> > > > (The following patches are attached with this mail)
> > > > - makedumpfile-1.1.2 with "point_same_zero_page2.patch" creates a dumpfile.
> > > > - crash-4.0-3.21 with "not-access-excluded-page.patch" analyzes the dumpfile.
> > > > - The analysis result of the dumpfile is compared with /proc/vmcore's.
> > > >
> > > > And on i386 linux-2.6.19, I found the difference between the result
> > > > of the dumpfile (excluding free pages) and /proc/vmcore's by subcommand
> > > > "foreach bt".
> > > > But by using crash-4.0-3.21 without "not-access-excluded-page.patch",
> > > > there is not any difference. In a word, this difference happens due to
> > > > considering the excluded pages as unaccess pages.
> >
> > Just to clarify for those who probably aren't as confused as I was at
> > first:
> >
> > This isn't a test of the zero page trick, because with the changes to
> > makedumpfile, zero pages are no longer actually excluded.  (I read
> > "excluding free pages" but immediately thought "excluding zero pages"
> > and spent more than a few minutes checking how that could possibly have
> > happened.)
> >
> > So this is apparently a case where a page excluded because it was
> > supposedly free is then maybe accessed by the back tracer while it might
> > be trying to read kernel text, right?  But kernel text should never look
> > free, so I'm still puzzled.  Did makedumpfile mis-identify a real page
> > as free, or is crash asking for pages it shouldn't be looking at during
> > backtrace?
> >
>
> No -- it's kernel text that was marked as __init, so the page containing
> it got freed and reallocated as a page that was purposely excluded.
> The page originally contained the "start_kernel" __init function, which
> only gets executed once by the first swapper thread.
>
> The problem is that crash shouldn't be looking at that text location
> when doing a backtrace on that PID 0, because it should have
> stopped the trace as soon as it saw the "cpu_idle" stack reference.
> I don't know why it's doing that -- I tried simulating Ken'ichi's vmcore
> by forcibly returning an error if readmem() got a request for the
> page originally containing "start_kernel", but the backtrace worked
> OK -- even though I could see the "start_kernel" reference on
> the stack when using "bt -t".
>
> Anyway, that's why I've asked Ken'ichi if he can make his
> vmlinux/vmcore pair available for me to debug.
>
> Thanks,
>   Dave

Actually, after looking at a sample ELF-format dumpfile from Ken'ichi
and hacking in a forced readmem() failure on any access to the
page containing the start_kernel __init function, I can reproduce
the problem with the 2.6.19 kernel.  But it is not a problem with
makedumpfile's excluded pages; rather, it is the backtrace code's
framesize calculation for schedule(), which is causing it to skip over
the cpu_idle() ending point.  It seems to be specific to 2.6.19 for
some reason.  I'll look further into that issue.
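
To illustrate the failure mode only, here is a toy model of a
framesize-driven stack walk.  It is not crash's backtrace code: the
stack is reduced to an array of slots, and toy_slot, toy_walk and the
framesize values are all hypothetical names and numbers made up for
the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: one slot per stack entry, with the framesize the
 * walker believes that entry has (counted in slots).  Hypothetical. */
struct toy_slot {
    const char *sym;      /* symbol the slot's text address resolves to */
    size_t      framesize;
};

/* Walk the toy stack, recording each slot index examined in visited[].
 * The walk ends after examining a slot whose symbol is stop_sym, or
 * when it runs off the end.  Returns the number of slots examined. */
static size_t toy_walk(const struct toy_slot *stack, size_t n,
                       const char *stop_sym,
                       size_t *visited, size_t max_visited)
{
    size_t i = 0, count = 0;

    while (i < n && count < max_visited) {
        visited[count++] = i;
        if (strcmp(stack[i].sym, stop_sym) == 0)
            break;                    /* reached the ending point */
        if (stack[i].framesize == 0)
            break;                    /* avoid looping forever */
        i += stack[i].framesize;      /* step over this frame */
    }
    return count;
}
```

With slots { schedule, cpu_idle, start_kernel } and a framesize of 1
for schedule, the walk examines slots 0 and 1 and stops at cpu_idle.
With a framesize of 2 for schedule it examines slots 0 and 2, so it
steps past cpu_idle and ends up looking at start_kernel instead.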

I should note that the read of the start_kernel text is benign, and
it is only noticed because an error message was added to
read_diskdump() in Ken'ichi's test crash utility.  I've removed that
error message, and created a new one in readmem() specifically for
attempts to read excluded pages.  However, if the readmem() caller has
passed in the QUIET flag, as is the case in the start_kernel read, the
error message will not be shown unless the debug variable is set
non-zero.  There are a number of places in the crash utility where
readmem() calls are not necessarily expected to work, so it makes
little sense to always print error messages in read_diskdump().
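
The reporting rule described above can be sketched as follows.  This
is only an outline of the decision, not the actual readmem() code:
SKETCH_QUIET, its value, and should_report_excluded() are hypothetical
stand-ins for the QUIET flag and the debug variable.

```c
/* Hypothetical flag bit standing in for the QUIET flag a caller can
 * pass to readmem(); the real value is not assumed here. */
#define SKETCH_QUIET 0x1UL

/* Decide whether an "excluded page" read failure should be reported.
 * flags: the caller's flags; debug: stand-in for the debug variable.
 * Returns 1 to print the message, 0 to stay silent. */
static int should_report_excluded(unsigned long flags, int debug)
{
    if (!(flags & SKETCH_QUIET))
        return 1;          /* caller did not ask for quiet: always report */
    return debug != 0;     /* quiet caller: report only when debugging */
}
```

So the start_kernel read, which passes the quiet flag, stays silent
unless debug is non-zero, while a caller that does not pass the flag
always gets the message.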

One other thing I noticed with Ken'ichi's sample dumpfile.  Again,
it's an ELF format file, but there are apparently some pages missing,
because some module pages (e.g., the page of the module that
contains the actual "module" structure) are zeroed out, and so you
get the "WARNING: cannot access vmalloc'd module memory"
message during initialization.

For example, here's the modules list_head:

crash> p modules
modules = $2 = {
  next = 0xf8b3f084,
  prev = 0xf8828884
}

The first module in the list would be at 0xf8b3f080 (accounting
for the location of the module.list list_head struct).  But the whole
page is filled with zeros:

crash> rd 0xf8b3f000 1024
f8b3f000:  00000000 00000000 00000000 00000000   ................
f8b3f010:  00000000 00000000 00000000 00000000   ................
f8b3f020:  00000000 00000000 00000000 00000000   ................
f8b3f030:  00000000 00000000 00000000 00000000   ................
f8b3f040:  00000000 00000000 00000000 00000000   ................
f8b3f050:  00000000 00000000 00000000 00000000   ................
...
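
The address arithmetic used above can be written out as a small
sketch.  The 4-byte offset of the list member and the 4096-byte page
size are taken from the i386 values shown in this session; the macro
and function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed from the output above: on this i386 kernel the list member
 * sits 4 bytes into struct module, and pages are 4096 bytes. */
#define SKETCH_LIST_OFFSET 4u
#define SKETCH_PAGE_SIZE   4096u

/* Turn a list_head pointer from the modules list into the address of
 * the struct module that contains it. */
static uint32_t module_from_list(uint32_t list_ptr)
{
    return list_ptr - SKETCH_LIST_OFFSET;
}

/* Round an address down to the start of its page. */
static uint32_t page_base(uint32_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1u);
}

/* Return 1 if every 32-bit word in the buffer is zero, else 0. */
static int page_is_zero_filled(const uint32_t *words, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (words[i] != 0)
            return 0;
    return 1;
}
```

Here module_from_list(0xf8b3f084) gives 0xf8b3f080, and page_base() of
that gives 0xf8b3f000, which is the address handed to "rd" above; the
1024 words read there cover the whole 4096-byte page.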

If I look at the last module on the list at 0xf8828880, it
is contained in the dumpfile:

crash> module 0xf8828880
struct module {
  state = MODULE_STATE_LIVE,
  list = {
    next = 0xc0382a90,
    prev = 0xf882fc04
  },
  name =
"uhci_hcd\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

  mkobj = {
    kobj = {
      k_name = 0xf88288cc "uhci_hcd",
      name = "uhci_hcd\000\000\000\000\000\000\000\000\000\000\000",
...

and if I follow it back, several modules are there, but I bump into
another zero-filled module.  (Thinking back, I can't say for sure whether
it's just bumping into the first one on the list or not...)

Anyway, it's cause for concern that the first one on the list is
either zeroed out or was never written to the dumpfile.  With an ELF
dumpfile, I can't really tell which.

Dave