[Crash-utility] determining a "valid" vmcore
Andrew Hecox
ahecox at redhat.com
Thu Feb 7 20:23:14 UTC 2008
On Thu, 2008-02-07 at 14:40 -0500, Dave Anderson wrote:
> Andrew Hecox wrote:
> > On Thu, 2008-02-07 at 11:27 -0500, Dave Anderson wrote:
> >> Andrew Hecox wrote:
> >>> On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
> >>>> Andrew Hecox wrote:
> >>>>> hello,
> >>>>>
> >>>>> I'm looking at a customer issue where diskdumpmsg is unable to read a
> >>>>> vmcore file. It is not clear if this is a problem with the vmcore file or
> >>>>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
> >>>>> it, can see no problems. However, I'm new to the tool so that doesn't
> >>>>> give me a lot of confidence.
> >>>>>
> >>>>> Does anyone have any suggestions on how or if I can use crash to help
> >>>>> determine if there's corruption in the vmcore file? Or any other way of
> >>>>> approaching the problem?
> >>>>>
> >>>>> Thanks much,
> >>>>>
> >>>>> Andrew
> >>>>>
> >>>> I'm not sure what you expect the crash utility to do -- if it comes
> >>>> up to a prompt with no error or warning messages, it means that the
> >>>> ELF header contains what appears to be valid usable information,
> >>>> and that the minimum kernel memory contents required to set up the
> >>>> crash utility's notion of the running system are all in place. That's
> >>>> not to say that there is no chance that the vmcore contains some
> >>>> corruption that was not recognized.
> >>>>
> >>> Thanks. Any other suggestions on how to determine if a vmcore is "valid"
> >>> or is that not even a reasonable question to try and ask? The problem
> >>> I'm trying to solve is described better below:
> >>>
> >>>> With respect to diskdumpmsg, as I understand it, it was fairly recently
> >>>> changed from a perl script to a C file so that it could be run
> >>>> earlier in time so as to be able to use the swap partition. Looking
> >>>> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
> >>>> error types and associated error messages. What do you mean when you
> >>>> say that "diskdumpmsg is unable to read a vmcore file"?
> >>> Specifically:
> >>>
> >>> - user reported a floating point exception from diskdump on startup
> >>> - the result was reproducible locally but only with their vmcore file
> >>> - fpe occurred in get_logbuf:
> >>>       log_end %= log_buf_len;
> >>> - log_buf_len had been set to 0 in read_buffer
> >>>       if (!page_is_dumpable(pfn, dump->device)) {
> >>>               memset(buf, 0, copy_len);
> >>>       } else {
> >>> - I don't know enough to say if the page really wasn't dumpable.
> >>>       static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
> >>>       {
> >>>               return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
> >>>       }
> >>> - I wrote a patch with one way to avoid the FPE (attached) and sent it
> >>> to SEG.
> >>>
> >>> Now I'm trying to determine if the vmcore file should be readable by
> >>> diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
> >>> or a problem with the vmcore file prior to it getting to diskdumpmsg.
> >>> Unfortunately, I don't understand the problem domain very well at all,
> >>> hence the probably naive questions :)
> >>>
> >>> Any suggestions are appreciated.
> >>>
> >>> -Andrew
> >> So it appears that the page containing the log_buf_len symbol is not
> >> readable or contained in the dumpfile. BTW, is this a compressed
> >> dumpfile or an ELF formatted dumpfile? And what "dump_level" did
> >> they configure?
> >>
> >
> > compressed, level is 19.
> >
> >> Anyway, back to the log_buf_len symbol read, what happens when you
> >> enter the "log" command while in a crash session? It attempts to
> >> read that symbol immediately.
> >>
> >
> > I get what appears to be a full and valid dump of the kernel message
> > buffer.
> >
>
> The crash utility has the same page_is_dumpable() function, which I presume
> looks at precisely the same bitmap data from the dumpfile. And that
> must be working, given that the "log" command works as expected.
>
> One difference is that diskdumpmsg uses /boot/System.map-<release> for
> the symbol values, whereas crash uses the vmlinux file. It might be
> of interest to determine whether the value of "log_buf_len" used by
> diskdumpmsg is the same symbol value as used by crash.
>
I get the same:
(/boot/System.map-2.6.9-67.0.1.ELhugemem)
02323bd8 d log_buf_len
(/usr/lib/debug/lib/modules/2.6.9-67.0.1.ELhugemem/vmlinux)
$1 = (int *) 0x2323bd8
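
Since the symbol values match, a mismatched address does not look like the
cause, which leaves the read path itself. A minimal sketch of the failure mode
described earlier in the thread, for illustration only: a page flagged as not
dumpable is zero-filled, log_buf_len is therefore read back as 0, and the
later modulo faults. This is not the attached patch and not the diskdumpmsg
source. page_is_dumpable() mirrors the bit test quoted above but takes the
bitmap directly instead of a DumpDevice (an assumption made to keep the sketch
self-contained), and wrap_log_end() is a hypothetical helper name.

```c
#include <stdbool.h>

/*
 * One bit per page frame: bit (nr & 7) of byte (nr >> 3).
 * Same test as the page_is_dumpable() quoted in the thread, but the bitmap
 * is passed in directly (assumption; the original takes a DumpDevice *).
 */
static bool page_is_dumpable(const unsigned char *bitmap, unsigned int nr)
{
    return (bitmap[nr >> 3] & (1u << (nr & 7))) != 0;
}

/*
 * Hypothetical helper: wrap log_end into the log buffer.
 * Returns false and leaves *out untouched when log_buf_len is not positive,
 * instead of executing "log_end %= log_buf_len" with a zero divisor.
 */
static bool wrap_log_end(unsigned long log_end, int log_buf_len,
                         unsigned long *out)
{
    if (log_buf_len <= 0)
        return false;
    *out = log_end % (unsigned long)log_buf_len;
    return true;
}
```

With a guard of this kind the caller can report "log_buf_len unreadable"
rather than dying with SIGFPE, which would at least separate a bad bitmap or
missing page from a bug in the log extraction itself.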
-Andrew
> Dave
>
>