[Crash-utility] System.map problems with my 32-bit systems

Dave Anderson anderson at redhat.com
Thu Aug 29 20:49:14 UTC 2013



----- Original Message -----
> Hello Crash Utility Community,
> 
> I am hoping that someone in the Crash Analysis community can provide some
> assistance with a problem that I am having analyzing vmcore files gathered
> from our 32-bit machines. I am working to add kexec to our systems so that
> we can run the crash utility (version 7.0.1) on our appliances, and I am
> having trouble with our 32-bit systems. Fortunately my 64-bit systems are
> working fine, so I know that I can make the technology work. I believe that
> the crash analysis tool does not like the System.map file and I am trying to
> get to the root cause of this problem.

If the vmlinux file that you're using matches the vmcore, then please don't
use any -S or System.map argument -- just enter: "crash vmlinux vmcore"

System.map files are only required if the symbol values in the vmlinux
file are different from those in the running kernel.  It doesn't sound
like that's the case in your environment.

Secondly, if the session doesn't start that way, please provide the
debug output generated by entering:

 $ crash -d8 vmlinux vmcore
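
As a quick sanity check that a vmlinux really matches a vmcore, the kernel
banner string embedded in each file can be compared.  The following is only a
rough sketch: the function names kernel_banner and banners_match are made up
for illustration, it assumes a grep that supports -a, -o and -m (GNU grep
does), and it assumes an uncompressed ELF vmcore in which the
"Linux version ..." text is stored as plain bytes.  The first match in a
vmcore may also come from somewhere other than linux_banner, so treat a
mismatch as a hint rather than proof.

```shell
# Print the first "Linux version <digit>..." string found in a binary file.
# LC_ALL=C keeps [[:print:]] limited to printable ASCII bytes.
kernel_banner() {
    LC_ALL=C grep -a -o -m1 'Linux version [0-9][[:print:]]*' "$1" | head -n 1
}

# Succeed only if both files contain a banner and the banners are identical.
banners_match() {
    a=$(kernel_banner "$1")
    b=$(kernel_banner "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, "banners_match vmlinux vmcore" returns success only when both
files carry the same banner text.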

> My problem first manifests itself when I try to decode the vmcore file.
> After intentionally creating an oops panic event I upload the vmcore file to
> my build machine and run crash on that system. Although the vmcore file is
> generated on an appliance, I run the crash analysis program on the build system
> that produced the Linux kernel, since the appliances are meant to be deployed
> into the field and will not be accessible for running crash analysis.
> 
> build# crash -S System.map vmlinux vmcore
> crash 7.0.1
> ...
> crash: read error: kernel virtual address: c1363c5c type: "cpu_possible_mask"
> 
> So I then tried to find what this symbol is within the map:
> 
> build# crash --minimal -S System.map vmlinux vmcore
> ...
> crash> sym cpu_possible_mask
> c1363c5c (R) cpu_possible_mask
> crash>

When you were in the "minimal" session, were you able to "rd" the cpu_possible_mask
address?  i.e.

  crash> rd c1363c5c

or what did this show:

  crash> rd linux_banner 10

> From this I can only see that the addresses match up. So I then decided to
> run the crash utility on the appliance itself to see what happens. I copied
> the crash utility to the appliance and the uncompressed kernel image to the
> appliance as well. The appliance boots from a "bzImage" file and the crash
> utility can't use the bzImage file for processing so I needed to manually
> copy the uncompressed kernel image to the box.

Right, crash is only interested in the vmlinux ELF file from which the
bzImage file was generated.
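
A simple way to confirm that the file being handed to crash is an ELF image
rather than a compressed boot image is to look at its first four bytes.  This
is a minimal sketch under stated assumptions: is_elf is a hypothetical helper
name, and the check only looks for the ELF magic number, so it says nothing
about whether the file carries the debuginfo that crash also needs.

```shell
# Succeed if the file starts with the 4-byte ELF magic (0x7f 'E' 'L' 'F').
is_elf() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

Usage would be along the lines of: is_elf vmlinux && echo "ELF file"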
 
> I then run the following commands on our appliance for data gathering
> purposes:
> 
> root@appliance:/var/crash# crash -S /boot/System.map vmlinux-2.6.32.24-sf.pentM-37
> ...
> WARNING: cannot read linux_banner string
> crash: /boot/System.map and /dev/mem do not match!
> 
> root@appliance:/var/crash# ls -l /boot/System.map
> lrwxrwxrwx 1 root root 32 Aug 28 22:32 /boot/System.map -> System.map-2.6.32.24-sf.pentM-37
> root@appliance:/var/crash#
> 
> root@appliance:/var/crash# cat /proc/version
> Linux version 2.6.32.24sf.pentM-37 (build@ajax) (gcc version 4.7.1 (GCC) ) #1 PREEMPT Mon Aug 26 22:26:34 UTC 2013
> root@appliance:/var/crash#
>
> So from everything I can see the Linux kernel and the System.map file are in
> version agreement but the crash utility disagrees with me. The crash utility
> is the judge so something is wrong. My goal is to find out how I can get the
> information that is needed to determine the problem.

OK, while running on the appliance itself, again, try running without
the System.map argument.  It will presumably still fail as shown above.
On that appliance, what is the output from these commands:

 $ cat /proc/kallsyms | grep cpu_possible_mask
 $ nm -Bn vmlinux-2.6.32.24-sf.pentM-37 | grep cpu_possible_mask
 $ grep cpu_possible_mask /boot/System.map
 
If the addresses are not the same, you may need to use the "--reloc <size>"
command line argument.  That is required for 32-bit x86 kernels that are configured
as described here:

 http://people.redhat.com/anderson/crash.changelog.html#4_0_4_5
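
To make the comparison above less error-prone, the addresses can be pulled
out of the listings and subtracted with a small helper.  This is only a
sketch: sym_addr and addr_delta are hypothetical names, and it assumes each
listing uses the "address type name" layout shared by nm output, System.map
and /proc/kallsyms.  A constant non-zero difference suggests a relocated
kernel; whether that difference is the right value to pass to --reloc should
be confirmed against the changelog entry above.

```shell
# Print the address (first column) of symbol $1 from listing file $2,
# where each line has the form "address type name".
sym_addr() {
    awk -v sym="$1" '$3 == sym { print $1; exit }' "$2"
}

# Print the absolute difference between two hex addresses given
# without a 0x prefix (as they appear in the listings).
addr_delta() {
    a=$((0x$1))
    b=$((0x$2))
    if [ "$a" -ge "$b" ]; then
        printf '0x%x\n' $((a - b))
    else
        printf '0x%x\n' $((b - a))
    fi
}

# Example, using the files named in this thread:
#   run=$(sym_addr cpu_possible_mask /proc/kallsyms)
#   map=$(sym_addr cpu_possible_mask /boot/System.map)
#   addr_delta "$run" "$map"
```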

Dave



