[Crash-utility] Throw read error on vmcore produced by ARM soc.

Wed Mar 27 14:01:37 UTC 2013

----- Original Message -----
> 2013/3/26 Dave Anderson <anderson at redhat.com>:
> >
> >
> > ----- Original Message -----
> >> Hi, list.
> >>
> >> I use crash-utility to analyse crash dump core from ARM soc. When I
> >> execute command below, I get the error "crash: read error: kernel
> >> virtual address: c0c1e040  type: "first vmap_area va_start"". I also
> >> test it by gdb. It works fine. The Linux kernel's version is
> >> v3.0.8.
> >>
> >> hfli at pc1935:~/work/crash-utility$ ./crash vmlinux Vmcore
> >>
> >> crash 6.1.4
> >> Copyright (C) 2002-2013  Red Hat, Inc.
> >> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> >> Copyright (C) 1999-2006  Hewlett-Packard Co
> >> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> >> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> >> Copyright (C) 2005, 2011  NEC Corporation
> >> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> >> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> >> This program is free software, covered by the GNU General Public License,
> >> and you are welcome to change it and/or distribute copies of it under
> >> certain conditions.  Enter "help copying" to see the conditions.
> >> This program has absolutely no warranty.  Enter "help warranty" for
> >> details.
> >>
> >> GNU gdb (GDB) 7.3.1
> >> Copyright (C) 2011 Free Software Foundation, Inc.
> >> License GPLv3+: GNU GPL version 3 or later
> >> <http://gnu.org/licenses/gpl.html>
> >> This is free software: you are free to change and redistribute it.
> >> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> >> and "show warranty" for details.
> >> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
> >>
> >> crash: read error: kernel virtual address: c0c1e040  type: "first vmap_area va_start"
> >>
> >> Errors like the one above typically occur when the kernel and memory source
> >> do not match.  These are the files being used:
> >>
> >>       KERNEL: vmlinux
> >>     DUMPFILE: Vmcore
> >
> > You've answered your own question -- you should always see errors if the vmlinux
> > kernel does not match the kernel of the crashed system.
> >
> > If you cannot find/access the original vmlinux file that was being run
> > by the crashed kernel, then get the /boot/System.map file of the crashed
> > kernel, and enter it on the command line:
> Thanks for your reply.
> 
> The vmlinux, which includes debug information, and the crash kernel were
> cross-compiled and produced together. I don't understand why crash
> reports that the "kernel and memory source do not match".
> 
> >
> >  $ crash vmlinux Vmcore System.map
> >
> > The crash utility will replace all of the invalid symbol values from the
> > "wrong" vmlinux file with their correct values from the System.map file.
> 
> 
> A moment ago I rebuilt the ARM kernel source and ran
> "echo c > /proc/sysrq-trigger" to trigger a system panic. The result is listed below.
> hfli at pc1935:~/work/crash-utility$ ./crash vmlinux0327 Vmcore0327
> 
> crash 6.1.4
> Copyright (C) 2002-2013  Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> Copyright (C) 1999-2006  Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011  NEC Corporation
> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for
> details.
> 
> GNU gdb (GDB) 7.3.1
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...
> 
> please wait... (gathering kmem slab cache data)
> crash: read error: kernel virtual address: c0c91840  type: "kmem_cache buffer"
> 
> crash: unable to initialize kmem slab cache subsystem
> 
> 
> WARNING: invalid note (n_type != NT_PRSTATUS)
> 
> WARNING: could not retrieve crash_notes
> please wait... (gathering task table data)
> crash: cannot read pid_hash upid
> 
> crash: cannot read pid_hash upid
> please wait... (determining panic task)
> WARNING: cannot get stackframe for task
>       KERNEL: vmlinux0327
>     DUMPFILE: Vmcore0327
>         CPUS: 1
>         DATE: Thu Jan  1 08:00:00 1970
>       UPTIME: 00:00:00
> LOAD AVERAGE: 0.00, 0.00, 0.00
>        TASKS: 1
>     NODENAME: 10.38.50.241
>      RELEASE: 3.0.8-00010-gb7f16a3-dirty
>      VERSION: #339 Wed Mar 27 10:39:43 CST 2013
>      MACHINE: armv7l  (unknown Mhz)
>       MEMORY: 19 MB
>        PANIC: ""
>          PID: 0
>      COMMAND: "swapper"
>         TASK: c02e0620  [THREAD_INFO: c02dc000]
>          CPU: 0
>        STATE: TASK_RUNNING (ACTIVE)
>      WARNING: panic task not found
> 
> crash>
> 
> 
> It didn't work properly either. I then appended System.map to the
> command line, and the output was the same.

OK, so then it's not clear to me why you're seeing those errors.

Was the dumpfile created using kdump?  It almost looks as if the dump
was taken while the system was still running.  Have you *ever* created
a dumpfile that resulted in an error-free crash session?

Perhaps the ARM users on this list have seen this kind of thing? 

If you enter "crash -d8 ..." on the command line, you may get a better
picture of what leads up to the errors shown above, and of most
interest, the readmem() calls that generate the errors.  If you
see a "crash: read error: ...", then that means that the dumpfile
doesn't contain the physical page associated with the virtual
address shown.  But it's not clear whether the address itself
is legitimate, i.e., whether it was gathered from the wrong location.
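
As a rough illustration of what "the dumpfile doesn't contain the page"
means, the sketch below lists the PT_LOAD segments of an ELF-format vmcore
and checks whether a kernel virtual address is covered by any of them.
It is not how crash implements readmem(). It assumes a 32-bit little-endian
ELF vmcore (as kdump would produce for this ARM target) whose program
headers carry kernel virtual addresses in p_vaddr; the function names are
hypothetical.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # Elf32_Ehdr, little-endian (52 bytes)
PHDR = struct.Struct("<8I")                # Elf32_Phdr (32 bytes)


def load_segments(image):
    """Return (vaddr, filesz, file_offset) for each PT_LOAD program header.

    `image` is a bytes object holding at least the ELF header and the
    program header table of the vmcore.
    """
    fields = EHDR.unpack_from(image, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    # e_ident[4] == 1 means ELFCLASS32, e_ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF image")
    segments = []
    for i in range(phnum):
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz, *_rest = PHDR.unpack_from(
            image, phoff + i * phentsize
        )
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_filesz, p_offset))
    return segments


def file_offset_of(segments, vaddr):
    """Return the file offset backing `vaddr`, or None if no segment covers it."""
    for seg_vaddr, filesz, offset in segments:
        if seg_vaddr <= vaddr < seg_vaddr + filesz:
            return offset + (vaddr - seg_vaddr)
    return None
```

Reading the first few kilobytes of the vmcore into `image` and calling
`file_offset_of(load_segments(image), 0xc0c1e040)` would return None if
no segment in the file backs that address, under the assumptions above.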

> 
> I also tried GDB to test it.
> hfli at pc1935:~/work/crash-utility$ ./gdb-7.5/gdb/gdb vmlinux0327 Vmcore0327
> GNU gdb (GDB) 7.5
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show
> copying"
> and "show warranty" for details.
> This GDB was configured as "--host=x86 --target=arm-linux-gnueabi".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from
> /home/hfli/work/crash-utility/vmlinux0327...done.
> 
> warning: exec file is newer than core file.

Again, this bothers me -- why is it "newer" than the core file?
Are you sure that they are *exactly* the same?

> [New LWP 278]
> #0  0xc0155f7c in sysrq_handle_crash (key=99) at
> drivers/tty/sysrq.c:134
> 134             *killer = 1;
> (gdb) list
> 129     {
> 130             char *killer = NULL;
> 131
> 132             panic_on_oops = 1;      /* force panic */
> 133             wmb();
> 134             *killer = 1;
> 135     }
> 136     static struct sysrq_key_op sysrq_crash_op = {
> 137             .handler        = sysrq_handle_crash,
> 138             .help_msg       = "Crash",
> (gdb)
> 
> gdb also works fine.
> 

It works fine for gdb in the very limited case above.  The crash utility
is also "working fine", but it accesses far more of the dumpfile.
If you used gdb to read the same locations that the crash utility
reads during its initialization, then gdb would also fail.

Let's take a simple example -- in your first email, you saw this error:

 crash: read error: kernel virtual address: c0c1e040  type: "first vmap_area va_start"

which came from here:

        if (vt->flags & USE_VMAP_AREA) {
                get_symbol_data("vmap_area_list", sizeof(void *), &vmap_area);
                if (!vmap_area)
                        return 0;
                if (!readmem(vmap_area - OFFSET(vmap_area_list) +
                    OFFSET(vmap_area_va_start), KVADDR, &vmalloc_start,
                    sizeof(void *), "first vmap_area va_start", RETURN_ON_ERROR))
                        non_matching_kernel();

If I look at a sample ARM dumpfile I have, I see this:

 crash> p vmap_area_list
 vmap_area_list = $8 = {
   next = 0xc30d4d78, 
   prev = 0xc06702b8
 }

where the "next" pointer of 0xc30d4d78 above points to the "list" member
of a vmap_area structure:

 crash> struct vmap_area
 struct vmap_area {
     long unsigned int va_start;
     long unsigned int va_end;
     long unsigned int flags;
     struct rb_node rb_node;
     struct list_head list;         <== "next" points here
     struct list_head purge_list;
     void *private;
     struct rcu_head rcu_head;
 }
 SIZE: 52
 crash>

And I can dump that vmap_area structure like this:

 crash> struct -x vmap_area -l vmap_area.list 0xc30d4d78
 struct vmap_area {
   va_start = 0xbf000000, 
   va_end = 0xbf005000, 
   flags = 0x4, 
   rb_node = {
     rb_parent_color = 0xc2ca076d, 
     rb_right = 0x0, 
     rb_left = 0x0
   }, 
   list = {
     next = 0xc2ca0778, 
     prev = 0xc0411ed4
   }, 
   purge_list = {
     next = 0x0, 
     prev = 0x0
   }, 
   private = 0xc3396860, 
   rcu_head = {
     next = 0x0, 
     func = 0
   }
 }
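
The address arithmetic in the readmem() call above (list pointer, minus
the offset of "list", plus the offset of "va_start") can be sketched as
follows.  The ctypes classes are hypothetical stand-ins that mirror the
52-byte layout printed above for this sample 32-bit ARM kernel; other
kernel versions and configurations will have different layouts.

```python
import ctypes

U32 = ctypes.c_uint32  # pointers and longs are 32 bits wide on this ARM target


class ListHead32(ctypes.Structure):
    _fields_ = [("next", U32), ("prev", U32)]


class RbNode32(ctypes.Structure):
    _fields_ = [("rb_parent_color", U32), ("rb_right", U32), ("rb_left", U32)]


class RcuHead32(ctypes.Structure):
    _fields_ = [("next", U32), ("func", U32)]


class VmapArea32(ctypes.Structure):
    # Field order copied from the "struct vmap_area" output above.
    _fields_ = [
        ("va_start", U32),
        ("va_end", U32),
        ("flags", U32),
        ("rb_node", RbNode32),
        ("list", ListHead32),
        ("purge_list", ListHead32),
        ("private", U32),
        ("rcu_head", RcuHead32),
    ]


def va_start_address(list_next):
    """Map a pointer to vmap_area.list onto the address of vmap_area.va_start."""
    base = list_next - VmapArea32.list.offset   # start of the containing struct
    return base + VmapArea32.va_start.offset    # va_start is the first member


print(ctypes.sizeof(VmapArea32))         # 52, matching SIZE: 52 above
print(VmapArea32.list.offset)            # 24
print(hex(va_start_address(0xc30d4d78))) # 0xc30d4d60
```

With this layout, the sample "next" pointer of 0xc30d4d78 corresponds to
a vmap_area structure starting at 0xc30d4d60, which is the address that
readmem() would be asked to read for va_start.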

But in your case, crash found a "vmap_area_list.next" pointer of c0c1e040,
and that address was not accessible from the dumpfile.

So either:

 (1) the "vmap_area_list" symbol value was not correct, or
 (2) the page containing the first vmap_area structure was
     not included in the dumpfile.

Problem (1) can happen if your crashed kernel doesn't match the
vmlinux file, i.e., the symbol values don't match.  But if the
"vmap_area_list" symbol was correct, then (2) must have occurred,
and that should never happen unless the dumpfile was corrupted or
was created incorrectly.
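
One way to look into possibility (1) is to compare a few symbol values
from the System.map of the kernel that actually crashed against the
output of "nm vmlinux".  The sketch below is only an illustration: it
assumes both inputs use the usual "address type name" line format, the
function names are hypothetical, and when a name appears more than once
(as static symbols can) only the last value is kept.

```python
def parse_symbols(text):
    """Parse 'address type name' lines into a {name: address} dictionary."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. undefined symbols in nm output have no address
        addr, _sym_type, name = parts
        symbols[name] = int(addr, 16)
    return symbols


def mismatched_symbols(map_text, nm_text, names):
    """Return {name: (map_value, nm_value)} for the names whose values differ."""
    from_map = parse_symbols(map_text)
    from_nm = parse_symbols(nm_text)
    return {
        name: (from_map.get(name), from_nm.get(name))
        for name in names
        if from_map.get(name) != from_nm.get(name)
    }
```

An empty result for names such as "vmap_area_list" would suggest that
the symbol values agree, which points toward possibility (2) instead.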

Dave