[Crash-utility] crash: invalid kernel virtual address: 0 type: "memory section"

Mon Jan 5 15:49:44 UTC 2015

Just for sanity's sake, try this:

  $ ./crash --minimal ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic ../dump.201412280256

and see if you can read the linux_banner string successfully.  For example, using
my sample 3.13 kernel:

  $ crash --minimal 3.13.0-0.rc1.git2.1.fc20_SLAB/vmlinux.gz 3.13.0-0.rc1.git2.1.fc20_SLAB/vmcore_c_d31

  crash 7.0.9
  Copyright (C) 2002-2014  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.

  GNU gdb (GDB) 7.6                                                       
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...

  NOTE: minimal mode commands: log, dis, rd, sym, eval, set, extend and exit

  crash> rd -a linux_banner
  ffffffff818000c0:  Linux version 3.13.0-0.rc1.git2.1.fc20.x86_64 (root@hp-xw455
  ffffffff818000fc:  0-02.ml3.eng.bos.redhat.com) (gcc version 4.8.1 20130814 (Re
  ffffffff81800138:  d Hat 4.8.1-6) (GCC) ) #1 SMP Tue Nov 26 14:42:45 EST 2013
  crash> 

And then try reading other stuff, most notably the __per_cpu_offset[] array,
like this:

  crash> rd __per_cpu_offset 256
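
When looking at that output, the main thing to check is whether the
entries are all zero.  As a rough illustration of that check in C, here
is a minimal sketch.  count_nonzero_offsets() is a hypothetical helper,
not something in crash, and it assumes the values have already been
copied into an array:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of crash: count the non-zero entries in a
 * copy of the __per_cpu_offset[] array.  A count of zero would suggest that
 * the data read from the dumpfile is not what the vmlinux says it should be.
 */
size_t count_nonzero_offsets(const unsigned long *offsets, size_t n)
{
	size_t i, nonzero = 0;

	assert(offsets != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (offsets[i] != 0)
			nonzero++;
	return nonzero;
}

/* Usage example: an all-zero array, which is the suspicious case. */
void count_nonzero_offsets_example(void)
{
	unsigned long zeroed[4] = { 0, 0, 0, 0 };

	assert(count_nonzero_offsets(zeroed, 4) == 0);
}
```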

Dave

----- Original Message -----
> 
> 
> ----- Original Message -----
> > Hello,
> > 
> > I have a couple dumps generated on Ubuntu Trusty LTS (3.13.0-39-generic
> > kernel) which crash fails on.
> > 
> > $ ./crash ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic
> > ../dump.201412280256
> > 
> > crash 7.0.9
> > Copyright (C) 2002-2014  Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> > Copyright (C) 1999-2006  Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> > Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011  NEC Corporation
> > Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions.  Enter "help copying" to see the conditions.
> > This program has absolutely no warranty.  Enter "help warranty" for
> > details.
> > 
> > GNU gdb (GDB) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-unknown-linux-gnu"...
> > 
> > crash: cannot determine thread return address
> > please wait... (gathering kmem slab cache data)
> > crash: invalid kernel virtual address: 1c  type: "kmem_cache
> > objsize/object_size"
> > crash: failed to read pageflag_names entry
> > please wait... (gathering module symbol data)
> > WARNING: invalid kernel module size: 0
> > 
> > crash: cannot determine idle task addresses from init_tasks[] or
> > runqueues[]
> > 
> > crash: cannot resolve "init_task_union"
> > 
> > 
> > vmlinux-3.13.0-39-generic was extracted from Ubuntu ddeb:
> > 
> > $ file ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic
> > ../ddeb/usr/lib/debug/boot/vmlinux-3.13.0-39-generic: ELF 64-bit LSB
> > executable, x86-64, version 1 (SYSV), statically linked,
> > BuildID[sha1]=c4fa631d2cc34a0b2628a5de01a04e81a0667555, not stripped
> > 
> > With -d8 I get:
> > 
> > ...
> > <read_diskdump: addr: ffffffffffffffff paddr: 7fffffff cnt: 1>
> > read_diskdump: paddr/pfn: 7fffffff/7ffff -> cache physical page: 7ffff000
> > crash: invalid kernel virtual address: 0  type: "memory section"
> > 
> > The entire -d8 output is attached.
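
For anyone following the read_diskdump lines, the arithmetic behind the
addr/paddr/pfn values can be sketched as below.  This is only an
illustration, not crash's translation code.  It assumes x86_64 with 4KB
pages, the kernel text mapping based at 0xffffffff80000000, and a
phys_base of zero; the function and macro names are made up for the
example:

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not taken from the debug output: base of the x86_64 kernel
 * text mapping, a phys_base of zero, and 4KB pages. */
#define START_KERNEL_MAP 0xffffffff80000000ULL
#define PAGE_SHIFT_4K    12

/* Kernel text-mapping virtual address to physical address. */
uint64_t kvaddr_to_paddr(uint64_t vaddr)
{
	return vaddr - START_KERNEL_MAP;
}

/* Physical address to page frame number. */
uint64_t paddr_to_pfn(uint64_t paddr)
{
	return paddr >> PAGE_SHIFT_4K;
}

/* Physical address of the start of the page containing paddr. */
uint64_t paddr_to_page_base(uint64_t paddr)
{
	return paddr & ~((1ULL << PAGE_SHIFT_4K) - 1);
}

/* Usage example with the values from the -d8 lines quoted above. */
void kvaddr_to_paddr_example(void)
{
	uint64_t paddr = kvaddr_to_paddr(0xffffffffffffffffULL);

	assert(paddr == 0x7fffffffULL);
	assert(paddr_to_pfn(paddr) == 0x7ffffULL);
	assert(paddr_to_page_base(paddr) == 0x7ffff000ULL);
}
```

Under those assumptions the numbers agree with the "paddr: 7fffffff" and
"cache physical page: 7ffff000" values in the output.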
> > 
> > Bogus "base kernel version" stands out immediately and I'm pretty sure
> > I've seen "0.0.0" in there a couple times with exactly the same dump.
> > From a quick look, the base kernel version code in kernel.c is not safe
> > against kt->utsname.release being all zeroes.
> > 
> > Eddy Gonzalo (CC'ed) can probably provide access to the dumps if
> > needed.
> > 
> > Thanks,
> >                 Ilya
> 
> The obvious question is: are you sure that the vmlinux matches the dumpfile?
> 
> I say that because there are so many strange readings from this dumpfile.
> As you noted, yes, this definitely is a mismatch, where the header shows:
> 
>                sysname: Linux
>               nodename: chqcephnas01
>                release: 3.13.0-39-generic
>                version: #66~precise1-Ubuntu SMP Wed Oct 29 09:56:49 UTC 2014
>                machine: x86_64
> 
> but this gets read from the dumpfile:
> 
>   <readmem: ffffffff81c15284, KVADDR, "init_uts_ns", 390, (ROE), cfa7bc>
>   <read_diskdump: addr: ffffffff81c15284 paddr: 1c15284 cnt: 390>
>   read_diskdump: paddr/pfn: 1c15284/1c15 -> cache physical page: 1c15000
>   base kernel version: 0.13.0
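
On the point about the version parsing not being safe against a zeroed
release string: the following is a minimal sketch of a defensive parser
in C.  It is not the code in crash's kernel.c; parse_release() is a
hypothetical name, and the only input format it assumes is a release
string that begins with three dot-separated numbers, such as
"3.13.0-39-generic":

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not crash's implementation: parse the leading
 * "N.N.N" of a uts release string into version[0..2].
 * Returns 0 on success, or -1 if the string is empty (for example a
 * zeroed buffer) or does not start with three dot-separated numbers.
 */
int parse_release(const char *release, int version[3])
{
	const char *p = release;
	char *end;
	int i;

	assert(release != NULL);
	for (i = 0; i < 3; i++) {
		if (!isdigit((unsigned char)*p))
			return -1;
		version[i] = (int)strtol(p, &end, 10);
		p = end;
		if (i < 2) {
			if (*p != '.')
				return -1;
			p++;
		}
	}
	return 0;
}

/* Usage example: the release from the dumpfile header, and a zeroed buffer. */
void parse_release_example(void)
{
	char zeroed[65] = { 0 };
	int v[3];

	assert(parse_release("3.13.0-39-generic", v) == 0);
	assert(v[0] == 3 && v[1] == 13 && v[2] == 0);
	assert(parse_release(zeroed, v) == -1);
}
```

The design choice here is simply to stop at the first character that is
not what the format calls for, so an all-zero buffer is rejected rather
than scanned.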
> 
> One of the first sets of items accessed is the contents of the cpu mask
> variables:
> 
>   <readmem: ffffffff8180acf0, KVADDR, "cpu_possible_mask", 8, (FOE),
>   7fff5ab8b618>
>   <read_diskdump: addr: ffffffff8180acf0 paddr: 180acf0 cnt: 8>
>   read_diskdump: paddr/pfn: 180acf0/180a -> cache physical page: 180a000
>   <readmem: ffffffff8180ace0, KVADDR, "cpu_present_mask", 8, (FOE),
>   7fff5ab8b618>
>   <read_diskdump: addr: ffffffff8180ace0 paddr: 180ace0 cnt: 8>
>   read_diskdump: paddr/pfn: 180ace0/180a -> physical page is cached: 180a000
>   <readmem: ffffffff8180ace8, KVADDR, "cpu_online_mask", 8, (FOE),
>   7fff5ab8b618>
>   <read_diskdump: addr: ffffffff8180ace8 paddr: 180ace8 cnt: 8>
>   read_diskdump: paddr/pfn: 180ace8/180a -> physical page is cached: 180a000
>   <readmem: ffffffff8180acd8, KVADDR, "cpu_active_mask", 8, (FOE),
>   7fff5ab8b618>
>   <read_diskdump: addr: ffffffff8180acd8 paddr: 180acd8 cnt: 8>
>   read_diskdump: paddr/pfn: 180acd8/180a -> physical page is cached: 180a000
> 
> But they all return NULL pointers.  They should return pointers to bitmasks,
> which then get read, and their contents displayed.  For example, I've got
> a 3.13 kernel dumpfile, where each mask pointer is read, the bitmask it
> points to gets read, and then the contents are dumped:
> 
>   <readmem: ffffffff8180a870, KVADDR, "cpu_possible_mask", 8, (FOE),
>   7fff5f116f48>
>   <read_diskdump: addr: ffffffff8180a870 paddr: 180a870 cnt: 8>
>   <readmem: ffffffff81d8c780, KVADDR, "possible", 1024, (ROE), f45b80>
>   <read_diskdump: addr: ffffffff81d8c780 paddr: 1d8c780 cnt: 1024>
>   cpu_possible_mask: 0 1 2 3
>   <readmem: ffffffff8180a860, KVADDR, "cpu_present_mask", 8, (FOE),
>   7fff5f116f48>
>   <read_diskdump: addr: ffffffff8180a860 paddr: 180a860 cnt: 8>
>   <readmem: ffffffff81d8bf80, KVADDR, "present", 1024, (ROE), f45b80>
>   <read_diskdump: addr: ffffffff81d8bf80 paddr: 1d8bf80 cnt: 128>
>   <read_diskdump: addr: ffffffff81d8c000 paddr: 1d8c000 cnt: 896>
>   cpu_present_mask: 0 1
>   <readmem: ffffffff8180a868, KVADDR, "cpu_online_mask", 8, (FOE),
>   7fff5f116f48>
>   <read_diskdump: addr: ffffffff8180a868 paddr: 180a868 cnt: 8>
>   <readmem: ffffffff81d8c380, KVADDR, "online", 1024, (ROE), f45b80>
>   <read_diskdump: addr: ffffffff81d8c380 paddr: 1d8c380 cnt: 1024>
>   cpu_online_mask: 0 1
>   <readmem: ffffffff8180a858, KVADDR, "cpu_active_mask", 8, (FOE),
>   7fff5f116f48>
>   <read_diskdump: addr: ffffffff8180a858 paddr: 180a858 cnt: 8>
>   <readmem: ffffffff81d8bb80, KVADDR, "active", 1024, (ROE), f45b80>
>   <read_diskdump: addr: ffffffff81d8bb80 paddr: 1d8bb80 cnt: 1024>
>   cpu_active_mask: 0 1
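
To make the last step concrete, here is a rough sketch in C of turning a
bitmask into the list of cpu numbers shown on the "cpu_possible_mask:"
style lines.  cpumask_to_list() is a hypothetical helper written for this
illustration, not crash's own routine, and the mask values in the example
are chosen to reproduce the lists above rather than taken from the
dumpfile:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the indices of the set bits in a cpu bitmask
 * to buf as a space-separated list.  Returns the number of cpus written.
 * If buf is too small, the list is cut at the last complete entry.
 */
int cpumask_to_list(const unsigned long *mask, int nwords,
		    char *buf, size_t buflen)
{
	const int bits = (int)(8 * sizeof(unsigned long));
	size_t used = 0;
	int w, b, n, count = 0;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (w = 0; w < nwords; w++) {
		for (b = 0; b < bits; b++) {
			if (!(mask[w] & (1UL << b)))
				continue;
			n = snprintf(buf + used, buflen - used, "%s%d",
				     count ? " " : "", w * bits + b);
			if (n < 0 || (size_t)n >= buflen - used) {
				buf[used] = '\0';	/* drop the partial entry */
				return count;
			}
			used += (size_t)n;
			count++;
		}
	}
	return count;
}

/* Usage example: masks with bits 0-3 and bits 0-1 set. */
void cpumask_to_list_example(void)
{
	unsigned long possible = 0xf, online = 0x3;
	char buf[64];

	assert(cpumask_to_list(&possible, 1, buf, sizeof(buf)) == 4);
	assert(strcmp(buf, "0 1 2 3") == 0);
	assert(cpumask_to_list(&online, 1, buf, sizeof(buf)) == 2);
	assert(strcmp(buf, "0 1") == 0);
}
```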
> 
> Right from the get-go, the __per_cpu_offset array looks like it's
> returning all zeroes, in which case pretty much all is lost and the
> dumpfile is useless.
> 
> That can be seen with the following readmem failure, which
> should take the kt->__per_cpu_offset[0] value and add it to
> the (per-cpu) symbol value of "cpu_number", which presumably
> is b084 in that kernel, and where kt->__per_cpu_offset[0] is
> apparently zero.  Therefore this readmem() call:
> 
>                 if (!readmem(cpu_sp->value + kt->__per_cpu_offset[i],
>                     KVADDR, &cpunumber, sizeof(int),
>                     "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
>                         break;
> 
> generated this failure:
> 
>   <readmem: b084, KVADDR, "cpu number (per_cpu)", 4, (ROE|Q), 7fff5ab9c800>
>   crash: invalid kernel virtual address: b084  type: "cpu number (per_cpu)"
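
The address arithmetic can be sketched in C as follows.  This is only an
illustration, not crash code: the helper names are hypothetical, the
non-zero offset in the example is a made-up value, and the range check
assumes x86_64 with kernel virtual addresses in the upper canonical half:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: on x86_64, kernel virtual addresses are at or above the
 * start of the upper canonical half. */
#define X86_64_KERNEL_HALF_START 0xffff800000000000ULL

/* Address of a per-cpu variable for one cpu: the (per-cpu) symbol value
 * plus that cpu's entry from __per_cpu_offset[]. */
uint64_t per_cpu_addr(uint64_t symbol_value, uint64_t cpu_offset)
{
	return symbol_value + cpu_offset;
}

/* Rough plausibility check for an x86_64 kernel virtual address. */
int in_kernel_half(uint64_t addr)
{
	return addr >= X86_64_KERNEL_HALF_START;
}

/* Usage example: "cpu_number" at b084 with a zero and a non-zero offset. */
void per_cpu_addr_example(void)
{
	/* A zero offset leaves the bare symbol value, as in the failure. */
	assert(per_cpu_addr(0xb084, 0) == 0xb084);
	assert(!in_kernel_half(per_cpu_addr(0xb084, 0)));

	/* Made-up offset, only to show what a usable result looks like. */
	assert(in_kernel_half(per_cpu_addr(0xb084, 0xffff88003fc00000ULL)));
}
```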
> 
> The kt->__per_cpu_offset[] array would have been set up earlier in
> kernel_init():
> 
>         if (symbol_exists("__per_cpu_offset")) {
>                 if (LKCD_KERNTYPES())
>                         i = get_cpus_possible();
>                 else
>                         i = get_array_length("__per_cpu_offset", NULL, 0);
>                 get_symbol_data("__per_cpu_offset",
>                         sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
>                         &kt->__per_cpu_offset[0]);
>                 kt->flags |= PER_CPU_OFF;
>         }
> 
> It looks like it read the array OK, where the Ubuntu kernel looks like
> it has 256 cpus configured:
> 
>   <readmem: ffffffff81d130e0, KVADDR, "__per_cpu_offset", 2048, (FOE),
>   cfa968>
>   <read_diskdump: addr: ffffffff81d130e0 paddr: 1d130e0 cnt: 2048>
>   read_diskdump: paddr/pfn: 1d130e0/1d13 -> cache physical page: 1d13000
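
The 2048-byte count is where the 256-cpu figure comes from.  A small
worked sketch of the size expression from the kernel_init() excerpt, with
the limit passed in as a parameter because the value of NR_CPUS is not
shown here (the 8192 used below is an arbitrary stand-in, and the
function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mirrors the size expression in the excerpt: use the array length i when
 * it is non-zero and no larger than nr_cpus_max, otherwise fall back to
 * nr_cpus_max.  nr_cpus_max stands in for NR_CPUS, long_size for
 * sizeof(long) on the target.
 */
size_t per_cpu_offset_read_size(size_t i, size_t nr_cpus_max, size_t long_size)
{
	return long_size * ((i && (i <= nr_cpus_max)) ? i : nr_cpus_max);
}

/* Usage example: 256 entries of 8 bytes each. */
void per_cpu_offset_read_size_example(void)
{
	/* Matches the 2048-byte readmem of __per_cpu_offset shown above. */
	assert(per_cpu_offset_read_size(256, 8192, 8) == 2048);

	/* An unknown array length (0) falls back to the maximum. */
	assert(per_cpu_offset_read_size(0, 8192, 8) == 8192 * 8);
}
```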
> 
> But when utilizing the stashed kt->__per_cpu_offset[0] value later on (for
> cpu 0), it got a zero offset.
> 
> So it looks like the vmlinux and dumpfile don't match, or perhaps the
> dumpfile is suspect.
> 
> It would be interesting to confirm that the kernel being used
> (vmlinux-3.13.0-39-generic) runs OK live on the crashing system.
> 
> Dave