[Crash-utility] ARM64 (odroid-c2) crash fails to read live kernel

Dave Anderson anderson at redhat.com
Tue Mar 8 19:38:28 UTC 2016



----- Original Message -----
> I have the new Odroid-C2 arm64 cortex-a53 board and have been trying to get
> crash to work against the live kernel.
> 
> I think the key error is this:
> linux_banner:
> crash: /lib/modules/3.14.29+/build/vmlinux and /dev/mem do not match!
> 
> They should match as I built the kernel myself and verified the vmlinux in
> /lib/modules is the one
> I'm booted on. What concerns me is that it does not appear to be able to read
> anything
> from the vmlinux file:
> <read_dev_mem: addr: ffffffc001c0dbac paddr: 2c0dbac cnt: 390>
> utsname:
> sysname: (not printable)
> nodename:
> release: J
> version: (not printable)
> machine: r
> domainname:
> base kernel version: 0.1.4

It's not reading the utsname data from the vmlinux file, but from /dev/mem.
And it's the reads from /dev/mem that are returning nonsense data.

The readmem() calls in your debug output are all for unity-mapped 
virtual addresses, which get translated to their physical address 
equivalents, which are then passed to /dev/mem.

And in your output, all of the data returned from /dev/mem is obviously 
bogus, so my best guess is that there is a fundamental problem
with the manner in which unity-mapped addresses get translated to 
the physical addresses passed to /dev/mem.  (as opposed to a problem
with the /dev/mem driver itself)

Unfortunately I don't have an ARM64 system where I can use /dev/mem,
because all RHEL kernels are configured with CONFIG_STRICT_DEVMEM. 
So we use the /dev/crash "misc" driver that is built into RHEL
kernels. 

I doubt it's an issue with /dev/mem itself, but for sanity's sake, 
what happens if you enter "crash /proc/kcore"?  It will use /proc/kcore
instead of /dev/mem for accessing kernel memory.

Anyway, the arm64 VTOP() macro used to translate virtual-to-physical
addresses in crash looks like this:

#define VTOP(X) \
        ((unsigned long)(X)-(machdep->machspec->page_offset)+(machdep->machspec->phys_offset))

You can watch the translation happen by running "crash --minimal" on your
system like this: 
  
  # crash --minimal
  
  crash 7.1.4
  Copyright (C) 2002-2014  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
   
  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "aarch64-unknown-linux-gnu"...
  
  NOTE: minimal mode commands: log, dis, rd, sym, eval, set, extend and exit
  
  crash> 
  
An easy way to check that at least unity-mapped addresses get
translated correctly is to read the kernel's "linux_banner" string:
  
  crash> rd linux_banner 30
  fffffe00007800b8:  65762078756e694c 2e34206e6f697372   Linux version 4.
  fffffe00007800c8:  63722e302d302e34 376c652e30322e33   4.0-0.rc3.20.el7
  fffffe00007800d8:  343668637261612e 75626b636f6d2820   .aarch64 (mockbu
  fffffe00007800e8:  366d726140646c69 75622e3630302d34   ild@arm64-006.bu
  fffffe00007800f8:  2e676e652e646c69 686465722e736f62   ild.eng.bos.redh
  fffffe0000780108:  20296d6f632e7461 7265762063636728   at.com) (gcc ver
  fffffe0000780118:  382e34206e6f6973 303531303220352e   sion 4.8.5 20150
  fffffe0000780128:  6465522820333236 382e342074614820   623 (Red Hat 4.8
  fffffe0000780138:  47282029342d352e 3123202920294343   .5-4) (GCC) ) #1
  fffffe0000780148:  64655720504d5320 3120322063654420    SMP Wed Dec 2 1
  fffffe0000780158:  2038353a37353a34 3531303220545345   4:57:58 EST 2015
  fffffe0000780168:  000000000000000a 745f6b6361706e75   ........unpack_t
  fffffe0000780178:  7366746f6f725f6f 0000000000000000   o_rootfs........
  fffffe0000780188:  0000000000000000 0000000000000000   ................
  fffffe0000780198:  0000000000000000 0000000000000000   ................
  crash> 
  
If you do that on your system, I'm guessing that there is garbage in the 
rightmost ASCII translation column.

Anyway, let's just take the first 64-bit word, and show the VTOP() in action:

  crash> set debug 4
  debug: 4
  crash> rd linux_banner
  <addr: fffffe00007800b8 count: 1 flag: 490 (KVADDR)>
  <readmem: fffffe00007800b8, KVADDR, "64-bit KVADDR", 8, (FOE), 3ffc4c05fb8>
  <read_memory_device: addr: fffffe00007800b8 paddr: 40007800b8 cnt: 8>
  fffffe00007800b8:  65762078756e694c                    Linux ve
  crash> 

The VTOP() values used can be found like this:

  crash> help -m | grep -e page_offset -e phys_offset
             page_offset: fffffe0000000000
             phys_offset: 4000000000
  crash>

Your output will be different, because your page_offset is based upon a
VA_BITS value of 39 instead of my 42.  So yours should show ffffffc000000000
as the page_offset, and 1000000 as the phys_offset (also shown in your debug log). 

So for any unity-mapped virtual address, you would subtract the page_offset
value, and then add the phys_offset.  In my example above, reading linux_banner
at fffffe00007800b8 does this:

  0xfffffe00007800b8 - 0xfffffe0000000000 + 0x4000000000 = 0x40007800b8

where you can see the translated "paddr" physical address in this line of 
the debug output above:

   <read_memory_device: addr: fffffe00007800b8 paddr: 40007800b8 cnt: 8>

and which I can pass to rd directly as a physical address with "-p":

  crash> rd -p 40007800b8 30
        40007800b8:  65762078756e694c 2e34206e6f697372   Linux version 4.
        40007800c8:  63722e302d302e34 376c652e30322e33   4.0-0.rc3.20.el7
        40007800d8:  343668637261612e 75626b636f6d2820   .aarch64 (mockbu
        40007800e8:  366d726140646c69 75622e3630302d34   ild@arm64-006.bu
        40007800f8:  2e676e652e646c69 686465722e736f62   ild.eng.bos.redh
        4000780108:  20296d6f632e7461 7265762063636728   at.com) (gcc ver
        4000780118:  382e34206e6f6973 303531303220352e   sion 4.8.5 20150
        4000780128:  6465522820333236 382e342074614820   623 (Red Hat 4.8
        4000780138:  47282029342d352e 3123202920294343   .5-4) (GCC) ) #1
        4000780148:  64655720504d5320 3120322063654420    SMP Wed Dec 2 1
        4000780158:  2038353a37353a34 3531303220545345   4:57:58 EST 2015
        4000780168:  000000000000000a 745f6b6361706e75   ........unpack_t
        4000780178:  7366746f6f725f6f 0000000000000000   o_rootfs........
        4000780188:  0000000000000000 0000000000000000   ................
        4000780198:  0000000000000000 0000000000000000   ................
  crash> 

In your debug log, taking the "init_uts_ns" read, crash takes the virtual
address ffffffc001c0dbac, subtracts the page_offset of ffffffc000000000, and
adds the phys_offset of 0x1000000, resulting in a "paddr" of 2c0dbac:

  <readmem: ffffffc001c0dbac, KVADDR, "init_uts_ns", 390, (ROE), b9606c>
  <read_dev_mem: addr: ffffffc001c0dbac paddr: 2c0dbac cnt: 390>

But it's getting back garbage...

I don't know why it's failing to find legitimate data at that location.  
The page_offset of ffffffc000000000 is calculated from the symbol values 
themselves, and the phys_offset value is taken from /proc/iomem.  (See the 
definition of ARM64_PAGE_OFFSET in defs.h, and the arm64_calc_VA_BITS() 
function in arm64.c).  Is there more than one "System RAM" section in 
your /proc/iomem?

Dave



> 
> If I elfdump or objdump the vmlinux and grep banner I can see the symbol:
> root at odroid64-pre:~/linux# readelf --syms vmlinux | grep banner
> 74463: ffffffc00186a090 149 OBJECT GLOBAL DEFAULT 4 linux_banner
> 75496: ffffffc00186a028 100 OBJECT GLOBAL DEFAULT 4 linux_proc_banner
> 
> root at odroid64-pre:~/linux# eu-nm -a vmlinux | grep banner
> linux_banner |ffffffc00186a090|GLOBAL|OBJECT |0000000000000095|
> version.c:43|.rodata
> linux_proc_banner |ffffffc00186a028|GLOBAL|OBJECT |0000000000000064|
> version.c:47|.rodata
> 
> I pulled the crash source and built it native on the arm64 box.
> If I could get a pointer on where to start with debugging this it would help
> (i.e. which error to focus on first)
> 
> ===
> The full dump of crash startup is below:
> root at odroid64-pre:~/linux# /root/crash-7.1.4/crash -d 64
> 
> crash 7.1.4
> Copyright (C) 2002-2015 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
> 
> 
> find_booted_kernel: search for [Linux version 3.14.29+ (root at odroid64-pre)
> (gcc version 5.3.1 20160225 (Ubuntu/Linaro 5.3.1-10ubuntu2) ) #1 SMP PREEMPT
> Tue Mar 8 01:06:35 CST 2016]
> searchdirs[8]: /usr/lib/debug/lib/modules/3.14.29+/
> searchdirs[0]: /usr/src/linux/
> searchdirs[1]: /boot/
> searchdirs[2]: /boot/efi/redhat
> searchdirs[3]: /boot/efi/EFI/redhat
> searchdirs[4]: /
> searchdirs[5]: /lib/modules/3.14.29+/build/
> searchdirs[6]: /usr/src/redhat/BUILD/kernel-3.14.29/linux/
> searchdirs[7]: /usr/src/redhat/BUILD/kernel-3.14.29/linux-3.14.29/
> mount_points[0]: / (c46630)
> mount_points[1]: /sys (c46650)
> mount_points[2]: /proc (c46670)
> mount_points[3]: /dev (c46690)
> mount_points[4]: /dev/pts (c466b0)
> mount_points[5]: /run (c466d0)
> mount_points[6]: / (c466f0)
> mount_points[7]: /sys/kernel/security (c46710)
> mount_points[8]: /dev/shm (c46740)
> mount_points[9]: /run/lock (c46760)
> mount_points[10]: /sys/fs/cgroup (c46780)
> mount_points[11]: /sys/fs/cgroup/systemd (c467b0)
> mount_points[12]: /sys/fs/cgroup/devices (c467f0)
> mount_points[13]: /sys/fs/cgroup/cpuset (c46830)
> mount_points[14]: /sys/fs/cgroup/cpu,cpuacct (c46870)
> mount_points[15]: /sys/fs/cgroup/blkio (c468b0)
> mount_points[16]: /sys/fs/cgroup/debug (c468e0)
> mount_points[17]: /sys/fs/cgroup/perf_event (c46910)
> mount_points[18]: /sys/fs/cgroup/freezer (c46950)
> mount_points[19]: /sys/fs/cgroup/net_cls (c46990)
> mount_points[20]: /proc/sys/fs/binfmt_misc (c469d0)
> mount_points[21]: /dev/mqueue (c46a10)
> mount_points[22]: /sys/kernel/debug (c46a30)
> mount_points[23]: /dev/hugepages (c46a60)
> mount_points[24]: /run/rpc_pipefs (c46a90)
> mount_points[25]: /sys/kernel/config (c46ac0)
> mount_points[26]: /media/boot (c46af0)
> mount_points[27]: /run/cgmanager/fs (c46b10)
> mount_points[28]: /run/user/118 (c46b40)
> mount_points[29]: /run/user/118/gvfs (c46b70)
> mount_points[30]: /sys/fs/fuse/connections (c46ba0)
> mount_points[31]: /run/user/0 (c46be0)
> find_booted_kernel: check: /lib/modules/3.14.29+/build/vmlinux
> find_booted_kernel: found: /lib/modules/3.14.29+/build/vmlinux
> get_live_memory_source: /dev/mem
> /proc/version:
> Linux version 3.14.29+ (root at odroid64-pre) (gcc version 5.3.1 20160225
> (Ubuntu/Linaro 5.3.1-10ubuntu2) ) #1 SMP PREEMPT Tue Mar 8 01:06:35 CST 2016
> /lib/modules/3.14.29+/build/vmlinux:
> Linux version 3.14.29+ (root at odroid64-pre) (gcc version 5.3.1 20160225
> (Ubuntu/Linaro 5.3.1-10ubuntu2) ) #1 SMP PREEMPT Tue Mar 8 01:06:35 CST 2016
> readmem: read_dev_mem() -> /dev/mem
> VA_BITS: 39
> using 1000000 as phys_offset
> gdb /lib/modules/3.14.29+/build/vmlinux
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later < http://gnu.org/licenses/gpl.html
> >
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "aarch64-unknown-linux-gnu"...
> GETBUF(248 -> 0)
> GETBUF(1500 -> 1)
> 
> FREEBUF(1)
> FREEBUF(0)
> <readmem: ffffffc001874510, KVADDR, "kernel_config_data", 32768, (ROE),
> 17c47c0>
> <read_dev_mem: addr: ffffffc001874510 paddr: 2874510 cnt: 2800>
> <read_dev_mem: addr: ffffffc001875000 paddr: 2875000 cnt: 4096>
> <read_dev_mem: addr: ffffffc001876000 paddr: 2876000 cnt: 4096>
> <read_dev_mem: addr: ffffffc001877000 paddr: 2877000 cnt: 4096>
> <read_dev_mem: addr: ffffffc001878000 paddr: 2878000 cnt: 4096>
> <read_dev_mem: addr: ffffffc001879000 paddr: 2879000 cnt: 4096>
> <read_dev_mem: addr: ffffffc00187a000 paddr: 287a000 cnt: 4096>
> <read_dev_mem: addr: ffffffc00187b000 paddr: 287b000 cnt: 4096>
> <read_dev_mem: addr: ffffffc00187c000 paddr: 287c000 cnt: 1296>
> WARNING: could not find MAGIC_START!
> GETBUF(248 -> 0)
> FREEBUF(0)
> GETBUF(8 -> 0)
> <readmem: ffffffc00186fd80, KVADDR, "cpu_possible_mask", 8, (FOE),
> 7ffe1dfbd0>
> <read_dev_mem: addr: ffffffc00186fd80 paddr: 286fd80 cnt: 8>
> <readmem: 1600000102, KVADDR, "possible", 8, (ROE), bf8ae8>
> crash: invalid kernel virtual address: 1600000102 type: "possible"
> WARNING: cannot read cpu_possible_map
> <readmem: ffffffc00186fd70, KVADDR, "cpu_present_mask", 8, (FOE), 7ffe1dfbd0>
> <read_dev_mem: addr: ffffffc00186fd70 paddr: 286fd70 cnt: 8>
> <readmem: 189a, KVADDR, "present", 8, (ROE), bf8ae8>
> crash: invalid kernel virtual address: 189a type: "present"
> WARNING: cannot read cpu_present_map
> <readmem: ffffffc00186fd78, KVADDR, "cpu_online_mask", 8, (FOE), 7ffe1dfbd0>
> <read_dev_mem: addr: ffffffc00186fd78 paddr: 286fd78 cnt: 8>
> <readmem: 13a48, KVADDR, "online", 8, (ROE), bf8ae8>
> crash: invalid kernel virtual address: 13a48 type: "online"
> WARNING: cannot read cpu_online_map
> <readmem: ffffffc00186fd68, KVADDR, "cpu_active_mask", 8, (FOE), 7ffe1dfbd0>
> <read_dev_mem: addr: ffffffc00186fd68 paddr: 286fd68 cnt: 8>
> <readmem: 1600000102, KVADDR, "active", 8, (ROE), bf8ae8>
> crash: invalid kernel virtual address: 1600000102 type: "active"
> WARNING: cannot read cpu_active_map
> FREEBUF(0)
> GETBUF(248 -> 0)
> FREEBUF(0)
> GETBUF(248 -> 0)
> FREEBUF(0)
> <readmem: ffffffc001d61238, KVADDR, "timekeeper xtime_sec", 8, (ROE),
> 7ffe1dfc98>
> <read_dev_mem: addr: ffffffc001d61238 paddr: 2d61238 cnt: 8>
> xtime timespec.tv_sec: 5f044158000e9068: (null)
> <readmem: ffffffc001c0dbac, KVADDR, "init_uts_ns", 390, (ROE), b9606c>
> <read_dev_mem: addr: ffffffc001c0dbac paddr: 2c0dbac cnt: 390>
> utsname:
> sysname: (not printable)
> nodename:
> release: J
> version: (not printable)
> machine: r
> domainname:
> base kernel version: 0.1.4
> <readmem: ffffffc00186a090, KVADDR, "accessible check", 8, (ROE|Q),
> 7ffe1df350>
> <read_dev_mem: addr: ffffffc00186a090 paddr: 286a090 cnt: 8>
> <readmem: ffffffc00186a090, KVADDR, "read_string characters", 1499, (ROE|Q),
> 7ffe1df6c8>
> <read_dev_mem: addr: ffffffc00186a090 paddr: 286a090 cnt: 1499>
> /proc/version:
> Linux version 3.14.29+ (root at odroid64-pre) (gcc version 5.3.1 20160225
> (Ubuntu/Linaro 5.3.1-10ubuntu2) ) #1 SMP PREEMPT Tue Mar 8 01:06:35 CST 2016
> linux_banner:
> 
> crash: /lib/modules/3.14.29+/build/vmlinux and /dev/mem do not match!
> 
> Usage:
> 
> crash [OPTION]... NAMELIST MEMORY-IMAGE[@ADDRESS] (dumpfile form)
> crash [OPTION]... [NAMELIST] (live system form)
> 
> Enter "crash -h" for details.
> 
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility



