[Crash-utility] Trying to read an 80GB PPC64 crash dump with little luck

Wed Jan 21 15:56:34 UTC 2009

----- "James Washer" <washer at trlp.com> wrote:

> Trying to investigate a SLES10SP1 crash dump
> 
> Using crash 4.0-7.6
> 
> Using -d9, I get the following just before crash gives up. 
> 
> Any ideas?
> 
> thanks
>  - jim
> 
> 
> There is absolutely no warranty for GDB.  Type "show warranty" for
> details.
> This GDB was configured as "powerpc64-unknown-linux-gnu"...
> GETBUF(248 -> 0)
>   GETBUF(1500 -> 1)
> 
>   FREEBUF(1)                                                          
> 
> FREEBUF(0)
> <readmem: c00000000039db08, KVADDR, "kernel_config_data", 32768,(ROE) 11721c00>
> crash: read error: kernel virtual address: c00000000039db08  type: "kernel_config_data"
> WARNING: cannot read kernel_config_data
> GETBUF(248 -> 0)
> FREEBUF(0)
> GETBUF(16 -> 0)
> <readmem: c0000000006470b8, KVADDR, "cpu_possible_map", 16, (ROE), 1071e968>
> crash: read error: kernel virtual address: c0000000006470b8  type: "cpu_possible_map"
> WARNING: cannot read cpu_possible_map
> <readmem: c000000000653130, KVADDR, "cpu_present_map", 16, (ROE), 1071e968>
> crash: read error: kernel virtual address: c000000000653130  type: "cpu_present_map"
> WARNING: cannot read cpu_present_map
> <readmem: c0000000006470c8, KVADDR, "cpu_online_map", 16, (ROE), 1071e968>
> crash: read error: kernel virtual address: c0000000006470c8  type: "cpu_online_map"
> WARNING: cannot read cpu_online_map
> FREEBUF(0)
> <readmem: c0000000006d9f00, KVADDR, "xtime", 16, (FOE), 106f4218>
> crash: read error: kernel virtual address: c0000000006d9f00  type: "xtime"
> op720lpar5:/washer # 

Hi Jim,

Without the prior debug output I can't tell whether the
dumpfile is an ELF kdump or a compressed kdump.  But in
either case, every read from the dumpfile is failing.
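
One rough way to tell the two formats apart without crash is to look at the first bytes of the dumpfile. The sketch below assumes an ELF kdump starts with the standard ELF magic, and that a compressed kdump starts with an 8-byte "KDUMP   " or "DISKDUMP" signature. Those two signature strings are taken on assumption from crash's diskdump.h and should be checked against your crash source. The function names are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"  # standard ELF identification bytes

# Assumed 8-byte signatures for compressed dumps; verify against diskdump.h.
COMPRESSED_SIGNATURES = (b"KDUMP   ", b"DISKDUMP")


def classify_dump_header(header: bytes) -> str:
    """Classify a dumpfile from its first 8 bytes."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header[:8] in COMPRESSED_SIGNATURES:
        return "compressed"
    return "unknown"


def classify_dumpfile(path: str) -> str:
    """Read the first 8 bytes of the file at 'path' and classify them."""
    with open(path, "rb") as f:
        return classify_dump_header(f.read(8))
```

A result of "elf" would point at the read_kdump() path and "compressed" at the read_diskdump() path described below.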

When readmem() is called, it translates each of those virtual
addresses into its physical address, and the physical address is
passed on to either read_kdump() or read_diskdump().  A READ_ERROR
is being returned from whichever of those two functions was called.
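
To make the translation step concrete, here is a minimal sketch. It assumes the failing addresses lie in the PPC64 kernel's linear (unity-mapped) region starting at 0xc000000000000000, where the physical address is just the virtual address minus that base. Addresses outside that region (vmalloc space, for example) need a page-table walk and are not handled here. The names PPC64_KVBASE and vtop_linear are hypothetical.

```python
PPC64_KVBASE = 0xC000000000000000  # assumed base of the linear kernel mapping


def vtop_linear(vaddr: int, kvbase: int = PPC64_KVBASE) -> int:
    """Translate a linear-mapped kernel virtual address to a physical address."""
    if vaddr < kvbase:
        raise ValueError(f"{vaddr:#x} is below the linear mapping base")
    return vaddr - kvbase


# Addresses taken from the debug output quoted above.
for name, vaddr in [("kernel_config_data", 0xC00000000039DB08),
                    ("xtime", 0xC0000000006D9F00)]:
    print(f"{name}: {vaddr:#x} -> {vtop_linear(vaddr):#x}")
```

Under that assumption, all of the failing reads land in the first few megabytes of physical memory.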

(1) read_kdump() passes the request to read_netdump(), which
is where the READ_ERROR is generated.

(2) read_diskdump() -- used for compressed kdump vmcores --
calls diskdump.c's cache_page(), which is where the READ_ERROR
is generated.

In the case of read_kdump()'s call to read_netdump(), a READ_ERROR 
is generated for two possible reasons:

  1. if the physical address does not fit into any of the
     PT_LOAD segments declared in the ELF header.
  2. if the physical address does fit into a PT_LOAD segment,
     but the actual read() system call fails.
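
The first of those two checks can be sketched as follows. This is an illustration of the idea rather than read_netdump() itself: LoadSegment and paddr_to_file_offset are hypothetical names, and each segment is assumed to be described by its p_paddr, p_filesz and p_offset values from the ELF program header.

```python
from typing import NamedTuple, Optional, Sequence


class LoadSegment(NamedTuple):
    phys_start: int   # p_paddr
    phys_end: int     # p_paddr + p_filesz (exclusive)
    file_offset: int  # p_offset


def paddr_to_file_offset(paddr: int,
                         segments: Sequence[LoadSegment]) -> Optional[int]:
    """Return the dumpfile offset holding 'paddr', or None if no segment has it."""
    for seg in segments:
        if seg.phys_start <= paddr < seg.phys_end:
            return seg.file_offset + (paddr - seg.phys_start)
    return None  # reason 1: the address is not in any PT_LOAD segment
```

The PT_LOAD segments of an ELF vmcore can be listed with "readelf -l" and compared by hand against the physical addresses of the failing reads. If an offset is found and the read still fails, that leaves reason 2.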

In the case of read_diskdump()'s call to cache_page(), a READ_ERROR
is generated for four possible reasons:

  1. if the read() system call of the page descriptor fails.
  2. if the page descriptor's advertised size is greater than the
     header's block_size.
  3. if the read() system call of the compressed page fails.
  4. if the uncompress() of the page data fails.  (But that
     case prints its own error message, so it's not the
     problem in your case.)
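
The four checks above can be modelled roughly like this. It is a simplified sketch, not diskdump.c's cache_page(): the 12-byte descriptor layout (8-byte data offset followed by a 4-byte size) is a hypothetical stand-in for the real page descriptor, zlib is assumed as the compression method, and load_page is a made-up name.

```python
import io
import struct
import zlib

DESC_FMT = ">QI"  # hypothetical descriptor: 8-byte data offset, 4-byte size
DESC_LEN = struct.calcsize(DESC_FMT)


def load_page(dump, desc_offset, block_size):
    """Return (page_bytes, None) on success or (None, reason) on failure."""
    dump.seek(desc_offset)
    raw_desc = dump.read(DESC_LEN)
    if len(raw_desc) != DESC_LEN:
        return None, "descriptor read failed"              # reason 1
    data_offset, size = struct.unpack(DESC_FMT, raw_desc)
    if size > block_size:
        return None, "descriptor size exceeds block_size"  # reason 2
    dump.seek(data_offset)
    raw_page = dump.read(size)
    if len(raw_page) != size:
        return None, "compressed page read failed"         # reason 3
    try:
        return zlib.decompress(raw_page), None
    except zlib.error:
        return None, "uncompress failed"                   # reason 4
```

Returning a reason string next to the failure makes it easy to see which of the four checks tripped, which is the same question being asked of crash here.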

So the first thing to determine is where the READ_ERROR is
generated.

Dave