[Crash-utility] using crash for ARM

Tue Aug 14 08:51:56 UTC 2012

strings ./vmlinux | grep "Linux version"
Linux version 3.0.15+ (oza at lc-blr-291) (gcc version 4.4.3 (GCC) ) #4 PREEMPT Mon Aug 13 12:02:58 IST 2012

cat /proc/version
Linux version 3.0.15+ (oza at lc-blr-291) (gcc version 4.4.3 (GCC) ) #4 PREEMPT Mon Aug 13 12:02:58 IST 2012

crash> ptype struct kmem_cache
type = struct kmem_cache {
    struct array_cache *array[1];
    unsigned int batchcount;
    unsigned int limit;
    unsigned int shared;
    unsigned int buffer_size;
    u32 reciprocal_buffer_size;
    unsigned int flags;
    unsigned int num;
    unsigned int gfporder;
    gfp_t gfpflags;
    size_t colour;
    unsigned int colour_off;
    struct kmem_cache *slabp_cache;
    unsigned int slab_size;
    unsigned int dflags;
    void (*ctor)(void *);
    const char *name;
    struct list_head next;
    struct kmem_list3 *nodelists[1];
}

Regards,
Oza.

________________________________
 From: Dave Anderson <anderson at redhat.com>
To: paawan oza <paawan1982 at yahoo.com> 
Cc: "Discussion list for crash utility usage, maintenance and development" <crash-utility at redhat.com> 
Sent: Monday, 13 August 2012 11:25 PM
Subject: Re: [Crash-utility] using crash for ARM

----- Original Message -----
> 
> 
> Hi,
> 
> 
> I thought it might be a file creation or permission issue.
> With the following change, I am able to get to the crash prompt:
> 
> 
> void
> open_tmpfile2(void)
> {
>         if (pc->tmpfile2)
>                 error(FATAL, "recursive secondary temporary file usage\n");
> 
>         // pc->flags |= DROP_CORE;
>         // if ((pc->tmpfile2 = tmpfile()) == NULL)
>         //         error(FATAL, "cannot open secondary temporary file\n");
> 
>         pc->tmpfile2 = fopen ("/system/tmp", "w" );
>         if (pc->tmpfile2 == NULL) {
>                 printf("error no is %d",errno);
>                 perror("tmpfile");
> 
>                 error(FATAL, "cannot open secondary temporary file\n");
>         }
>         // pc->flags &= ~DROP_CORE;
>         rewind(pc->tmpfile2);
> }
> 
> 
> 
> /system is mounted as a read-write filesystem; I thought a read-only
> filesystem might be the issue.

You will need to do the same kind of thing for the open_tmpfile() function
as well.

I thought that setting a TMPDIR environment variable might allow you to use
an alternate directory from "/tmp", but that doesn't seem to apply to tmpfile()
usage.
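
A minimal sketch of what a shared replacement for the tmpfile() calls could
look like, assuming a writable directory is available. The helper name
open_tmpfile_in() and the "crash_tmp_XXXXXX" template are hypothetical and
not part of crash. One detail that may matter here: tmpfile() opens its
stream for update ("w+b"), whereas the quoted fopen() change uses "w", so a
file opened that way cannot be read back for parsing.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of crash): behaves like tmpfile(), but
 * creates the file in a caller-chosen directory.  Returns NULL on failure;
 * errno is set when the failure comes from mkstemp() or fdopen().
 */
FILE *
open_tmpfile_in(const char *dir)
{
        char path[4096];
        FILE *fp;
        int fd, n;

        n = snprintf(path, sizeof(path), "%s/crash_tmp_XXXXXX", dir);
        if (n < 0 || (size_t)n >= sizeof(path))
                return NULL;

        /* mkstemp() creates and opens a unique file for reading and writing */
        if ((fd = mkstemp(path)) < 0)
                return NULL;

        /* Remove the name right away; the file lives until it is closed */
        unlink(path);

        /* "w+" so that the contents can be read back after being written */
        if ((fp = fdopen(fd, "w+")) == NULL) {
                close(fd);
                return NULL;
        }
        return fp;
}
```

With something like this, both open_tmpfile() and open_tmpfile2() could
call, for example, open_tmpfile_in("/system") in place of tmpfile(), and a
perror() after a NULL return would still report the underlying errno.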

> 
> 
> Still, I get some warnings:
> 
> 
> shell at android:/system # ./crash ./vmlinux ./System.map
> 
> crash 6.0.8
> Copyright (C) 2002-2012 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for
> details.
> 
> GNU gdb (GDB) 7.3.1
> Copyright (C) 2011 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "arm-none-linux-gnueabi"...
> 
> WARNING: kernels compiled by different gcc versions:
> ./vmlinux: (unknown)
> live system kernel: 4.4.3

I asked this before, but did you ever show the output of:

  $ strings ./vmlinux | grep "Linux version"

It should match the output of /proc/version with respect to the
gcc version that was used to compile the kernel, which is embedded
in the linux_banner string.
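
As a rough illustration of that comparison, the sketch below pulls the text
that follows "gcc version " out of a banner string so that the two values
can be compared. The helper name banner_gcc_version() is hypothetical; this
is not how crash itself performs the check.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the token following "gcc version " (up to the
 * next space or closing parenthesis) into out.
 * Returns 0 on success, -1 if the marker is missing or out is too small.
 */
int
banner_gcc_version(const char *banner, char *out, size_t outlen)
{
        const char *marker = "gcc version ";
        const char *p = strstr(banner, marker);
        size_t len;

        if (p == NULL || outlen == 0)
                return -1;

        p += strlen(marker);
        len = strcspn(p, " )");
        if (len == 0 || len >= outlen)
                return -1;

        memcpy(out, p, len);
        out[len] = '\0';
        return 0;
}
```

Running it on the "Linux version ..." line from "strings ./vmlinux" and on
the contents of /proc/version, then comparing the two results with
strcmp(), would show whether the embedded compiler versions agree.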

> crash: invalid size request: 0 type: "array cache array"
> crash: unable to initialize kmem slab cache subsystem

For some reason, the call in vm_init() here is failing to initialize
the array length of the kmem_cache.array[] array:

                        }
                        MEMBER_OFFSET_INIT(kmem_cache_s_array, "kmem_cache", "array");
   ====>                ARRAY_LENGTH_INIT(len, NULL, "kmem_cache.array", NULL, 0);
                }
                MEMBER_OFFSET_INIT(slab_list, "slab", "list");
                MEMBER_OFFSET_INIT(slab_s_mem, "slab", "s_mem");
                MEMBER_OFFSET_INIT(slab_inuse, "slab", "inuse");

The ARRAY_LENGTH_INIT() macro ends up calling get_array_length(),
passing "kmem_cache.array" as the first argument.  It should end up
using the open_tmpfile2() function to parse the output of the gdb
command "ptype struct kmem_cache".
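
To make the parsing step concrete, here is a minimal sketch of extracting an
array dimension from one line of "ptype" output. It is only an illustration
under assumptions: member_array_length() is a hypothetical name, and the
real get_array_length() in crash may work quite differently.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given one line of "ptype" output and a member name,
 * return the array dimension declared for that member, or 0 if the line
 * does not declare that member as an array.
 */
long
member_array_length(const char *line, const char *member)
{
        size_t mlen = strlen(member);
        const char *p = line;

        while ((p = strstr(p, member)) != NULL) {
                /* The name must not be the tail of a longer identifier */
                int start_ok = (p == line) ||
                    !(isalnum((unsigned char)p[-1]) || p[-1] == '_');

                if (start_ok && p[mlen] == '[') {
                        char *end;
                        long n = strtol(p + mlen + 1, &end, 10);

                        if (end != p + mlen + 1 && *end == ']' && n > 0)
                                return n;
                        return 0;
                }
                p += mlen;      /* e.g. skip the "array" in "array_cache" */
        }
        return 0;
}
```

If the ptype output never reaches the parser, for instance because the
temporary file could not be read back, a helper like this would find no
match and yield 0, which is at least consistent with the "invalid size
request: 0" message.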

After you get a crash> prompt, what do you see when you enter this?

  crash> ptype struct kmem_cache

For example, I see this on a RHEL5 kernel:

  crash> ptype struct kmem_cache
  type = struct kmem_cache {
      struct array_cache *array[255];
      unsigned int batchcount;
      unsigned int limit;
      unsigned int shared;
      unsigned int buffer_size;
      struct kmem_list3 *nodelists[64];
      unsigned int flags;
      unsigned int num;
      unsigned int gfporder;
      gfp_t gfpflags;
      size_t colour;
      unsigned int colour_off;
      struct kmem_cache *slabp_cache;
      unsigned int slab_size;
      unsigned int dflags;
      void (*ctor)(void *, struct kmem_cache *, long unsigned int);
      void (*dtor)(void *, struct kmem_cache *, long unsigned int);
      const char *name;
      struct list_head next;
  }
  crash>

Dave

> 
> SYSTEM MAP: ./System.map
> DEBUG KERNEL: ./vmlinux
> DUMPFILE: /dev/mem
> CPUS: 1
> DATE: Sat Jan 1 04:19:18 2000
> UPTIME: 01:02:54
> LOAD AVERAGE: 1.28, 0.98, 0.76
> TASKS: 505
> NODENAME: localhost
> RELEASE: 3.0.15+
> VERSION: #4 PREEMPT Mon Aug 13 12:02:58 IST 2012
> MACHINE: armv7l (unknown Mhz)
> MEMORY: 462 MB
> PID: 2559
> COMMAND: "crash"
> TASK: d2b2b140 [THREAD_INFO: d170c000]
> CPU: 0
> STATE: TASK_RUNNING (ACTIVE)
> 
> crash>
> 
> 
> 
> 
> Regards,
> Oza.
> 
> From: Dave Anderson <anderson at redhat.com>
> To: paawan oza <paawan1982 at yahoo.com>
> Cc: "Discussion list for crash utility usage, maintenance and
> development" <crash-utility at redhat.com>
> Sent: Friday, 10 August 2012 10:01 PM
> Subject: Re: [Crash-utility] using crash for ARM
> 
> 
> 
> ----- Original Message -----
> > 
> > Hi,
> > 
> > Please find the logs attached for crash -d8 vmlinux System.map.
> > 
> > crash -d8 vmlinux doesn't work. It gives:
> > 
> > crash 6.0.8
> > Copyright (C) 2002-2012 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> > 
> > get_live_memory_source: /dev/mem
> > WARNING: ./vmlinux and /proc/version do not match!
> > 
> > WARNING: /proc/version indicates kernel version: 3.0.15+
> > 
> > crash: please use the vmlinux file for that kernel version, or try using
> > the System.map for that kernel version as an additional argument.
> > 
> > Regards,
> > Oza.
> 
> For starters, as Mika suggested, you should try your best to use the
> actual vmlinux file that is being run on the live system. If the running
> kernel's vmlinux file does not have debuginfo data, and you are using a
> similar kernel along with the running kernel's System.map file, then you
> must be sure that the "other" vmlinux that you are using is as close as
> possible to the running kernel. There are no guarantees that using a
> System.map file will work.
> 
> Anyway, looking at the log file, I'm not sure why there's non-crash related
> data intermingled with the crash -d8 output, i.e., like this:
> 
> ...
> c00dfc08 clk_enable
> c00dfc50 clk_debug_set_enable
> c00dfcac clk_[ 1866.844757] ##> wifi_suspend
> [ 1866.856903] i2c i2c-1: mpu_dev_suspend, called regulator_disable. Status: 0
> [ 1866.856933] mpu_dev_suspend: Suspend handler executed
> [ 1866.872528] PM: suspend of devices complete after 27.886 msecs
> [ 1866.872558] PM: suspend devices took 0.030 seconds
> [ 1866.873077] PM: late suspend of devices complete after 0.457 msecs
> [ 1866.873535] PM: early resume of devices complete after 0.183 msecs
> [ 1866.874481] i2c i2c-1: mpu_dev_resume, called regulator_enable. Status: 0
> [ 1866.874511] mpu_dev_resume: Resume handler executed
> [ 1866.874572] wakeup wake lock: bcmpmu_i2c
> [ 1866.874908] get_update_rate: rate = 112000
> [ 1866.874938] get_update_rate: rate = 112000
> [ 1866.876434] ##> wifi_resume
> [ 1866.892822] PM: resume of devices complete after 19.007 msecs
> [ 1866.893676] PM: resume devices took 0.020 seconds
> reset
> c00dfd4c clk_debug_reset
> c00dfd90 clk_init
> c00dfe10 clk_register
> ...
> 
> Regardless of that, I was looking for a readmem() call, or other debug
> statement that might help pinpoint the failure location. The best that can
> be inferred from the log data are the GNU_GET_DATATYPE debug statements at
> the end:
> 
> $ grep GNU_GET_DATATYPE teraterm.log
> GNU_GET_DATATYPE[runqueue]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[prio_array]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[prio_array]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[prio_array]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[irq_desc_t]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[hw_interrupt_type]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[timer_vec_root]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[timer_vec]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[tvec_root_s]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[softirq_state]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[desc_struct]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[kallsyms_header]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[mem_section]: returned via gdb_error_hook (1 buffer in use)
> GNU_GET_DATATYPE[note_buf_t]: returned via gdb_error_hook (1 buffer in use)
> $
> 
> From those it's evident that you've successfully made it through
> kernel_init(), and have called machdep_init(POST_GDB) from here in
> main.c:main_loop():
> 
>         } else if (!(pc->flags & MINIMAL_MODE)) {
>                 read_in_kernel_config(IKCFG_INIT);
>                 kernel_init();
> =====>          machdep_init(POST_GDB);
>                 vm_init();
>                 machdep_init(POST_VM);
>                 module_init();
>                 help_init();
>                 task_init();
>                 vfs_init();
>                 net_init();
>                 dev_init();
>                 machdep_init(POST_INIT);
>         }
> 
> which calls into arm.c:arm_init(POST_GDB). That function has successfully
> made it past the STRUCT_SIZE_INIT(note_buf, "note_buf_t") call:
> 
>                 /*
>                  * We need to have information about note_buf_t which is used to
>                  * hold ELF note containing registers and status of the thread
>                  * that panic'd.
>                  */
> =====>          STRUCT_SIZE_INIT(note_buf, "note_buf_t");
> 
>                 STRUCT_SIZE_INIT(elf_prstatus, "elf_prstatus");
>                 MEMBER_OFFSET_INIT(elf_prstatus_pr_pid, "elf_prstatus", "pr_pid");
>                 MEMBER_OFFSET_INIT(elf_prstatus_pr_reg, "elf_prstatus", "pr_reg");
>                 break;
> 
> But the next STRUCT_SIZE_INIT() for "elf_prstatus" apparently never got
> completed.
> 
> In any case, it ended up in open_tmpfile2():
> 
> $ tail teraterm.log
> GETBUF(128 -> 0)
> FREEBUF(0)
> GETBUF(128 -> 0)
> FREEBUF(0)
> GETBUF(128 -> 0)
> FREEBUF(0)
> 
> crash: cannot open secondary temporary file
> 
> 1|shell at android:/system #
> $
> 
> 
> Although it's not clear how it's ending up in open_tmpfile2(),
> it's certainly of interest that the tmpfile() call is failing:
> 
> void
> open_tmpfile2(void)
> {
>         if (pc->tmpfile2)
>                 error(FATAL, "recursive secondary temporary file usage\n");
> 
>         if ((pc->tmpfile2 = tmpfile()) == NULL)
>                 error(FATAL, "cannot open secondary temporary file\n");
> 
>         rewind(pc->tmpfile2);
> }
> 
> The man page for tmpfile() shows these reasons:
> 
> RETURN VALUE
>     The tmpfile() function returns a stream descriptor, or NULL if a unique
>     filename cannot be generated or the unique file cannot be opened. In
>     the latter case, errno is set to indicate the error.
> 
> ERRORS
>     EACCES Search permission denied for directory in file's path prefix.
> 
> EEXIST Unable to generate a unique filename.
> 
> EINTR The call was interrupted by a signal.
> 
> EMFILE Too many file descriptors in use by the process.
> 
> ENFILE Too many files open in the system.
> 
> ENOSPC There was no room in the directory to add the new filename.
> 
> EROFS Read-only filesystem.
> 
> A couple things you might try:
> 
> (1) Put a perror() after the tmpfile() call to determine which errno
> is being returned.
> (2) Set "pc->flags |= DROP_CORE;" prior to the tmpfile() call.
> 
> Like this:
> 
>  void
>  open_tmpfile2(void)
>  {
>         if (pc->tmpfile2)
>                 error(FATAL, "recursive secondary temporary file usage\n");
>  
> +       pc->flags |= DROP_CORE;
> -       if ((pc->tmpfile2 = tmpfile()) == NULL)
> +       if ((pc->tmpfile2 = tmpfile()) == NULL) {
> +               perror("tmpfile");
>                 error(FATAL, "cannot open secondary temporary file\n");
> +       }
> +       pc->flags &= ~DROP_CORE;
>  
>         rewind(pc->tmpfile2);
>  }
> 
> Then get a backtrace by running gdb on the resultant core file, or just
> run the whole session from gdb.
> 
> Dave