[Crash-utility] crash: cannot gather a stable task list via pid_hash (500 retries)

Dave Anderson anderson at redhat.com
Mon Mar 17 16:04:08 UTC 2008


Eugene Teo wrote:
> Hi Dave,
> 
> I tried to run crash on Fedora 8's kernel 2.6.24.3-12.fc8 x86_64, and
> it has errors that look like the following:
> 
> [...]
> crash: duplicate task in pid_hash: ffff81012f0811d0
> crash: duplicate task in pid_hash: ffff81012f0811d0
> crash: duplicate task in pid_hash: ffff81012f0811d0
> crash: duplicate task in pid_hash: ffff81012f0811d0
> crash: duplicate task in pid_hash: ffff81012f0811d0
> 
> crash: cannot gather a stable task list via pid_hash (500 retries)
> 
> I ran crash with -d7, and uploaded the log for debugging:
> http://hera.kernel.org/~eugeneteo/crash.log
> 
> Thanks,
> Eugene

Hi Eugene,

I can't reproduce this one. Here is crash on a freshly-installed x86_64
running 2.6.24.3-12.fc8:

# crash

crash 4.0-6.1
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.24.3-12.fc8/vmlinux
    DUMPFILE: /dev/crash
        CPUS: 8
        DATE: Mon Mar 17 11:40:59 2008
      UPTIME: 00:17:01
LOAD AVERAGE: 0.50, 0.18, 0.06
       TASKS: 169
    NODENAME: hp-dl585g2-01.rhts.boston.redhat.com
     RELEASE: 2.6.24.3-12.fc8
     VERSION: #1 SMP Tue Feb 26 14:21:30 EST 2008
     MACHINE: x86_64  (2812 Mhz)
      MEMORY: 8 GB
         PID: 2938
     COMMAND: "crash"
        TASK: ffff8102440f48b0  [THREAD_INFO: ffff81027d912000]
         CPU: 1
       STATE: TASK_RUNNING (ACTIVE)

crash>

From the debug output, there's no useful info on task ffff81012f0811d0
other than the fact that it's being seen as a duplicate in a pid_hash
chain, and that this doesn't appear to be a temporary "shifting-sands"
condition, because 500 retries did not clear it.

But you can probably tinker with the task refresh function in the
crash utility to skip it, and then investigate why it's there twice.
The crash function should be refresh_hlist_task_table_v3():

crash> help -t | grep refresh
refresh_task_table: refresh_hlist_task_table_v3()
crash>

This piece here in refresh_hlist_task_table_v3() is what gets
retried a maximum of 500 times:

                 if (!is_idle_thread(next) && !hq_enter(next)) {
                         error(INFO, "%sduplicate task in pid_hash: %lx\n",
                                 DUMPFILE() ? "\n" : "", next);
                         if (DUMPFILE())
                                 break;
                         hq_close();
                         retries++;
                         goto retry_pid_hash;
                 }

Try modifying it to either "continue", or perhaps just "break",
instead of retrying. If that's the only irregularity found, you
should be able to get a "crash>" prompt. You can then look at task
ffff81012f0811d0, because the first instance found in the pid_hash
chain should be listed by "ps" as a task.
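
As a rough illustration of the "continue" idea (a standalone sketch, not
the actual crash source): seen_set, seen_enter() and
gather_skip_duplicates() below are hypothetical stand-ins for the hash
queue and the chain walk, and every task address other than
ffff81012f0811d0 is made up. It only shows a duplicate being reported
and skipped rather than triggering a retry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TASKS 16

/* Hypothetical stand-in for crash's hash queue: a tiny "seen" set. */
struct seen_set {
    unsigned long long tasks[MAX_TASKS];
    size_t count;
};

/*
 * Returns 1 if the task was newly entered, 0 if it was already present.
 * This loosely mirrors how the quoted code treats !hq_enter(next) as
 * "duplicate"; it is an assumption, not the real hq_enter().
 */
int seen_enter(struct seen_set *s, unsigned long long task)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->tasks[i] == task)
            return 0;
    assert(s->count < MAX_TASKS);   /* sketch only: fixed capacity */
    s->tasks[s->count++] = task;
    return 1;
}

/*
 * Walk one chain of task addresses. A duplicate is reported and skipped
 * ("continue") instead of restarting the whole gather. Returns the
 * number of unique tasks kept; *dups receives the number skipped.
 */
size_t gather_skip_duplicates(const unsigned long long *chain, size_t len,
                              struct seen_set *s, size_t *dups)
{
    *dups = 0;
    for (size_t i = 0; i < len; i++) {
        if (!seen_enter(s, chain[i])) {
            fprintf(stderr, "duplicate task in pid_hash: %llx (skipped)\n",
                    chain[i]);
            (*dups)++;
            continue;               /* keep walking; no retry */
        }
    }
    return s->count;
}
```

With a chain that contains ffff81012f0811d0 twice, the first instance
is kept and the second is only logged, so the walk finishes in a
single pass.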

Dave









