[Crash-utility] Re: crash: cannot gather a stable task list via pid_hash (500 retries)

Tue Mar 18 16:01:20 UTC 2008

Eugene,

Another debugging aid you can try already exists in the task-gathering
function refresh_hlist_task_table_v3(), which does this before
making the "duplicate task" check:

     if (CRASHDEBUG(1)) {
             if (chained)
                     console("                %lx upid: %lx nr: %d pid: %lx\n"
                             "                pnext/pprev: %.*lx/%lx task: %lx\n",
                             kpp, upid, upid_nr, pid, VADDR_PRLEN, pnext, pprev, next);
             else
                     console("pid_hash[%4d]: %lx upid: %lx nr: %d pid: %lx\n"
                             "                pnext/pprev: %.*lx/%lx task: %lx\n",
                             i, kpp, upid, upid_nr, pid, VADDR_PRLEN, pnext, pprev, next);
     }

The console() debug output, however, is a no-op unless you
first set crash's "console" variable to a tty name.  Open
another window on the system you're running on, get its tty
filename (the "tty" command prints it), and put it in a .crashrc
file located in either the current directory or your home directory:

set console /dev/pts/<whatever>

You can set it to the same window in which you're running the
crash session if you want, but the main reason the console()
function exists is to print often very verbose debug output
without trashing crash command output, which is why it lets
you redirect that output to another window.
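
To illustrate the "no-op unless a console is set" behavior, here is a
minimal sketch of the idea.  It is not crash's actual console() code:
debug_console_fp, debug_console_open() and debug_console() are hypothetical
names made up for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the debug console stream; NULL means "not set". */
static FILE *debug_console_fp = NULL;

/* Point debug output at a tty (or any writable path).  Returns 0 on success. */
int debug_console_open(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    setvbuf(fp, NULL, _IONBF, 0);     /* unbuffered: output shows up at once */
    if (debug_console_fp != NULL)
        fclose(debug_console_fp);     /* drop any previously selected console */
    debug_console_fp = fp;
    return 0;
}

/*
 * printf-style debug output.  Returns the number of characters written,
 * or 0 when no console has been selected, so callers never touch stdout.
 */
int debug_console(const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(fmt != NULL);
    if (debug_console_fp == NULL)
        return 0;                     /* no console selected: do nothing */

    va_start(ap, fmt);
    n = vfprintf(debug_console_fp, fmt, ap);
    va_end(ap);
    return n;
}
```

With this shape, the debug calls can stay in the code permanently, and
they cost almost nothing until a console is selected.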

Anyway, having done that, invoke "crash -d1", and you
should see output like this on the selected console
window, showing the task(s) found by walking each
in-use pid_hash[x] hlist_head:

...
pid_hash[  11]: ffff81027c8f9048 upid: ffff81027c8f9038 nr: 666 pid: ffff81027c8f9000
                 pnext/pprev: 0000000000000000/ffff81000105b5d8 task: ffff81027c934000
pid_hash[  19]: ffff81027f8012c8 upid: ffff81027f8012b8 nr: 1 pid: ffff81027f801280
                 pnext/pprev: 0000000000000000/ffff81000105b618 task: ffff81027ec5e000
pid_hash[  27]: ffff81027cc2c748 upid: ffff81027cc2c738 nr: 378 pid: ffff81027cc2c700
                 pnext/pprev: 0000000000000000/ffff81000105b658 task: ffff81027cc9a000
pid_hash[  74]: ffff81027ec589c8 upid: ffff81027ec589b8 nr: 35 pid: ffff81027ec58980
                 pnext/pprev: 0000000000000000/ffff81000105b7d0 task: ffff81027ee01160
pid_hash[ 119]: ffff81027cc35948 upid: ffff81027cc35938 nr: 2297 pid: ffff81027cc35900
                 pnext/pprev: 0000000000000000/ffff81000105b938 task: ffff81027d825160
...

Typically there's only one task on any pid_hash chain,
but if there's more than one, it will look like this
two-task example:

pid_hash[2920]: ffff81027d1b7c48 upid: ffff81027d1b7c38 nr: 528 pid: ffff81027d1b7c00
                 pnext/pprev: ffff81027f801748/ffff8100010610c0 task: ffff81027eef48b0
                 ffff81027f801748 upid: ffff81027f801738 nr: 7 pid: ffff81027f801700
                 pnext/pprev: 0000000000000000/ffff81027d1b7c48 task: ffff81027ec72000
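
To make the pnext/pprev columns easier to read, here is a small sketch
of an hlist-style chain walk.  The struct layout is a simplified model
written for this example (struct node, struct bucket, bucket_add_head()
and walk_chain() are hypothetical names), not the kernel's or crash's
definitions.  Because "next" is the first member, a chained entry's
pprev has the same value as the address of the entry before it, which
is the pattern visible in the two-task output above.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified model of an hlist node plus the task it resolves to. */
struct node {
    struct node *next;       /* shown as "pnext"; NULL ends the chain */
    struct node **pprev;     /* address of the pointer that points at us */
    unsigned long task;      /* hypothetical: task address for this entry */
};

struct bucket {
    struct node *first;      /* head of one hash chain */
};

/* Insert a node at the head of a chain, keeping pprev links consistent. */
void bucket_add_head(struct bucket *b, struct node *n)
{
    assert(b != NULL && n != NULL);
    n->next = b->first;
    if (b->first != NULL)
        b->first->pprev = &n->next;
    b->first = n;
    n->pprev = &b->first;
}

/* Walk one chain, print each entry, and return the chain length. */
int walk_chain(const struct bucket *b, int index, FILE *out)
{
    const struct node *n;
    int count = 0;

    assert(b != NULL && out != NULL);
    for (n = b->first; n != NULL; n = n->next) {
        if (count == 0)
            fprintf(out, "pid_hash[%4d]: %p\n", index, (const void *)n);
        else
            fprintf(out, "                %p\n", (const void *)n);
        fprintf(out, "                pnext/pprev: %p/%p task: %lx\n",
                (void *)n->next, (void *)n->pprev, n->task);
        count++;
    }
    return count;
}
```

A pnext of zero marks the last entry on the chain, so a single-task
bucket prints exactly one pair of lines.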

Presumably in your case (if you can reproduce it), there would have
been a pid_hash chain that contains ffff81012f0811d0 twice.  Your debug
output is going to be extremely verbose, because you will see the
pid_hash output repeating itself 500 times -- but it will stop at the
pid_hash[index] where it found the duplicate entry.
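
For reference, the duplicate check plus retry idea can be sketched
roughly as follows.  This is only an illustration under simplified
assumptions, not the code in refresh_hlist_task_table_v3(): struct entry,
MAX_TASKS, find_duplicate_bucket() and gather_stable() are hypothetical
names, and a real implementation would re-read the hash table from
memory on every attempt.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 64         /* arbitrary capacity for this sketch */

struct entry {
    struct entry *next;      /* next entry on the same hash chain */
    unsigned long task;      /* task address found for this entry */
};

/*
 * Walk every chain and record each task address.  Returns the index of
 * the bucket where an already-seen task shows up again, or -1 if every
 * task is unique.  *nseen receives the number of unique tasks recorded.
 */
int find_duplicate_bucket(struct entry *const buckets[], int nbuckets,
                          unsigned long seen[], int *nseen)
{
    const struct entry *e;
    int i, j;

    *nseen = 0;
    for (i = 0; i < nbuckets; i++) {
        for (e = buckets[i]; e != NULL; e = e->next) {
            for (j = 0; j < *nseen; j++)
                if (seen[j] == e->task)
                    return i;            /* stop at the offending bucket */
            assert(*nseen < MAX_TASKS);
            seen[(*nseen)++] = e->task;
        }
    }
    return -1;
}

/*
 * Retry the walk up to max_retries times.  Returns the task count on
 * success, or -1 if a duplicate was seen on every attempt; *dup_bucket
 * holds the result of the last walk.
 */
int gather_stable(struct entry *const buckets[], int nbuckets,
                  int max_retries, int *dup_bucket)
{
    unsigned long seen[MAX_TASKS];
    int nseen, attempt;

    *dup_bucket = -1;
    for (attempt = 0; attempt < max_retries; attempt++) {
        /* A real tool would refresh its copy of the table here. */
        *dup_bucket = find_duplicate_bucket(buckets, nbuckets, seen, &nseen);
        if (*dup_bucket < 0)
            return nseen;
    }
    return -1;
}
```

On a live system a retry can succeed because the table changes between
walks; if the same duplicate is present on every walk, all the retries
fail at the same bucket.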

I'm curious why you are seeing this.  This pid_hash/retry scheme has
been in place forever, and I've never seen a legitimate/persistent
duplicate task error.

Thanks,
   Dave