[Crash-utility] [RFC][PATCH]: crash aborts with cannot determine idle task

Chandru chandru at in.ibm.com
Wed Jun 10 08:26:32 UTC 2009


Dave Anderson wrote:
> Sorry -- that's not what I meant...  
>
> What I want to avoid is screwing around with the prstatus notes bookkeeping
> unless it is absolutely necessary, i.e., where there had been some cpus offlined
> prior to the crash.  The original thread back in April 2008 mentioned something
> to the effect that your test system only had cpus 12 and 13 online at the time of
> the crash.  When that is the case, is kt->cpus equal to 14?  I.e., what
> does the "sys" command show for "CPUS:"?
>
>   
I had the vmcore file from that test system and ran crash with -d1.
The cpu maps shown are ...

cpu_possible_map: 0 1 2 3 4 5 6 7 8 9 10 11 12 13
cpu_present_map: 8 9 10 11 12 13                 
cpu_online_map: 12 13

The 'sys' command shows CPUS as '14' (with the patch applied):

<snip>

crash> sys
      KERNEL: ./vmlinux
    DUMPFILE: ./vmcore
        CPUS: 14
        DATE: Tue Mar 25 14:43:39 2008
      UPTIME: 08:01:57
LOAD AVERAGE: 12.73, 5.40, 3.72
       TASKS: 262


I also had another vmcore, collected on a different system after offlining
a couple of cpus through the sysfs interface. The cpu maps on this machine
with 'crash -d1' show...

cpu_possible_map: 0 1 2 3                                             
cpu_present_map: 0 1 2 3                                              
cpu_online_map: 2 3

and 'sys' shows:

<snip>
crash> sys
      KERNEL: ./vmlinux
    DUMPFILE: ./vmcore
        CPUS: 4
        DATE: Sat Jun  6 15:00:24 2009
      UPTIME: 15:56:30


> I ask because this is the way I'd prefer to go:
>
> void
> map_cpus_to_prstatus(void)
> {
>         void **nt_ptr;
>         int online, i, j, nrcpus;
>         size_t size;
>
>         if (!(online = get_cpus_online()) || (online == kt->cpus))
>                 return;
>
>         if (CRASHDEBUG(1))
>                 error(INFO,
>                     "cpus: %d online: %d NT_PRSTATUS notes: %d (remapping)\n",
>                         kt->cpus, online, nd->num_prstatus_notes);
>
>         size = NR_CPUS * sizeof(void *);
>
>         nt_ptr = (void **)GETBUF(size);
>         BCOPY(nd->nt_prstatus_percpu, nt_ptr, size);
>         BZERO(nd->nt_prstatus_percpu, size);
>
>         /*
>          *  Re-populate the array with the notes mapping to online cpus
>          */
>         nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
>
>         for (i = 0, j = 0; i < nrcpus; i++) {
>                 if (in_cpu_map(ONLINE, i))
>                         nd->nt_prstatus_percpu[i] = nt_ptr[j++];
>         }
>
>         FREEBUF(nt_ptr);
> }
>
> And since kt->cpus may not be finally initialized until later than
> kernel_init(), I moved the call to map_cpus_to_prstatus() to here
> in task_init():
>
>         if (ACTIVE()) {
>                 active_pid = REMOTE() ? pc->server_pid : pc->program_pid;
>                 set_context(NO_TASK, active_pid);
>                 tt->this_task = pid_to_task(active_pid);
>         }
>         else {
>                 if (KDUMP_DUMPFILE())
>                         map_cpus_to_prstatus();
>                 please_wait("determining panic task");
>                 set_context(get_panic_context(), NO_PID);
>                 please_wait_done();
>         }
>
> Can you test the map_cpus_to_prstatus() function above, along with the
> movement of the call to it from kernel_init() to task_init()?
>
>
>   
Yes, I tested these changes and they work fine.
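
For anyone following the thread, here is a small standalone sketch of the
remapping idea as I understand it. It is not the crash code: MAX_CPUS,
cpu_is_online(), needs_remap() and remap_notes() are made-up stand-ins for
NR_CPUS, in_cpu_map(ONLINE, i), the early-return check and
map_cpus_to_prstatus(), and it assumes, as the quoted function does, that
the notes were stored consecutively starting at slot 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CPUS 16	/* hypothetical stand-in for NR_CPUS */

/* Hypothetical stand-in for in_cpu_map(ONLINE, cpu): test one mask bit. */
static int cpu_is_online(unsigned long online_mask, int cpu)
{
	return (int)((online_mask >> cpu) & 1UL);
}

/*
 * Mirrors the early-return test in the quoted function: remapping is
 * only needed when some, but not all, of the counted cpus are online.
 */
static int needs_remap(int online, int cpus)
{
	return online != 0 && online != cpus;
}

/*
 * Move the notes stored in slots 0..N-1 to the slots of the N online
 * cpus, in ascending cpu order.  Slots of offline cpus end up NULL.
 */
static void remap_notes(const void **notes, unsigned long online_mask,
			int ncpus)
{
	const void *saved[MAX_CPUS];
	int i, j;

	assert(ncpus >= 0 && ncpus <= MAX_CPUS);

	/* Save the original layout, then clear every slot. */
	memcpy(saved, notes, sizeof(saved));
	for (i = 0; i < MAX_CPUS; i++)
		notes[i] = NULL;

	/* Hand out the saved notes to the online cpus, one by one. */
	for (i = 0, j = 0; i < ncpus; i++) {
		if (cpu_is_online(online_mask, i))
			notes[i] = saved[j++];
	}
}
```

With the second vmcore above (cpus 2 and 3 online out of 4), the two notes
that start out in slots 0 and 1 end up in slots 2 and 3, and slots 0 and 1
are left empty.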

Thanks,
Chandru



