[Crash-utility] [PATCH] Display online cpus value in preference to kt->cpus

Dave Anderson anderson at redhat.com
Mon Mar 8 14:49:29 UTC 2010


----- "Dave Anderson" <anderson at redhat.com> wrote:

> ----- "Luciano Chavez" <lnx1138 at linux.vnet.ibm.com> wrote:
> 
> > Hi Dave,
> > 
> > Thinking about backward compatibility, would displaying "ONLINE CPUS"
> > still seem OK for the case where kernel_init() finds the smp_num_cpus
> > symbol (as for a 2.4 kernel)? Before there were the various cpu maps, I
> > think smp_num_cpus was analogous to the possible cpus as opposed to
> > online. I can see this requiring some thought as to what CPUS in the
> > output means when you have various different maps now (online, possible,
> > and present). That being said, it would be good to leave no doubt and
> > explicitly state the count is for the present or online CPUS with the
> > latter being my suggestion.
> > 
> > I forgot to mention that I suspect the problem I mentioned before would
> > get stranger for POWER7 which offers 4 threads per core. I didn't have
> > access to a POWER7 machine to see just what it would do if we tried
> > disabling SMT as before but it follows the same pattern the count
> > displayed would be way off from the online count.
> 
> I just ran through a bunch of stashed dumpfiles I have on hand, and
> it gets even murkier when dealing with Xen or KVM kernels, because
> as part of the post-crash shutdown (or forced dump), all but one of
> the cpus may be taken "offline".  So even though there may be 4 vcpus,
> and crash correctly shows 4 "CPUS", the cpu_online_map shows only one
> cpu bit.  So if we went ahead and displayed a number based upon the
> cpu_online_map, it would be completely misleading.  Incorrect,
> actually...

You can always dump the possible/present/online map information with
the "help -k" debug option. 
 
So for example, taking a 2.6.9-era (RHEL4) xen kernel that crashed
on vcpu 3 due to a NULL pointer dereference, the hypervisor made a
callback to the other vcpus to shut them down prior to the core
dumping procedure:

crash> help -k
...
       cpu_possible_map: (does not exist)
        cpu_present_map: 0 1 2 3 
         cpu_online_map: 3 
...

So the online map cannot be used for the cpu count, and for that
matter, it wouldn't make any sense to even display the online map
count.
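
To make the arithmetic concrete, here is a minimal sketch in C. It is
not crash code: it assumes each map fits in a single unsigned long, and
count_cpus(), PRESENT_MASK and ONLINE_MASK are hypothetical names, with
the mask values transcribed by hand from the "help -k" output above.

```c
/* Values transcribed from the "help -k" output above; bit N set
 * means cpu N is in the map.  The macro names are hypothetical. */
#define PRESENT_MASK 0xfUL   /* cpus 0 1 2 3 */
#define ONLINE_MASK  0x8UL   /* cpu 3 only   */

/* Hypothetical helper (not a crash function): count the set bits
 * in a cpu map that fits in a single unsigned long. */
static int count_cpus(unsigned long mask)
{
        int n = 0;

        while (mask) {
                mask &= mask - 1;   /* clear the lowest set bit */
                n++;
        }
        return n;
}
```

With these values, count_cpus(PRESENT_MASK) returns 4 and
count_cpus(ONLINE_MASK) returns 1, so a count taken from the online
map would report a single cpu for what is really a 4-vcpu guest.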

In any case, for now I prefer not to change things, at least for the
other architectures.

That being said, I defer machine-specific items for ppc64, s390
and s390x to the IBM maintainers, and to HP for ia64. (The ppc
and alpha architectures have no active "maintainers" any more,
so those arches are pretty much withering on the vine.)  

So if you want to do something specifically for ppc64, please
re-post a patch for just that architecture. 

Dave



