[Crash-utility] [PATCH] Display online cpus value in preference to kt->cpus

Luciano Chavez lnx1138 at linux.vnet.ibm.com
Mon Mar 8 17:28:14 UTC 2010


On Mon, 2010-03-08 at 09:49 -0500, Dave Anderson wrote:
> ----- "Dave Anderson" <anderson at redhat.com> wrote:
> 
> > ----- "Luciano Chavez" <lnx1138 at linux.vnet.ibm.com> wrote:
> > 
> > > Hi Dave,
> > > 
> > > Thinking about backward compatibility, would displaying "ONLINE CPUS"
> > > still seem OK for the case where kernel_init() finds the smp_num_cpus
> > > symbol (as for a 2.4 kernel)? Before there were the various cpu maps, I
> > > think smp_num_cpus was analogous to the possible cpus as opposed to
> > > online. I can see this requiring some thought as to what CPUS in the
> > > output means when you have various different maps now (online, possible,
> > > and present). That being said, it would be good to leave no doubt and
> > > explicitly state the count is for the present or online CPUS with the
> > > latter being my suggestion.
> > > 
> > > I forgot to mention that I suspect the problem I mentioned before would
> > > get stranger for POWER7 which offers 4 threads per core. I didn't have
> > > access to a POWER7 machine to see just what it would do if we tried
> > > disabling SMT as before, but I expect it would follow the same pattern:
> > > the count displayed would be way off from the online count.
> > 
> > I just ran through a bunch of stashed dumpfiles I have on hand, and
> > it gets even murkier when dealing with Xen or KVM kernels, because
> > as part of the post-crash shutdown (or forced dump), all but one of
> > the cpus may be taken "offline".  So even though there may be 4 vcpus,
> > and crash correctly shows 4 "CPUS", the cpu_online_map shows only one
> > cpu bit.  So if we went ahead and displayed a number based upon the
> > cpu_online_map, it would be completely misleading.  Incorrect
> > actually...
> 
> You can always dump the possible/present/online map information with
> the "help -k" debug option. 
> 
> So for example, taking a 2.6.9-era (RHEL4) xen kernel that crashed 
> on vcpu 3 due to a NULL reference, the hypervisor made a callback to
> the other vcpus to shut them down prior to the core dumping procedure:
> 
> crash> help -k
> ...
>        cpu_possible_map: (does not exist)
>         cpu_present_map: 0 1 2 3 
>          cpu_online_map: 3 
> ...
> 
> So the online map cannot be used for the cpu count, and for that
> matter, it wouldn't make any sense to even display the online map
> count.
> 
> In any case, for now I prefer not to change things, at least for the
> other architectures.
> 
> That being said, I defer machine-specific items for ppc64, s390
> and s390x to the IBM maintainers, and to HP for ia64. (The ppc
> and alpha architectures have no active "maintainers" any more,
> so those arches are pretty much withering on the vine.)  
> 
> So if you want to do something specifically for ppc64, please
> re-post a patch for just that architecture. 
> 
> Dave
> 

Dave,

Thanks for taking a good look at the many cases that would make a
general solution based on the online cpu count messy. I originally did
want to make this change applicable only to ppc64. The catch was that a
ppc64-only change could touch only ppc64_display_machine_stats(), and to
keep the displayed value consistent, display_sys_stats() and
dump_kernel_table() had to change as well.

So, re-thinking this as a ppc64-specific change in which CPUS is
displayed as the online count when possible, while everyone else keeps
doing what they do now, which is to display kt->cpus, I suggest the
following:

1. Add get_cpus_to_display as a machdep function
2. For ppc64, initialize machdep->get_cpus_to_display to
ppc64_get_cpus_to_display(), which will attempt to use get_cpus_online()
or fall back to using kt->cpus
3. For all other architectures, initialize
machdep->get_cpus_to_display to generic_get_cpus_to_display(), which
returns kt->cpus to maintain the status quo of the code as it is now
4. Replace kt->cpus in display_sys_stats() and dump_kernel_table() in
kernel.c with a call to machdep->get_cpus_to_display() when displaying
CPUS
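
To make the four steps concrete, here is a rough sketch in C. It is not
the actual patch: the two structs are pared-down, hypothetical stand-ins
for crash's kernel_table and machdep_table, the online_cpus field is
invented for illustration, and get_cpus_online() is stubbed to return it
(treating a value <= 0 as "not available" is also an assumption).

```c
#include <stddef.h>

/* Hypothetical, pared-down stand-in for crash's kernel_table. */
struct kernel_table {
    int cpus;         /* the count displayed today (kt->cpus) */
    int online_cpus;  /* illustration only: what the stub below reports */
};

/* Hypothetical, pared-down stand-in for crash's machdep_table. */
struct machdep_table {
    int (*get_cpus_to_display)(void);  /* step 1: new machdep hook */
};

static struct kernel_table kernel_table_storage = { 0, 0 };
static struct kernel_table *kt = &kernel_table_storage;
static struct machdep_table machdep_storage = { NULL };
static struct machdep_table *machdep = &machdep_storage;

/* Stub for illustration; assumed to return <= 0 when no online map exists. */
static int get_cpus_online(void)
{
    return kt->online_cpus;
}

/* Step 3: every other architecture keeps showing kt->cpus. */
int generic_get_cpus_to_display(void)
{
    return kt->cpus;
}

/* Step 2: ppc64 prefers the online count and falls back to kt->cpus. */
int ppc64_get_cpus_to_display(void)
{
    int online = get_cpus_online();

    return online > 0 ? online : kt->cpus;
}
```

With that in place, step 4 would amount to display_sys_stats() and
dump_kernel_table() printing machdep->get_cpus_to_display() instead of
kt->cpus, after each architecture's init code has assigned the hook.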

Let me know what you think. I think this solution leaves room for other
architectures to individually change what they display for the cpu
count, should they need to in the future.

regards,
-- 
Luciano Chavez <lnx1138 at linux.vnet.ibm.com>
IBM Linux Technology Center



