[Crash-utility] [PATCH] Display online cpus value in preference to kt->cpus

Luciano Chavez lnx1138 at linux.vnet.ibm.com
Fri Mar 5 20:45:15 UTC 2010


On Fri, 2010-03-05 at 15:01 -0500, Dave Anderson wrote:
> ----- "Luciano Chavez" <lnx1138 at linux.vnet.ibm.com> wrote:
> 
> > Howdy,
> > 
> > crash 5.0.0 introduced a change to ppc64_paca_init() in ppc64.c to 
> > manipulate kt->cpus to fix a 4.0-8.11 regression when cpu_possible_map
> > has more cpus than cpu_online_map. The change basically adjusts kt->cpus
> > to the highest cpu index + 1. In situations where cpus from 0 through
> > highest index are all online, this will equal online cpus.
> > On IBM POWER based system supporting SMT, we can have it dynamically
> > enabled and disabled and so we can go from:
> > 
> > brucelp3:~ # ppc64_cpu --smt=on
> > brucelp3:~ # cat /sys/devices/system/cpu/online
> > 0-9
> > 
> > to
> > 
> > brucelp3:~ # ppc64_cpu --smt=off
> > brucelp3:~ # cat /sys/devices/system/cpu/online
> > 0,2,4,6,8
> > 
> > In this situation, the new code will determine that kt->cpus is 9. crash
> > will display:
> > 
> > KERNEL: /boot/vmlinux
> >     DUMPFILE: /dev/mem
> >         CPUS: 9   ===============> Rather than 5
> >         DATE: Fri Feb 26 06:06:51 2010
> >       UPTIME: 04:22:34
> > LOAD AVERAGE: 0.49, 0.14, 0.05
> >        TASKS: 320
> >     NODENAME: brucelp3
> >      RELEASE: 2.6.32.8-0.3-ppc64
> >      VERSION: #1 SMP 2010-02-22 16:22:25 +0100
> >      MACHINE: ppc64  (unknown Mhz)
> >       MEMORY: 1 GB
> >          PID: 19948
> >      COMMAND: "crash"
> >         TASK: c00000002433be50  [THREAD_INFO: c0000000238a0000]
> >          CPU: 2
> >        STATE: TASK_RUNNING (ACTIVE)
> > 
> > kernel_init() initially does come up with 5 for kt->cpus initially before
> > the machdep init routine (ppc64_paca_init) ends up changing it to 9 in
> > the above situation.
> > 
> > Because of the way other parts of the code seem to iterate, allowing kt->cpus
> > to get set to the number of online cpus (5) would make them not work properly
> > either. Case in point, the ps command. It would iterate through the first 5
> > cpus for the swapper tasks and stop, providing no information for swapper tasks
> > on online cpus 6 and 8. 
> > 
> > Rather than displaying kt->cpu for cpu count to display to users, can we call
> > get_cpus_online() itself. This will solve the problem of keeping kt->cpu
> > separate from get_cpus_online(). So commands like ps, set etc can still rely on
> > kt->cpu as set for each architecture. Additionally, we need to consider
> > providing the value as it exists after machine dependent init in case
> > the symbols required by get_cpus_online() are not available.
> > 
> > The patch provided creates a new routine called get_cpus_to_display() in 
> > kernel.c that simply attempts to retrieve the count of online CPUs with
> > get_cpus_online(). However, since the cpu_online_map symbol may not be
> > available with all kernels, the cpus_to_display() routine will
> > return the value for kt->cpus if get_cpus_online() return 0 for that
> > case to maintain backwards compatibility.
> > 
> > With this in mind, all locations (mainly the display_machine_stats() routines
> > for each architecture, display_sys_stats() and dump_kernel_table()) in the
> > source that would print the cpu count using kt->cpus now call get_cpus_to_display()
> > to obtain that value. 
> > 
> > This should hopefully provide the user with an expected CPU count regardless of
> > the internal manipulation that is sometimes done to the kt->cpus value.
> > 
> > regards,
> > -- 
> > Luciano Chavez <lnx1138 at linux.vnet.ibm.com>
> > IBM Linux Technology Center
> 
> Right -- I thought of tinkering with the initial system banner and
> the "sys" output to show the total cpus (highest+1) plus the online
> count, i.e., something like:
> 
> crash> sys
>       KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
>     DUMPFILE: /dev/crash
>         CPUS: 4 (3 online)
>         DATE: Fri Mar  5 14:48:23 2010
>       UPTIME: 32 days, 06:20:13
> LOAD AVERAGE: 0.10, 0.18, 0.17
>        TASKS: 269
>     NODENAME: crash.usersys.redhat.com
>      RELEASE: 2.6.18-128.el5
>      VERSION: #1 SMP Wed Dec 17 11:41:38 EST 2008
>      MACHINE: x86_64  (1995 Mhz)
>       MEMORY: 1 GB
> crash>
> 
> But I decided against even doing that because, say, in the example
> above, there could be 8 cpus, with cpus 0,1,2 and 3 online.  In
> that case, the output would be "CPUS: 4 (4 online)" which would also
> be somewhat misleading.  In that case, the cpu_present_mask would 
> need to be consulted, but again, that's a fairly recent kernel addition.
> 
> So I left it purposely as it is now.  Yes the count shown may
> be more than what's online, but I kind of like the idea of having
> the "CPUS" count reflect what will be seen when running commands
> that cycle through all cpus.  Showing just the online count is 
> kind of misleading in that case.
> 
> I don't much care about the "mach" output though, and as a compromise,
> I have no problem changing those to show "ONLINE CPUS:" instead of
> just "CPUS".
> 
> Comments?
> 
> Dave
> 
> 

Hi Dave,

Thinking about backward compatibility, would displaying "ONLINE CPUS"
still seem OK for the case where kernel_init() finds the smp_num_cpus
symbol (as for a 2.4 kernel)? Before there were the various cpu maps, I
think smp_num_cpus was analogous to the possible cpus as opposed to
online. I can see this requiring some thought as to what CPUS in the
output means when you have various different maps now (online, possible,
and present). That being said, it would be good to leave no doubt and
explicitly state the count is for the present or online CPUS with the
latter being my suggestion.

I forgot to mention that I suspect the problem I mentioned before would
get stranger for POWER7 which offers 4 threads per core. I didn't have
access to a POWER7 machine to see just what it would do if we tried
disabling SMT as before but it follows the same pattern the count
displayed would be way off from the online count.

> > 
> > diff -up crash-5.0.1/alpha.c.orig crash-5.0.1/alpha.c
> > --- crash-5.0.1/alpha.c.orig	2010-03-04 07:19:23.000000000 -0600
> > +++ crash-5.0.1/alpha.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -2641,7 +2641,7 @@ alpha_display_machine_stats(void)
> >  
> >          fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >          fprintf(fp, "        MEMORY SIZE: %s\n",
> > get_memory_size(buf));
> > -        fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +        fprintf(fp, "               CPUS: %d\n",
> > get_cpus_to_display());
> >          fprintf(fp, "    PROCESSOR SPEED: ");
> >          if ((mhz = machdep->processor_speed()))
> >                  fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/defs.h.orig crash-5.0.1/defs.h
> > --- crash-5.0.1/defs.h.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/defs.h	2010-03-04 07:20:43.000000000 -0600
> > @@ -3717,6 +3717,7 @@ int get_cpus_online(void);
> >  int get_cpus_present(void);
> >  int get_cpus_possible(void);
> >  int get_highest_cpu_online(void);
> > +int get_cpus_to_display(void);
> >  int in_cpu_map(int, int);
> >  void paravirt_init(void);
> >  void print_stack_text_syms(struct bt_info *, ulong, ulong);
> > diff -up crash-5.0.1/ia64.c.orig crash-5.0.1/ia64.c
> > --- crash-5.0.1/ia64.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/ia64.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -2325,7 +2325,7 @@ ia64_display_machine_stats(void)
> >  
> >          fprintf(fp, "              MACHINE TYPE: %s\n",
> > uts->machine);
> >          fprintf(fp, "               MEMORY SIZE: %s\n",
> > get_memory_size(buf));
> > -        fprintf(fp, "                      CPUS: %d\n", kt->cpus);
> > +        fprintf(fp, "                      CPUS: %d\n",
> > get_cpus_to_display());
> >          fprintf(fp, "           PROCESSOR SPEED: ");
> >          if ((mhz = machdep->processor_speed()))
> >                  fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/kernel.c.orig crash-5.0.1/kernel.c
> > --- crash-5.0.1/kernel.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/kernel.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -3871,7 +3871,7 @@ display_sys_stats(void)
> >  	}
> >  	
> >  
> > -        fprintf(fp, "        CPUS: %d\n", kt->cpus);
> > +        fprintf(fp, "        CPUS: %d\n", get_cpus_to_display());
> >  	if (ACTIVE())
> >          	get_symbol_data("xtime", sizeof(struct timespec),
> > &kt->date);
> >          fprintf(fp, "        DATE: %s\n", 
> > @@ -4289,7 +4289,7 @@ dump_kernel_table(int verbose)
> >          fprintf(fp, "    init_begin: %lx\n", kt->init_begin);
> >          fprintf(fp, "      init_end: %lx\n", kt->init_end);
> >          fprintf(fp, "           end: %lx\n", kt->end);
> > -        fprintf(fp, "          cpus: %d\n", kt->cpus);
> > +        fprintf(fp, "          cpus: %d\n", get_cpus_to_display());
> >          fprintf(fp, " cpus_override: %s\n", kt->cpus_override);
> >          fprintf(fp, "       NR_CPUS: %d (compiled-in to this version
> > of %s)\n",
> >  		NR_CPUS, pc->program_name); 
> > @@ -6257,6 +6257,16 @@ get_cpus_possible()
> >  }
> >  
> >  /*
> > + *  For when displaying cpus, return the number of cpus online if
> > possible, otherwise kt->cpus.
> > + */
> > +int
> > +get_cpus_to_display(void)
> > +{
> > +	int online = get_cpus_online();
> > +	return (online ? online : kt->cpus);
> > +}
> > +
> > +/*
> >   *  Xen machine-address to pseudo-physical-page translator.
> >   */ 
> >  ulonglong
> > diff -up crash-5.0.1/ppc64.c.orig crash-5.0.1/ppc64.c
> > --- crash-5.0.1/ppc64.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/ppc64.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -2215,7 +2215,7 @@ ppc64_display_machine_stats(void)
> >  
> >          fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >          fprintf(fp, "        MEMORY SIZE: %s\n",
> > get_memory_size(buf));
> > -        fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +        fprintf(fp, "               CPUS: %d\n",
> > get_cpus_to_display());
> >          fprintf(fp, "    PROCESSOR SPEED: ");
> >          if ((mhz = machdep->processor_speed()))
> >                  fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/ppc.c.orig crash-5.0.1/ppc.c
> > --- crash-5.0.1/ppc.c.orig	2010-03-04 07:20:14.000000000 -0600
> > +++ crash-5.0.1/ppc.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -1355,7 +1355,7 @@ ppc_display_machine_stats(void)
> >  
> >          fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >          fprintf(fp, "        MEMORY SIZE: %s\n",
> > get_memory_size(buf));
> > -        fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +        fprintf(fp, "               CPUS: %d\n",
> > get_cpus_to_display());
> >          fprintf(fp, "    PROCESSOR SPEED: ");
> >          if ((mhz = machdep->processor_speed()))
> >                  fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/s390.c.orig crash-5.0.1/s390.c
> > --- crash-5.0.1/s390.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/s390.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -1032,7 +1032,7 @@ s390_display_machine_stats(void)
> >  
> >  	fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >  	fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
> > -	fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +	fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
> >  	fprintf(fp, "    PROCESSOR SPEED: ");
> >  	if ((mhz = machdep->processor_speed()))
> >  		fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/s390x.c.orig crash-5.0.1/s390x.c
> > --- crash-5.0.1/s390x.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/s390x.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -1284,7 +1284,7 @@ s390x_display_machine_stats(void)
> >  
> >  	fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >  	fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
> > -	fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +	fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
> >  	fprintf(fp, "    PROCESSOR SPEED: ");
> >  	if ((mhz = machdep->processor_speed()))
> >  		fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/x86_64.c.orig crash-5.0.1/x86_64.c
> > --- crash-5.0.1/x86_64.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/x86_64.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -4412,7 +4412,7 @@ x86_64_display_machine_stats(void)
> >  
> >          fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >          fprintf(fp, "        MEMORY SIZE: %s\n",
> > get_memory_size(buf));
> > -        fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +        fprintf(fp, "               CPUS: %d\n",
> > get_cpus_to_display());
> >          fprintf(fp, "    PROCESSOR SPEED: ");
> >          if ((mhz = machdep->processor_speed()))
> >                  fprintf(fp, "%ld Mhz\n", mhz);
> > diff -up crash-5.0.1/x86.c.orig crash-5.0.1/x86.c
> > --- crash-5.0.1/x86.c.orig	2010-02-17 15:21:24.000000000 -0600
> > +++ crash-5.0.1/x86.c	2010-03-04 07:20:43.000000000 -0600
> > @@ -3950,7 +3950,7 @@ x86_display_machine_stats(void)
> >  
> >          fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
> >          fprintf(fp, "        MEMORY SIZE: %s\n",
> > get_memory_size(buf));
> > -	fprintf(fp, "               CPUS: %d\n", kt->cpus);
> > +	fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
> >  	fprintf(fp, "    PROCESSOR SPEED: ");
> >  	if ((mhz = machdep->processor_speed())) 
> >  		fprintf(fp, "%ld Mhz\n", mhz);
> > 
> > 
> > --
> > Crash-utility mailing list
> > Crash-utility at redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
-- 
Luciano Chavez <lnx1138 at linux.vnet.ibm.com>
IBM Linux Technology Center




More information about the Crash-utility mailing list