[Crash-utility] [PATCH] Display online cpus value in preference to kt->cpus

Luciano Chavez lnx1138 at linux.vnet.ibm.com
Fri Mar 5 19:31:21 UTC 2010


Howdy,

crash 5.0.0 introduced a change to ppc64_paca_init() in ppc64.c to 
manipulate kt->cpus to fix a 4.0-8.11 regression when 
cpu_possible_map has more cpus than cpu_online_map. The change basically
adjusts kt->cpus to the highest cpu index + 1. In situations where cpus
from 0 through highest index are all online, this will equal online cpus.
On IBM POWER based system supporting SMT, we can have it dynamically
enabled and disabled and so we can go from:

brucelp3:~ # ppc64_cpu --smt=on
brucelp3:~ # cat /sys/devices/system/cpu/online
0-9

to

brucelp3:~ # ppc64_cpu --smt=off
brucelp3:~ # cat /sys/devices/system/cpu/online
0,2,4,6,8

In this situation, the new code will determine that kt->cpus is 9. crash
will display:

KERNEL: /boot/vmlinux
    DUMPFILE: /dev/mem
        CPUS: 9   ===============> Rather than 5
        DATE: Fri Feb 26 06:06:51 2010
      UPTIME: 04:22:34
LOAD AVERAGE: 0.49, 0.14, 0.05
       TASKS: 320
    NODENAME: brucelp3
     RELEASE: 2.6.32.8-0.3-ppc64
     VERSION: #1 SMP 2010-02-22 16:22:25 +0100
     MACHINE: ppc64  (unknown Mhz)
      MEMORY: 1 GB
         PID: 19948
     COMMAND: "crash"
        TASK: c00000002433be50  [THREAD_INFO: c0000000238a0000]
         CPU: 2
       STATE: TASK_RUNNING (ACTIVE)

kernel_init() initially does come up with 5 for kt->cpus initially before
the machdep init routine (ppc64_paca_init) ends up changing it to 9 in
the above situation.

Because of the way other parts of the code seem to iterate, allowing kt->cpus
to get set to the number of online cpus (5) would make them not work properly
either. Case in point, the ps command. It would iterate through the first 5
cpus for the swapper tasks and stop, providing no information for swapper tasks
on online cpus 6 and 8. 

Rather than displaying kt->cpu for cpu count to display to users, can we call
get_cpus_online() itself. This will solve the problem of keeping kt->cpu
separate from get_cpus_online(). So commands like ps, set etc can still rely on
kt->cpu as set for each architecture. Additionally, we need to consider
providing the value as it exists after machine dependent init in case 
the symbols required by get_cpus_online() are not available.

The patch provided creates a new routine called get_cpus_to_display() in 
kernel.c that simply attempts to retrieve the count of
online CPUs with get_cpus_online(). However, since the cpu_online_map symbol
may not be available with all kernels, the cpus_to_display() routine will
return the value for kt->cpus if get_cpus_online() return 0 for that case
to maintain backwards compatibility.

With this in mind, all locations (mainly the display_machine_stats() routines
for each architecture, display_sys_stats() and dump_kernel_table()) in the
source that would print the cpu count using kt->cpus now call
get_cpus_to_display() to obtain that value.

This should hopefully provide the user with an expected CPU count regardless of
the internal manipulation that is sometimes done to the kt->cpus value.

regards,
-- 
Luciano Chavez <lnx1138 at linux.vnet.ibm.com>
IBM Linux Technology Center

diff -up crash-5.0.1/alpha.c.orig crash-5.0.1/alpha.c
--- crash-5.0.1/alpha.c.orig	2010-03-04 07:19:23.000000000 -0600
+++ crash-5.0.1/alpha.c	2010-03-04 07:20:43.000000000 -0600
@@ -2641,7 +2641,7 @@ alpha_display_machine_stats(void)
 
         fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
         fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-        fprintf(fp, "               CPUS: %d\n", kt->cpus);
+        fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
         fprintf(fp, "    PROCESSOR SPEED: ");
         if ((mhz = machdep->processor_speed()))
                 fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/defs.h.orig crash-5.0.1/defs.h
--- crash-5.0.1/defs.h.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/defs.h	2010-03-04 07:20:43.000000000 -0600
@@ -3717,6 +3717,7 @@ int get_cpus_online(void);
 int get_cpus_present(void);
 int get_cpus_possible(void);
 int get_highest_cpu_online(void);
+int get_cpus_to_display(void);
 int in_cpu_map(int, int);
 void paravirt_init(void);
 void print_stack_text_syms(struct bt_info *, ulong, ulong);
diff -up crash-5.0.1/ia64.c.orig crash-5.0.1/ia64.c
--- crash-5.0.1/ia64.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/ia64.c	2010-03-04 07:20:43.000000000 -0600
@@ -2325,7 +2325,7 @@ ia64_display_machine_stats(void)
 
         fprintf(fp, "              MACHINE TYPE: %s\n", uts->machine);
         fprintf(fp, "               MEMORY SIZE: %s\n", get_memory_size(buf));
-        fprintf(fp, "                      CPUS: %d\n", kt->cpus);
+        fprintf(fp, "                      CPUS: %d\n", get_cpus_to_display());
         fprintf(fp, "           PROCESSOR SPEED: ");
         if ((mhz = machdep->processor_speed()))
                 fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/kernel.c.orig crash-5.0.1/kernel.c
--- crash-5.0.1/kernel.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/kernel.c	2010-03-04 07:20:43.000000000 -0600
@@ -3871,7 +3871,7 @@ display_sys_stats(void)
 	}
 	
 
-        fprintf(fp, "        CPUS: %d\n", kt->cpus);
+        fprintf(fp, "        CPUS: %d\n", get_cpus_to_display());
 	if (ACTIVE())
         	get_symbol_data("xtime", sizeof(struct timespec), &kt->date);
         fprintf(fp, "        DATE: %s\n", 
@@ -4289,7 +4289,7 @@ dump_kernel_table(int verbose)
         fprintf(fp, "    init_begin: %lx\n", kt->init_begin);
         fprintf(fp, "      init_end: %lx\n", kt->init_end);
         fprintf(fp, "           end: %lx\n", kt->end);
-        fprintf(fp, "          cpus: %d\n", kt->cpus);
+        fprintf(fp, "          cpus: %d\n", get_cpus_to_display());
         fprintf(fp, " cpus_override: %s\n", kt->cpus_override);
         fprintf(fp, "       NR_CPUS: %d (compiled-in to this version of %s)\n",
 		NR_CPUS, pc->program_name); 
@@ -6257,6 +6257,16 @@ get_cpus_possible()
 }
 
 /*
+ *  For when displaying cpus, return the number of cpus online if possible, otherwise kt->cpus.
+ */
+int
+get_cpus_to_display(void)
+{
+	int online = get_cpus_online();
+	return (online ? online : kt->cpus);
+}
+
+/*
  *  Xen machine-address to pseudo-physical-page translator.
  */ 
 ulonglong
diff -up crash-5.0.1/ppc64.c.orig crash-5.0.1/ppc64.c
--- crash-5.0.1/ppc64.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/ppc64.c	2010-03-04 07:20:43.000000000 -0600
@@ -2215,7 +2215,7 @@ ppc64_display_machine_stats(void)
 
         fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
         fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-        fprintf(fp, "               CPUS: %d\n", kt->cpus);
+        fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
         fprintf(fp, "    PROCESSOR SPEED: ");
         if ((mhz = machdep->processor_speed()))
                 fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/ppc.c.orig crash-5.0.1/ppc.c
--- crash-5.0.1/ppc.c.orig	2010-03-04 07:20:14.000000000 -0600
+++ crash-5.0.1/ppc.c	2010-03-04 07:20:43.000000000 -0600
@@ -1355,7 +1355,7 @@ ppc_display_machine_stats(void)
 
         fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
         fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-        fprintf(fp, "               CPUS: %d\n", kt->cpus);
+        fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
         fprintf(fp, "    PROCESSOR SPEED: ");
         if ((mhz = machdep->processor_speed()))
                 fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/s390.c.orig crash-5.0.1/s390.c
--- crash-5.0.1/s390.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/s390.c	2010-03-04 07:20:43.000000000 -0600
@@ -1032,7 +1032,7 @@ s390_display_machine_stats(void)
 
 	fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
 	fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-	fprintf(fp, "               CPUS: %d\n", kt->cpus);
+	fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
 	fprintf(fp, "    PROCESSOR SPEED: ");
 	if ((mhz = machdep->processor_speed()))
 		fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/s390x.c.orig crash-5.0.1/s390x.c
--- crash-5.0.1/s390x.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/s390x.c	2010-03-04 07:20:43.000000000 -0600
@@ -1284,7 +1284,7 @@ s390x_display_machine_stats(void)
 
 	fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
 	fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-	fprintf(fp, "               CPUS: %d\n", kt->cpus);
+	fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
 	fprintf(fp, "    PROCESSOR SPEED: ");
 	if ((mhz = machdep->processor_speed()))
 		fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/x86_64.c.orig crash-5.0.1/x86_64.c
--- crash-5.0.1/x86_64.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/x86_64.c	2010-03-04 07:20:43.000000000 -0600
@@ -4412,7 +4412,7 @@ x86_64_display_machine_stats(void)
 
         fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
         fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-        fprintf(fp, "               CPUS: %d\n", kt->cpus);
+        fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
         fprintf(fp, "    PROCESSOR SPEED: ");
         if ((mhz = machdep->processor_speed()))
                 fprintf(fp, "%ld Mhz\n", mhz);
diff -up crash-5.0.1/x86.c.orig crash-5.0.1/x86.c
--- crash-5.0.1/x86.c.orig	2010-02-17 15:21:24.000000000 -0600
+++ crash-5.0.1/x86.c	2010-03-04 07:20:43.000000000 -0600
@@ -3950,7 +3950,7 @@ x86_display_machine_stats(void)
 
         fprintf(fp, "       MACHINE TYPE: %s\n", uts->machine);
         fprintf(fp, "        MEMORY SIZE: %s\n", get_memory_size(buf));
-	fprintf(fp, "               CPUS: %d\n", kt->cpus);
+	fprintf(fp, "               CPUS: %d\n", get_cpus_to_display());
 	fprintf(fp, "    PROCESSOR SPEED: ");
 	if ((mhz = machdep->processor_speed())) 
 		fprintf(fp, "%ld Mhz\n", mhz);





More information about the Crash-utility mailing list