[Crash-utility] [RFC][PATCH]: crash aborts with cannot determine idle task

Chandru chandru at in.ibm.com
Tue Jun 9 11:02:12 UTC 2009


On Tuesday 09 June 2009 00:14:39 Dave Anderson wrote:
> 
> 
> The problem is this memset() statement, which makes no sense:
> 
> +void map_prstatus_array(void)
> +{
> +       void *nt_ptr;
> +       int i, j;
> +
> +       /* temporary buffer to hold the prstatus_percpu array */
> +       if ((nt_ptr = (void *)calloc(nd->num_prstatus_notes,
> +                               sizeof(void *))) == NULL)
> +               error(FATAL,
> +                  "cannot allocate a buffer to hold prstatus_percpu array\n");
> +
> +       memcpy((void *)nt_ptr, nd->nt_prstatus_percpu,
> +               nd->num_prstatus_notes * sizeof(void *));
> +       memset(nd->nt_prstatus_percpu, 0, nd->num_prstatus_notes);
> 
> ...because it zeroes out the first few bytes (whatever the number of NT_PRSTATUS
> sections there are) of the first entry in the array.  So for example, here's
> a before-and-after of the contents of a kdump's nd->nt_prstatus_percpu[] array
> which has just 2 NT_PRSTATUS sections: 
> 
> before memset():
> 
>   1d9f5dc8 1d9f5f2c 0 0 0 0 0 0 0 0 0 0 0
> 
> after memset():
> 
>   1d9f0000 1d9f5f2c 0 0 0 0 0 0 0 0 0 0 0
> 
> And then depending upon whether the resultant virtual address actually exists
> in the crash utility's virtual address space, it craps out in get_netdump_panic_task()
> when it tries to access the faulty address.


Hi Dave,

Thanks a lot for catching the segfault and finding its root cause. Here is
the updated patch, which incorporates the suggestions from the review comments.
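
To make the root cause concrete, here is a minimal stand-alone sketch of the
two memset() forms. It is not code from crash itself: the helper names
clear_slots_buggy() and clear_slots_fixed() are hypothetical, and a plain
array of pointers stands in for nd->nt_prstatus_percpu.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Buggy form (hypothetical helper): 'count' is passed to memset() as a
 * BYTE count, so only the first 'count' bytes of slots[0] are cleared.
 * The remaining entries keep their old values.
 */
static void clear_slots_buggy(void **slots, size_t count)
{
	assert(slots != NULL);
	memset(slots, 0, count);
}

/*
 * Fixed form (hypothetical helper): the count is scaled by the element
 * size, so 'count' whole pointer-sized entries are cleared.
 */
static void clear_slots_fixed(void **slots, size_t count)
{
	assert(slots != NULL);
	memset(slots, 0, count * sizeof(void *));
}
```

With two notes, the buggy form leaves the second entry untouched and turns
the first entry into a partially zeroed, invalid pointer, which matches the
before-and-after dump in the review.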

kdump installs NT_PRSTATUS notes into the vmcore file only for the cpus that
were online at the time of the crash. As a result, while reading the notes from
the dump file, we cannot tell which cpu each NT_PRSTATUS note belongs to. The
cpu possible, present and online maps are not available until cpu_maps_init()
initializes them. Hence we remap the prstatus pointer array to the online cpus
just after the call to that function.
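
The remapping idea can be sketched in isolation as follows. This is only an
illustration under stated assumptions, not the patch itself: remap_notes() is
a hypothetical name, the online map is modelled as an array of 0/1 flags
instead of in_cpu_map(ONLINE, i), and the "next < num_notes" guard is an
extra safety check that the patch below does not have. The caller is assumed
to pass an array with at least nr_cpus entries whose entries beyond
num_notes are already NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * On entry, 'notes' holds 'num_notes' pointers packed at the front of the
 * array. On return, notes[cpu] is non-NULL only for online cpus, filled in
 * ascending cpu order. Returns 0 on success, -1 if the temporary buffer
 * cannot be allocated.
 */
static int remap_notes(void **notes, size_t num_notes,
		       const unsigned char *online, size_t nr_cpus)
{
	void **tmp;
	size_t cpu, next = 0;

	assert(notes != NULL && online != NULL);
	if (num_notes == 0)
		return 0;

	/* temporary copy of the packed pointers */
	tmp = calloc(num_notes, sizeof(*tmp));
	if (tmp == NULL)
		return -1;
	memcpy(tmp, notes, num_notes * sizeof(*tmp));

	/* clear whole entries, not just num_notes bytes */
	memset(notes, 0, num_notes * sizeof(*tmp));

	/* hand the notes out to the online cpus, in order */
	for (cpu = 0; cpu < nr_cpus && next < num_notes; cpu++)
		if (online[cpu])
			notes[cpu] = tmp[next++];

	free(tmp);
	return 0;
}
```

For example, with two notes and only cpus 0 and 3 online out of four, the
second note ends up at index 3 and indexes 1 and 2 stay NULL.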

Signed-off-by: Chandru Siddalingappa <chandru at linux.vnet.ibm.com>
Reviewed-by: Dave Anderson <anderson at redhat.com>
Cc: Haren Myneni <haren at us.ibm.com>
---
 
--- crash-4.0-8.10/ppc64.c.orig	2009-06-08 16:08:09.000000000 +0530
+++ crash-4.0-8.10/ppc64.c	2009-06-09 15:45:39.000000000 +0530
@@ -2407,13 +2407,16 @@ ppc64_paca_init(void)
 	if (!symbol_exists("paca"))
 		error(FATAL, "PPC64: Could not find 'paca' symbol\n");
 
-	if (cpu_map_addr("present"))
+	if (cpu_map_addr("possible"))
+		map = POSSIBLE;
+	else if (cpu_map_addr("present"))
 		map = PRESENT;
 	else if (cpu_map_addr("online"))
 		map = ONLINE;
 	else
-		error(FATAL, 
-		    "PPC64: cannot find 'cpu_present_map' or 'cpu_online_map' symbols\n");
+		error(FATAL,
+			"PPC64: cannot find 'cpu_possible_map', "
+			"'cpu_present_map' or 'cpu_online_map' symbols\n");
 
 	if (!MEMBER_EXISTS("paca_struct", "data_offset"))
 		return;
@@ -2423,8 +2426,8 @@ ppc64_paca_init(void)
 
 	cpu_paca_buf = GETBUF(SIZE(ppc64_paca));
 
-	if (!(nr_paca = get_array_length("paca", NULL, 0))) 
-		nr_paca = NR_CPUS;
+	if (!(nr_paca = get_array_length("paca", NULL, 0)))
+		nr_paca = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
 
 	if (nr_paca > NR_CPUS) {
 		error(WARNING, 
@@ -2435,7 +2438,7 @@ ppc64_paca_init(void)
 	
 	for (i = cpus = 0; i < nr_paca; i++) {
 		/*
-		 * CPU present (or online)?
+		 * CPU present or online or can exist in the system(possible)?
 		 */
 		if (!in_cpu_map(map, i))
 			continue;
--- crash-4.0-8.10/kernel.c.orig	2009-06-08 16:07:53.000000000 +0530
+++ crash-4.0-8.10/kernel.c	2009-06-09 15:01:51.000000000 +0530
@@ -74,6 +74,9 @@ kernel_init()
 
 	cpu_maps_init();
 
+	if (KDUMP_DUMPFILE())
+		map_cpu_prstatus();
+
 	kt->stext = symbol_value("_stext");
 	kt->etext = symbol_value("_etext");
 	get_text_init_space(); 
--- crash-4.0-8.10/netdump.c.orig	2009-06-08 16:07:58.000000000 +0530
+++ crash-4.0-8.10/netdump.c	2009-06-09 16:24:52.000000000 +0530
@@ -45,6 +45,38 @@ static void check_dumpfile_size(char *);
 	(machine_type("IA64") || machine_type("PPC64"))
 
 /*
+ * kdump installs NT_PRSTATUS elf notes only to the cpus
+ * that were online during dumping. Hence we call into
+ * this function after reading the cpu map from the kernel,
+ * to remap the NT_PRSTATUS notes only to the online cpus
+ */
+void map_cpu_prstatus(void)
+{
+	void *nt_ptr;
+	int i, j, nrcpus;
+
+	/* temporary buffer to hold the prstatus_percpu array */
+	if ((nt_ptr = (void *)calloc(nd->num_prstatus_notes,
+				sizeof(void *))) == NULL)
+		error(FATAL,
+		   "cannot allocate a buffer to hold prstatus_percpu array\n");
+
+	memcpy((void *)nt_ptr, nd->nt_prstatus_percpu,
+		(nd->num_prstatus_notes * sizeof(void *)));
+	memset(nd->nt_prstatus_percpu, 0,
+		(nd->num_prstatus_notes * sizeof(void *)));
+
+	nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
+
+	/* re-populate the array with the notes mapping to online cpus */
+	for (i = 0, j = 0; i < nrcpus; i++)
+		if (in_cpu_map(ONLINE, i))
+			((unsigned long *)nd->nt_prstatus_percpu)[i] =
+				((unsigned long *)nt_ptr)[j++];
+	free(nt_ptr);
+}
+
+/*
  *  Determine whether a file is a netdump/diskdump/kdump creation, 
  *  and if TRUE, initialize the vmcore_data structure.
  */
@@ -618,7 +650,7 @@ get_netdump_panic_task(void)
 		crashing_cpu = -1;
 		if (kernel_symbol_exists("crashing_cpu")) {
 			get_symbol_data("crashing_cpu", sizeof(int), &i);
-			if ((i >= 0) && (i < nd->num_prstatus_notes)) {
+			if ((i >= 0) && in_cpu_map(ONLINE, i)) {
 				crashing_cpu = i;
 				if (CRASHDEBUG(1))
 					error(INFO, 
@@ -2236,7 +2268,7 @@ get_netdump_regs_ppc64(struct bt_info *b
 		 * CPUs if they responded to an IPI.
 		 */
                 if (nd->num_prstatus_notes > 1) {
-			if (bt->tc->processor >= nd->num_prstatus_notes)
+			if (!nd->nt_prstatus_percpu[bt->tc->processor])
 				error(FATAL, 
 		          	    "cannot determine NT_PRSTATUS ELF note "
 				    "for %s task: %lx\n", 



