[Crash-utility] [RFC v2] Use register value in elf note NT_PRSTATUS to do backtrace

Dave Anderson anderson at redhat.com
Tue Apr 26 15:15:12 UTC 2011



----- Original Message -----
> Hi Dave and list,
> 
> I guess you still remember that a few weeks ago Wen has sent a patch which intends
> to use register values in NT_PRSTATUS notes for backtracing if no panic task was
> found. Thanks for your review and the suggestions are very useful.
> 
> However, Wen was busy with other work for these days, so I'll continue with this
> work and now the 2rd version patch is attached.
> 
> v2: Address review feedbacks.
> 1) Set up a flag in pc->flags2 and it's determined during the diskdump file init procedure.
> 2) Seperate code according to the that flag.
> 3) Also, we've done some tests on dumpfile of xen kernel and the trouble described
> in the previous mail was gone.
> 
> So we're looking forward to your feedback and if you still have any problems with
> it, please tell us.
> 
> Thanks,
> Wang Chao

Hello Wang,

I haven't had a chance to run a complete test of this latest patch,
but I do have several suggestions.

diskdump.c:

Put the nt_prstatus_percpu array pointer and the num_prstatus_notes
into the diskdump_data structure -- like you did with the notes_buf.
Make nt_prstatus_percpu a pointer that gets allocated and initialized
only if necessary -- because if it's not used, it's a waste of space.

In read_dump_header(), when dd->notes_buf is malloc()'d, malloc() the
dd->nt_prstatus_percpu array. 

And at the bottom of read_dump_header(), free both the dd->notes_buf
and the dd->nt_prstatus_percpu array if they exist, under the "err:"
label:

err:
   ...
   if (dd->notes_buf)
	free(dd->notes_buf);
   if (dd->nt_prstatus_percpu)
        free(dd->nt_prstatus_percpu;

In order to be able to debug/understand why things may fail, especially
due to the exclusion of prstatus notes of offline cpus, it would be 
very helpful to be able to see the dd->nt_prstatus pointers for 
each cpu.  Similar to what is done with the vmcoreinfo data, can 
you add a section to __diskdump_dump_header():

                if (dh->header_version >= 3) {
                        fprintf(fp, "   offset_vmcoreinfo: %lx\n",
                                (ulong)dd->sub_header_kdump->offset_vmcoreinfo);
                        fprintf(fp, "     size_vmcoreinfo: %lu\n",
                                dd->sub_header_kdump->size_vmcoreinfo);
                        if (dd->sub_header_kdump->offset_vmcoreinfo &&
                                dd->sub_header_kdump->size_vmcoreinfo) {
                                dump_vmcoreinfo(fp);
                        }
                }
                if (dh->header_version >= 4) {
                        fprintf(fp, "         offset_note: %lx\n",
                                (ulong)dd->sub_header_kdump->offset_note);
                        fprintf(fp, "           size_note: %lu\n",
                                dd->sub_header_kdump->size_note);
here ==================> 
                }

Call a function that dumps the file offset of each cpu's nt_prstatus data.  
Then, if necessary, any cpu's register set can be read with "rd -f".


netdump.c:

In get_netdump_regs_x86(), both of these settings are incorrect:

                user_regs = get_regs_from_note((char *)note, &ip, &sp);
     =====>     bt->flags |= BT_KDUMP_ELF_REGS;
                if (is_kernel_text(ip) &&
                    (((sp >= GET_STACKBASE(bt->task)) &&
                      (sp < GET_STACKTOP(bt->task))) ||
                    in_alternate_stack(bt->tc->processor, sp))) {
                        *eip = ip;
                        *esp = sp;
     =====>             bt->flags |= BT_KERNEL_SPACE;
                        return;
                }

Neither of these flags should be set there.  BT_KDUMP_ELF_REGS
is only applicable to x86_64, and BT_KERNEL_SPACE is only
appicable to kvmdump dumpfiles.

Finally, in the interest of paranoia, give the user the capability
of *not* using this facility.  In main.c, create a "--no_elf_notes"
option (similar to "--zero_excluded"), and have it set a NO_ELF_NOTES
bit in the globally-accessible "*diskdump_flags".  As an example, see
how ZERO_EXCLUDED is set, used, and accessed.  Then in read_dump_header()
skip the new ELF setup if that flag is set.  It will not be necessary
to go as creating a new environment variable like "zero_excluded", but
it would be nice to be able to restrict its use on the crash invocation
command line.
 
Thanks,
  Dave




More information about the Crash-utility mailing list