[Crash-utility] Handle the NT_PRSTATUS lost for the "bt" command

Wed Jun 20 07:36:05 UTC 2012

(2012/06/20 3:25), Dave Anderson wrote:
> 
> 
> ----- Original Message -----
> 
>>
>> OK, now I'm getting confused...
>>
> 
> The more I look at this patch, the more confused I get...
> 
> During initialization, the ELF notes contained in the
> dumpfile file header are scanned, and if an NT_PRSTATUS
> note is seen, a pointer to its location in the dumpfile
> is saved in dd->nt_prstatus_percpu[num] and the "num"
> of valid notes is kept in dd->num_prstatus_notes.
> 
> If the dd->num_prstatus_notes is equal to the online cpu
> count, then it is presumed that there is a one-to-one
> relationship, where the cpu number can be used as the
> index into the dd->nt_prstatus_percpu[num] array.
> 
> If the number of notes is not equal to the number of online
> cpus, then the "mapping" function is called, where if a
> cpu is found to be offline, then its (incorrectly) associated
> entry in the dd->nt_prstatus_percpu[num] array is "pushed up"
> to the next higher entry.  But the dd->num_prstatus_notes
> does not seem to get incremented to reflect that move, so
> then it's seems like diskdump_get_prstatus_percpu() can
> possibly return NULL when there actually is a relevant
> NT_PRSTATUS note.

A dd->num_prstatus_notes is equal to number of NT_PRSTATUS notes in dump file
and unless nt_prstatus_percpu[] is continuous from 0,
diskdump_get_prstatus_percpu() can possibly return NULL.

> That seems to be a bug (?), but it's not particularly important,
> because for x86 and x86_64, the data in the NT_PRSTATUS notes is
> only used if the starting point for backtraces if the PC/SP pair
> cannot be determined otherwise, which is the case virtually all of
> the time.  So the registers found in the NT_PRSTATUS notes are
> pretty much useless...

I seem to be minor bug, if some of cpus are offline while panic,
can't backtrace the same number of cpus from tail because of returning NULL.
I think current task's PC/SP can not obtain from thread_struct corectly,
then NT_PRSTATUS notes need to be stored for them instead.
But anyway, panic under offline is very unlikely condition
as same as the NMI failed.

> Now, to complicate matters, your patch does not look at the
> NT_PRSTATUS notes in the dumpfile header, but instead looks
> at the base kernel's original notes, and verifies their
> contents, and correlates what's found there against what was
> found in the dumpfile?  So I don't understand what you are
> attempting to do -- what is the difference between the notes
> that are copied into the dumpfile vs. what you are looking at
> in the base kernel?

At first, my patch is trying to find out tainted note area in base kernel.
Such notes can not be found out from dumpfile, however, found out other ones
are set continuously into nt_prstatus_percpu[from 0 to found out notes].
I just want to repaire nt_prstatus_percpu[#index] corresponding to cpu#index.
[example]
    cpu#0      cpu#1      cpu#2      cpu#3
    saved      tainted    saved      tainted  [by looking at base kernel]

=> nt_prstatus_percpu[0] = cpu#0's note, nt_prstatus_percpu[1] = cpu#3's note
I want to repaire array like,
=> nt_prstatus_percpu[0] = cpu#0's note, nt_prstatus_percpu[3] = cpu#3's note

Since ELF note format dose not contain information which cpu saved me,
I think there is no way to identify mapping between cpu and 
nt_prstatus_percpu[] from ELF note when both are not equal to.
That's why I try to resolve from base kernel area.

> I'm also wondering what would happen in your case if there
> were a combination of "lost" notes *and* offline cpus?  How
> would that work?

I think offline cpus never ack IPI, thus lost notes everytime(not concerned).
I only assume the case that cpu was online but could not ack IPI caused
by any reasons in interrupts off.

> So at this point I really don't want to add this patch
> at all because it touches common code, and I don't want to
> risk breaking the other arches.  Nobody has ever reported
> any "lost" cpus so far, probably because the kdump facility
> uses non-maskable NMI's to shutdown the non-panicking cpus.
> This is such a highly-unlikely corner case, that it does
> not even seem worth addressing for fear of breaking something
> else.

I can agree, x86 architecture own NMI which is most reliable way to 
deliver interrupts to other cpus while powerpc may not own NMI.
I am going to rework patch so that it can be effective only in ppc arch.

> I didn't look at the reasoning behind why you ran into a
> segmentation violation, but since the PPC code path would be:
> 
>   ...
>     back_trace()
>      get_diskdump_regs()
>        get_diskdump_regs_ppc()

get_diskdump_regs_ppc() is touching dd->nt_prstatus_percpu[bt->tc->processor]
when "bt" for panic_task.
---------
        if (KDUMP_CMPRS_VALID() &&
                (bt->task == tt->panic_task ||
                (is_task_active(bt->task) && (dd->num_prstatus_notes > 1)))) {
                note  = (Elf32_Nhdr*) dd->nt_prstatus_percpu[bt->tc->processor];
                if (!note)
                        error(FATAL,
                                    "cannot determine NT_PRSTATUS ELF note "
                                    "for %s task: %lx\n",
                                        (bt->task == tt->panic_task) ?
                                        "panic" : "active", bt->task);
                len = sizeof(Elf32_Nhdr);
                len = roundup(len + note->n_namesz, 4);
                bt->machdep = (void *)((char *)note + len +
                        MEMBER_OFFSET("elf_prstatus", "pr_reg"));
        }

        machdep->get_stack_frame(bt, eip, esp);
---------

In my dumpfile case, bt->tc->processor is 2 and its note is contained at 
dd->nt_prstatus_percpu[0].
                note  = (Elf32_Nhdr*) dd->nt_prstatus_percpu[2];
                * Here is no note instance which "bt" expects.

A dd->nt_prstatus_percpu[] is allocated by malloc() and not initialied with 0.
When note->n_namesz is executed, it is cause of segmentation violation.
This malloc() is moved to calloc() in my first patch.

Other problem which is also appeared in get_netdump_regs_ppc().
A dd->num_prstatus_notes is 1 in my dumpfile although there were 8 online cpus.
If next condition "is_task_active(bt->task)" is required by "bt",
get_diskdump_regs_ppc() unfortunately do machdep->get_stack_frame() for
active tasks and surely failed.
"cannot determine NT_PRSTATUS ELF note ..." should be proper.

> perhaps you can rework your patch so that it is segregated
> to PPC only?

Yes I'm going to rework, and thanks for your reviews.
I'm modifying difficult parts this time, very helpful.

I'll do more tests before sending reworked PPC only patch.
I also have to study about kexectools or makedumpfile more about
how NT_PRSTATUS or others are treated, haven't understood enough.

Toshi

> Dave
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
>