[Crash-utility] Handle the NT_PRSTATUS lost for the "bt" command
Toshikazu Nakayama
nakayama.ts at ncos.nec.co.jp
Wed Jun 20 00:31:36 UTC 2012
(2012/06/20 1:20), Dave Anderson wrote:
> OK, now I'm getting confused...
>
> I note that currently __diskdump_memory_dump() does this:
>
> for (i = 0; i< dd->num_prstatus_notes; i++) {
> fprintf(fp, " notes[%d]: %lx\n",
> i, (ulong)dd->nt_prstatus_percpu[i]);
> }
>
> But your patch does this:
>
> nrcpus = kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS;
>
> for (i = 0; i< nrcpus; i++) {
> if (dd->nt_prstatus_percpu[i] == NULL) {
> fprintf(fp,
> " notes[%d]: (not saved)\n",
> i);
> continue;
> }
> fprintf(fp, " notes[%d]: %lx\n",
> i, (ulong)dd->nt_prstatus_percpu[i]);
> }
>
> Now surely we don't want to dump "NR_CPUS" notes pointers,
> unless there actually are that many cpus in the system.
> For example, in RHEL6 sets CONFIG_NR_CPUS to 4096 for x86_64,
> but rarely do we see a dumpfile with that many cpus.
> So I would prefer that the for loop continue to be bounded
> by "dd->num_prstatus_notes".
I also bothered about condition whether loop should break
at dd->nt_prstatus_percpu or not, even though remaining online cpu's
notes can't display as "not saved".
However, I choised nrcpus way since my environment could break
with online cpu nums, not NR_CPUS.
Now, I check kernel_NR_CPUS validation.
if (STREQ(ln, "CONFIG_NR_CPUS")) {
kt->kernel_NR_CPUS = atoi(val);
I'm aware if ikconfig is not valid, break condition of loop is always
NR_CPUS without doubt and display gets noisy like as your counsel.
> However, given that process_elf32_notes() and process_elf64_notes()
> do a cursory test of each note by checking for the NT_PRSTATUS
> type, I'm presuming that in your dumpfile, dd->num_prstatus_notes
> must be equal to 1, correct?
Yes, dd->num_prstatus_notes in my dumpfile is 1.
> void
> process_elf64_notes(void *note_buf, unsigned long size_note)
> {
> Elf64_Nhdr *nt;
> size_t index, len = 0;
> int num = 0;
>
> for (index = 0; index< size_note; index += len) {
> nt = note_buf + index;
>
> if(nt->n_type == NT_PRSTATUS) {
> dd->nt_prstatus_percpu[num] = nt;
> num++;
> }
> len = sizeof(Elf64_Nhdr);
> len = roundup(len + nt->n_namesz, 4);
> len = roundup(len + nt->n_descsz, 4);
> }
>
> if (num> 0) {
> pc->flags2 |= ELF_NOTES;
> dd->num_prstatus_notes = num;
> }
> return;
> }
>
> But then map_cpus_to_prstatus_kdump_cmprs() remaps the
> notes without modifying the dd->num_prstatus_notes counter.
>
> So then there's this:
>
> void *
> diskdump_get_prstatus_percpu(int cpu)
> {
> if ((cpu< 0) || (cpu>= dd->num_prstatus_notes))
> return NULL;
>
> return dd->nt_prstatus_percpu[cpu];
> }
>
> So in a case such as your example, where cpu 2 was the only cpu that saved
> its notes, the function above would incorrectly return NULL when called
> with diskdump_get_prstatus_percpu(2).
>
> What am I missing?
No, you are entirely right and I could not condier about x86 parts.
I'm getting the feeling that there might be more affects for any other arches.
I reconsider different approach to fix NT_PRSTATUS up.
Thanks,
Toshi
> Dave
>
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
>
More information about the Crash-utility
mailing list