[Crash-utility] Handle the NT_PRSTATUS lost for the "bt" command

Toshikazu Nakayama nakayama.ts at ncos.nec.co.jp
Wed Jun 20 00:31:36 UTC 2012


(2012/06/20 1:20), Dave Anderson wrote:

> OK, now I'm getting confused...
> 
> I note that currently __diskdump_memory_dump() does this:
> 
>          for (i = 0; i<  dd->num_prstatus_notes; i++) {
>                  fprintf(fp, "            notes[%d]: %lx\n",
>                          i, (ulong)dd->nt_prstatus_percpu[i]);
>          }
> 
> But your patch does this:
> 
>          nrcpus = kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS;
> 
>          for (i = 0; i<  nrcpus; i++) {
>                  if (dd->nt_prstatus_percpu[i] == NULL) {
>                          fprintf(fp,
>                          "            notes[%d]: (not saved)\n",
>                                  i);
>                          continue;
>                  }
>                  fprintf(fp, "            notes[%d]: %lx\n",
>                          i, (ulong)dd->nt_prstatus_percpu[i]);
>          }
> 
> Now surely we don't want to dump "NR_CPUS" notes pointers,
> unless there actually are that many cpus in the system.
> For example, in RHEL6 sets CONFIG_NR_CPUS to 4096 for x86_64,
> but rarely do we see a dumpfile with that many cpus.
> So I would prefer that the for loop continue to be bounded
> by "dd->num_prstatus_notes".

I also bothered about condition whether loop should break
at dd->nt_prstatus_percpu or not, even though remaining online cpu's
notes can't display as "not saved".

However, I choised nrcpus way since my environment could break
with online cpu nums, not NR_CPUS.
Now, I check kernel_NR_CPUS validation.
   if (STREQ(ln, "CONFIG_NR_CPUS")) {
         kt->kernel_NR_CPUS = atoi(val);
I'm aware if ikconfig is not valid, break condition of loop is always
NR_CPUS without doubt and display gets noisy like as your counsel.

> However, given that process_elf32_notes() and process_elf64_notes()
> do a cursory test of each note by checking for the NT_PRSTATUS
> type, I'm presuming that in your dumpfile, dd->num_prstatus_notes
> must be equal to 1, correct?

Yes, dd->num_prstatus_notes in my dumpfile is 1.

>    void
>    process_elf64_notes(void *note_buf, unsigned long size_note)
>    {
>            Elf64_Nhdr *nt;
>            size_t index, len = 0;
>            int num = 0;
> 
>            for (index = 0; index<  size_note; index += len) {
>                    nt = note_buf + index;
> 
>                    if(nt->n_type == NT_PRSTATUS) {
>                            dd->nt_prstatus_percpu[num] = nt;
>                            num++;
>                    }
>                    len = sizeof(Elf64_Nhdr);
>                    len = roundup(len + nt->n_namesz, 4);
>                    len = roundup(len + nt->n_descsz, 4);
>            }
> 
>            if (num>  0) {
>                    pc->flags2 |= ELF_NOTES;
>                    dd->num_prstatus_notes = num;
>            }
>            return;
>    }
> 
> But then map_cpus_to_prstatus_kdump_cmprs() remaps the
> notes without modifying the dd->num_prstatus_notes counter.
> 
> So then there's this:
> 
>    void *
>    diskdump_get_prstatus_percpu(int cpu)
>    {
>            if ((cpu<  0) || (cpu>= dd->num_prstatus_notes))
>                    return NULL;
> 
>            return dd->nt_prstatus_percpu[cpu];
>    }
> 
> So in a case such as your example, where cpu 2 was the only cpu that saved
> its notes, the function above would incorrectly return NULL when called
> with diskdump_get_prstatus_percpu(2).
> 
> What am I missing?

No, you are entirely right and I could not condier about x86 parts.
I'm getting the feeling that there might be more affects for any other arches.
I reconsider different approach to fix NT_PRSTATUS up.

Thanks,
Toshi

> Dave
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 




More information about the Crash-utility mailing list