[Crash-utility] x86_64 limit of 454 cpu's?

Dave Anderson anderson at redhat.com
Fri Apr 15 16:13:51 UTC 2011



----- Original Message -----
> (4/15/2011 09:04), Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> >> Hi Dave, and company,
> >>
> >> I get this error trying to open a dump of a large system:
> >>
> >> crash: compressed kdump: invalid nr_cpus value: 640
> >> crash: vmcore: not a supported file format
> >>
> >> The message is from diskdump.c:
> >> if (sizeof(*header) + sizeof(void *) * header->nr_cpus > block_size ||
> >>      header->nr_cpus <= 0) {
> >>          error(INFO, "%s: invalid nr_cpus value: %d\n",
> >>
> >> block_size is the page size of 4096
> >> struct disk_dump_header looks like 464 bytes
> >> void * is 8
> >> So it looks like 454 is the maximum number of cpus.
> >> 464 + 454*8 -> 4096
> >>
> >> Is this intentional?
> >> It looks like a restriction that no one ever complained about. But
> >> there are systems (Altix UV) with 2048 cpus.
> >>
> >> Is there an easy fix?
> >>
> >> -Cliff
> >
> > To be honest, I don't know, I didn't design or write that code.
> 
> Yes, this is intentional for RHEL4/diskdump. In the RHEL4 kernel,
> disk_dump_header is defined as follows.
> 
> struct disk_dump_header {
> char signature[8]; /* = "DISKDUMP" */
> (snip)
> int nr_cpus; /* Number of CPUs */
> struct task_struct *tasks[NR_CPUS];
> };
> 
> And the maximum number of logical CPUs in RHEL4 is 32 (x86) or 64
> (x86_64), so this does not cause any problem.
> 
> On the other hand, as you and Dave said, this causes a limitation
> problem in the RHEL5/RHEL6 kernels. But as far as I know, makedumpfile
> does not use this "tasks" member, so we can skip the check in that case.
> 
> if (is_diskdump?)
>     if (sizeof(*header) + sizeof(void *) * header->nr_cpus > block_size ||
>         header->nr_cpus <= 0) {
>             error(INFO, "%s: invalid nr_cpus value: %d\n",
>             goto err;
>     }
> 
> Something like this. Dave, Ohmichi-san, what do you think?
> 
> Thanks,
> Takao Indoh

Looking at a couple sample compressed kdumps, it does appear that
they do not set up the tasks[] array, and that the sub_header still
starts at the "header->block_size".  That being the case, your
proposal looks like a nice, simple fix!
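
For reference, the arithmetic behind the 454-cpu limit can be sketched as
below. This is only an illustration, not code from crash or makedumpfile:
the helper names max_nr_cpus() and header_plus_tasks() are hypothetical,
and the 464-byte header size is the figure reported earlier in this
thread, not something measured here.

```c
#include <stddef.h>

/*
 * Largest nr_cpus for which
 *     header_size + ptr_size * nr_cpus <= block_size
 * still holds, i.e. the largest value the existing check accepts.
 * Hypothetical helper, for illustration only.
 */
static long max_nr_cpus(size_t block_size, size_t header_size, size_t ptr_size)
{
    if (ptr_size == 0 || header_size > block_size)
        return 0;
    return (long)((block_size - header_size) / ptr_size);
}

/*
 * Bytes needed for the header followed by one task pointer per cpu.
 * Hypothetical helper, for illustration only.
 */
static size_t header_plus_tasks(size_t header_size, size_t ptr_size, long nr_cpus)
{
    return header_size + ptr_size * (size_t)nr_cpus;
}

/*
 * With the numbers from this thread (block_size 4096, header 464 bytes,
 * 8-byte pointers):
 *     max_nr_cpus(4096, 464, 8)        evaluates to 454
 *     header_plus_tasks(464, 8, 640)   evaluates to 5584, which is > 4096
 */
```

So a 640-cpu dump would need 5584 bytes for the header plus tasks[],
which is why the existing check rejects it even though a compressed
kdump never fills in tasks[].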

Cliff, can you enclose that piece of code with something like:

        if (DISKDUMP_VALID()) {
            if (sizeof(*header) + sizeof(void *) * header->nr_cpus > block_size ||
                header->nr_cpus <= 0) {
                    error(INFO, "%s: invalid nr_cpus value: %d\n",
                            DISKDUMP_VALID() ? "diskdump" : "compressed kdump",
                            header->nr_cpus);
                    goto err;
            }
        }

I suppose it should still check for (header->nr_cpus <= 0), but I'll
let Ken'ichi confirm that.
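
If the (nr_cpus <= 0) check does turn out to be wanted for both formats,
one possible arrangement is sketched below. It is a standalone
illustration under assumptions, not the actual diskdump.c code: the
enum, the function name nr_cpus_is_valid(), and the explicit ptr_size
parameter are all hypothetical stand-ins (crash itself would use
DISKDUMP_VALID() and sizeof(void *)).

```c
#include <stddef.h>

/* Hypothetical stand-in for the DISKDUMP_VALID() distinction. */
enum dump_format { FORMAT_DISKDUMP, FORMAT_COMPRESSED_KDUMP };

/*
 * Returns 1 if nr_cpus is acceptable for the given format, 0 otherwise.
 *
 * - nr_cpus <= 0 is rejected for both formats.
 * - The "header plus tasks[] must fit in one block" test is applied
 *   only to diskdump, on the assumption that compressed kdumps do not
 *   populate tasks[].
 */
static int nr_cpus_is_valid(enum dump_format fmt, int nr_cpus,
                            size_t header_size, size_t ptr_size,
                            size_t block_size)
{
    if (nr_cpus <= 0)
        return 0;

    if (fmt == FORMAT_DISKDUMP &&
        header_size + ptr_size * (size_t)nr_cpus > block_size)
        return 0;

    return 1;
}
```

With the sizes quoted in this thread (464-byte header, 8-byte pointers,
4096-byte block), this accepts a 640-cpu compressed kdump, still rejects
a 640-cpu diskdump, and rejects nr_cpus of 0 in either format.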

Thanks Takao,
  Dave

> 
> >
> > And you're right. Although dumpfiles with that many cpus are highly
> > unusual, looking at the code, it certainly does appear that the
> > disk_dump_header plus the task pointers for each cpu must fit in a
> > "block_size", or page size, and that the sub_header is the first
> > item following the contents of the first page:
> >
> > ---
> >   ^ disk_dump_header
> >   |     task on cpu 0
> > page ...
> >   |     task on cpu x-1
> >   V task on cpu x
> > ---
> >         sub_header
> >         bitmap
> >         remaining stuff...
> >
> > Since your dump is presumably a compressed kdump, I'm wondering
> > what the makedumpfile code did in your case? Did it round up the
> > location of the sub_header to a page boundary?
> >
> > I've cc'd this to Ken'ichi Ohmichi (makedumpfile), and to Takao
> > Indoh (original compressed diskdump) for their input.
> >
> > Thanks,
> >    Dave
> >
> >



