[Crash-utility] is_page_ptr vs. x86_64_kvtop

Dave Anderson anderson at redhat.com
Mon Mar 18 19:19:03 UTC 2013



----- Original Message -----
> Hi,
> 
> On Mon, Mar 18, 2013 at 8:29 AM, Dave Anderson <anderson at redhat.com>
> wrote:
> > By classification, do you mean which bit in the filtering option
> > of makedumpfile?
> 
> Exactly.
> 
> >> Per your request:
> >>
> >> > crash> struct page 0xffffea001cdad420
> >> > struct struct page {
> >> >   flags = 0x200000000000000,
> [...]
> > OK, looks like a page struct (most likely)...
> I was already pretty sure.  Confirmed.
> 
> >> >  crash> kmem -p | tail
> >>
> >> OK, here's mine, along with the closest page numbers:
> >>
> >> >       PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> >> > [...]
> 
> >> > ffffea64e939b6f0 1cc4b7fff000                0        0  0 0
> >> <<fin>>
> >
> > Wow, that system has physical memory installed at an unusually high
> > physical address location, i.e., where 1cc4b7fff000 is up around
> > 28 terabytes?
> 
> That seems large to me too, by about a factor of 10.
> It _is_ a largish system.

What does the initial system banner (or the "sys" command) show?

> 
> > I'd be interested in seeing a dump of "kmem -n".  In your case the output
> > is probably huge, but the top part would reflect the physical
> > memory layout,
> 
> NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
>   0   8912880   ffff88087fffb000        ----        ffff88087fffb000
>                                                     ffff88087fffb980
>                                                     ffff88087fffc300
>                                                     ffff88087fffcc80
>     MEM_MAP          START_PADDR    START_MAPNR
> ffffea0000000380        10000            16
> 
> ZONE  NAME         SIZE       MEM_MAP      START_PADDR  START_MAPNR
>   0   DMA          4080  ffffea0000000380        10000           16
>   1   DMA32     1044480  ffffea0000038000      1000000         4096
>   2   Normal    7864320  ffffea0003800000    100000000      1048576
>   3   Movable         0                 0            0            0
> 
> -------------------------------------------------------------------
> 
> NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
>   1   8388608   ffff88107fffa040        ----        ffff88107fffa040
>                                                     ffff88107fffa9c0
>                                                     ffff88107fffb340
>                                                     ffff88107fffbcc0
>     MEM_MAP          START_PADDR    START_MAPNR
> ffffffffffffffff      880000000       8912896
> 
> ZONE  NAME         SIZE       MEM_MAP      START_PADDR  START_MAPNR
>   0   DMA             0                 0            0            0
>   1   DMA32           0                 0            0            0
>   2   Normal    8388608                 0    880000000      8912896
>   3   Movable         0                 0            0            0

At first I didn't understand how there could be a MEM_MAP of "0" for
the NODE 1 physical memory section starting at 34GB (880000000).  It
indicates that there are 8388608 pages (32GB) starting at 880000000,
so the highest physical address would be 0x1080000000 (66GB), which
would be a max_pfn value of 0x1080000000 / 4k, or 17301504 decimal.
But after section 131, the PFN values start at 4328521728 -- which
is 16512GB (~16 TB).  So clearly the section data is being misinterpreted,
and because of that phys_to_page() fails to find a MEM_MAP address for
a physical address of 880000000 (i.e. a pfn of 8912896), because the
section data skips from a PFN of 4292608 to the bizarre 4328521728.
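
Spelling out that arithmetic (just a sanity-check sketch using the
NODE 1 numbers from your "kmem -n" output; a 4KB page size is assumed):

#include <stdio.h>

int main(void)
{
        /* NODE 1 values from the kmem -n output above */
        unsigned long long start_paddr = 0x880000000ULL;  /* 34GB            */
        unsigned long long npages = 8388608ULL;           /* 32GB of 4k pages */
        unsigned long long page_size = 4096ULL;

        unsigned long long end_paddr = start_paddr + (npages * page_size);
        unsigned long long max_pfn = end_paddr / page_size;

        /* prints: end_paddr: 0x1080000000  max_pfn: 17301504 */
        printf("end_paddr: %#llx  max_pfn: %llu\n", end_paddr, max_pfn);
        return 0;
}

A PFN of 4328521728 is roughly 250 times larger than that expected
max_pfn.  The jump shows up plainly in your section listing: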

> 
> NR      SECTION        CODED_MEM_MAP        MEM_MAP       PFN
>  0  ffff88087fffa000  ffffea0000000000  ffffea0000000000  0
>  1  ffff88087fffa020  ffffea0000000000  ffffea00001c0000  32768
>  2  ffff88087fffa040  ffffea0000000000  ffffea0000380000  65536
> [...]
> 130  ffff88107fff9040  ffffea0000000000  ffffea000e380000  4259840
> 131  ffff88107fff9060  ffffea0000000000  ffffea000e540000  4292608
> 132096  ffff880838574558  ffff881038105798  ffff8848a8105798 4328521728
> 132098  ffff880838574598  ffff880837ed2c00  ffff8840a8252c00 4328587264
> [...]
> 237504  ffff8810369d2f40  ffff8810369d2f40  ffff8875af9d2f40 7782531072
> 237505  ffff8810369d2f60       1a48b64         657ac08b64 7782563840
> 237506  ffff8810369d2f80      3686dc30         65afbedc30 7782596608
> 237507  ffff8810369d2fa0  ffff881033219740  ffff8875ac759740 7782629376
> kmem: page excluded: kernel virtual address: ffff8810369d3000  type: "memory section"
> 
> > So your target page structure should "fit" into one of the
> > sections above, where the starting MEM_MAP address of each
> > section should have a contiguous array of page structs that
> > reference the array of physical pages starting at the "PFN"
> > value.  Those MEM_MAP addresses are typically increasing in
> > value with each section, but I believe that I have seen cases
> > where they are not.  And they shouldn't have to be, each section
> > has a base vmemmap address for some number of PFN/physical-pages.
> 
> OK.  That's a bit confusing for me.
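
Maybe a concrete example helps.  Each section in the sane part of your
listing covers 32768 PFNs (the PFN column increments by 32768), and
consecutive MEM_MAP values differ by 0x1c0000, i.e. 32768 * 56, so a
page structure is 56 bytes on this kernel.  Given that, the page struct
for any pfn within a section is simply mem_map + (pfn - start_pfn) * 56.
A throwaway sketch using section 1 from your listing:

#include <stdio.h>

int main(void)
{
        /* Section 1: MEM_MAP ffffea00001c0000, starting PFN 32768 */
        unsigned long long mem_map = 0xffffea00001c0000ULL;
        unsigned long long start_pfn = 32768ULL;
        unsigned long long sizeof_page = 56ULL;  /* 0x1c0000 / 32768 */

        unsigned long long pfn = 40000ULL;       /* arbitrary pfn in this section */
        unsigned long long page = mem_map + (pfn - start_pfn) * sizeof_page;

        /* prints ffffea0000222e00 -- inside section 1's mem_map array */
        printf("page struct for pfn %llu: %llx\n", pfn, page);
        return 0;
}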

So again, the full "kmem -n" output contains bizarre values after
section 131, causing crash to go off into the weeds:

...
127  ffff88087fffafe0  ffffea0000000000  ffffea000de40000  4161536  (ok)
128  ffff88107fff9000  ffffea0000000000  ffffea000e000000  4194304  (ok)
129  ffff88107fff9020  ffffea0000000000  ffffea000e1c0000  4227072  (ok)
130  ffff88107fff9040  ffffea0000000000  ffffea000e380000  4259840  (ok)
131  ffff88107fff9060  ffffea0000000000  ffffea000e540000  4292608  (ok)
132096  ffff880838574558  ffff881038105798  ffff8848a8105798  4328521728  (bogus from here onward...)
132098  ffff880838574598  ffff880837ed2c00  ffff8840a8252c00  4328587264
132099  ffff8808385745b8  ffff880835850400  ffff8840a5d90400  4328620032
132100  ffff8808385745d8  ffff8810342e1c00  ffff8848a49e1c00  4328652800
132101  ffff8808385745f8  ffff8810342e2c00  ffff8848a4ba2c00  4328685568
132102  ffff880838574618  ffff880833a52000  ffff8840a44d2000  4328718336
132103  ffff880838574638  ffff8808354c0c00  ffff8840a6100c00  4328751104
132104  ffff880838574658  ffff8810342e3c00  ffff8848a50e3c00  4328783872
132105  ffff880838574678  ffff8810342e4c00  ffff8848a52a4c00  4328816640
132110  ffff880838574718         20            3871880020     4328980480
132112  ffff880838574758  ffff881037fa3718  ffff8848a9ba3718  4329046016
132114  ffff880838574798  ffff880833a13c00  ffff8840a5993c00  4329111552
132115  ffff8808385747b8  ffff8808386a0800  ffff8840aa7e0800  4329144320
...

So clearly crash is mishandling the memory setup being presented to it.
But I have *no* idea what the problem is.
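
For what it's worth, two mechanical checks make the corruption obvious:
on x86_64 every vmemmap-based MEM_MAP address must fall inside the
ffffea0000000000 region, and every starting PFN must be below the
~17 million max_pfn computed earlier.  Every entry after section 131
fails both.  A throwaway filter along those lines (a rough sketch, not
crash source; the vmemmap bounds are the standard x86_64 layout):

#include <stdio.h>

#define VMEMMAP_START    0xffffea0000000000ULL   /* x86_64 vmemmap base */
#define VMEMMAP_END      0xffffeaffffffffffULL   /* 1TB vmemmap region  */
#define EXPECTED_MAX_PFN 17301504ULL             /* from the node sizes */

static int sane_section(unsigned long long mem_map, unsigned long long pfn)
{
        if (mem_map < VMEMMAP_START || mem_map > VMEMMAP_END)
                return 0;                        /* not a vmemmap address */
        if (pfn >= EXPECTED_MAX_PFN)
                return 0;                        /* impossible PFN */
        return 1;
}

int main(void)
{
        /* section 131 vs. the bogus "132096" entry from the listing */
        printf("%d\n", sane_section(0xffffea000e540000ULL, 4292608ULL));    /* 1 */
        printf("%d\n", sane_section(0xffff8848a8105798ULL, 4328521728ULL)); /* 0 */
        return 0;
}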

> 
> > Anyway, it does look like a page structure, and the page structure pointer
> > itself is translatable.  The problem at hand is that the physical address
> > that the page structure refers to is not being determined because the page
> > structure address itself is not being recognized by is_page_ptr() as being
> > part of the sparsemem infrastructure.  The "if IS_SPARSEMEM()" section at
> > the top of is_page_ptr() is returning FALSE.
> >
> > That being said, from your target page structure address and the "kmem -n"
> > output, you could presumably calculate the associated physical address.
> 
> If the kmem -n output didn't seem to skip over the address of
> interest....

Right, it would walk through all of the sections, including the obviously
misinterpreted section data above, and would not find your target page.
After section 131, the MEM_MAP addresses shown are not even in the vmemmap
virtual range, which starts at ffffea0000000000.
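
In other words, the sparsemem leg of is_page_ptr() boils down to a
containment test over the per-section mem_map arrays.  A stripped-down
sketch (not the actual crash code; the 32768 pages-per-section and the
56-byte page size are taken from your listing):

#include <stdio.h>

#define PAGES_PER_SECTION  32768ULL
#define SIZEOF_PAGE        56ULL    /* from the MEM_MAP deltas above */

struct section { unsigned long long mem_map, start_pfn; };

/* Return the pfn a page-struct address maps to, or -1 if no section claims it. */
static long long lookup_pfn(unsigned long long addr,
                            const struct section *sec, int nr)
{
        int i;

        for (i = 0; i < nr; i++) {
                unsigned long long start = sec[i].mem_map;
                unsigned long long end = start + PAGES_PER_SECTION * SIZEOF_PAGE;

                if (addr >= start && addr < end)
                        return sec[i].start_pfn + (addr - start) / SIZEOF_PAGE;
        }
        return -1;
}

int main(void)
{
        /* the last sane section (131) and the first garbled one ("132096") */
        struct section sec[] = {
                { 0xffffea000e540000ULL, 4292608ULL },
                { 0xffff8848a8105798ULL, 4328521728ULL },
        };

        /* your target page ffffea001cdad420 lies above section 131's array
         * (which ends at ffffea000e700000), and the garbled mem_map values
         * aren't even vmemmap addresses, so nothing matches: prints -1 */
        printf("%lld\n", lookup_pfn(0xffffea001cdad420ULL, sec, 2));
        return 0;
}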

> 
> >> The memory in question is probably not in the dump, but I don't know how
> >> to specify that it be added to the dump without knowing how the memory
> >> is characterized.
> >
> > Whether the actual physical page that is referenced by your target page
> > structure is in the dumpfile should not affect the is_page_ptr() function.
> > That should work regardless.
> 
> I think it is a good guess that the data I really want are not in the dump:
> 
> # strings cdump-0c0s6n3 |grep -E 'Process (entered|leaving)'
> # strings cdump-0c2s6n3 |grep -E 'Process (entered|leaving)'
> # strings ../mrp752/sp1-fulldbg/dk.data | \
> # strings ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 | \
> > grep -E 'Process (entered|leaving)'|sort |uniq -c
>  311804 Process entered
>       1 Process enteredgot mutex:
>       2 Process enteredpage@
>  129991 Process leaving
> [...]
> 
> The "cdump-0c0s6n3" and "cdump-0c2s6n3" files are from the release at issue,
> and the ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 dump is from the SLES-11 SP1
> release.  As you can see, there should be many thousands of matching strings
> in the dump files.  Since there are not, ...
> 
> So:  what physical pages are missing and why are they missing?
> With those two questions resolved, we can fix the dump specification
> to include the missing pages.

I don't know how SUSE sets up their dumping operation.  I presume that they
use makedumpfile to post-process/filter /proc/vmcore into the dumpfile, and 
therefore you would need to find out how it got invoked.  On RHEL systems, 
there is an /etc/kdump.conf file which specifies a "core_collector", and if
it specifies "makedumpfile", it also shows the exact command line used to
invoke it when running against /proc/vmcore in the second kernel.  

For example, by default we use:

 core_collector makedumpfile -c --message-level 1 -d 31

The makedumpfile(8) man page (or "makedumpfile --help") indicates
which types of memory will be filtered based upon the "-d <dump_level>"
argument.  A dump_level of 31 is the most aggressive:

                dump | zero | cache|cache  | user | free
               level | page | page |private| data | page
              -------+------+------+-------+------+------
                   0 |      |      |       |      |
                   1 |  X   |      |       |      |
                   2 |      |  X   |       |      |
                   3 |  X   |  X   |       |      |
                   4 |      |  X   |  X    |      |
                   5 |  X   |  X   |  X    |      |
                   6 |      |  X   |  X    |      |
                   7 |  X   |  X   |  X    |      |
                   8 |      |      |       |  X   |
                   9 |  X   |      |       |  X   |
                  10 |      |  X   |       |  X   |
                  11 |  X   |  X   |       |  X   |
                  12 |      |  X   |  X    |  X   |
                  13 |  X   |  X   |  X    |  X   |
                  14 |      |  X   |  X    |  X   |
                  15 |  X   |  X   |  X    |  X   |
                  16 |      |      |       |      |  X
                  17 |  X   |      |       |      |  X
                  18 |      |  X   |       |      |  X
                  19 |  X   |  X   |       |      |  X
                  20 |      |  X   |  X    |      |  X
                  21 |  X   |  X   |  X    |      |  X
                  22 |      |  X   |  X    |      |  X
                  23 |  X   |  X   |  X    |      |  X
                  24 |      |      |       |  X   |  X
                  25 |  X   |      |       |  X   |  X
                  26 |      |  X   |       |  X   |  X
                  27 |  X   |  X   |       |  X   |  X
                  28 |      |  X   |  X    |  X   |  X
                  29 |  X   |  X   |  X    |  X   |  X
                  30 |      |  X   |  X    |  X   |  X
                  31 |  X   |  X   |  X    |  X   |  X


You might want to just filter zero-filled-pages and free-pages,
which would be a dump-level of 17.
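
Each dump_level is just a bitmask of those five columns (zero page = 1,
cache page = 2, cache private = 4, user data = 8, free page = 16), so
17 = zero pages + free pages.  A quick decoder, if it helps (a small
sketch, not part of makedumpfile):

#include <stdio.h>

int main(void)
{
        static const char *types[] = {
                "zero page", "cache page", "cache private", "user data", "free page"
        };
        int levels[] = { 31, 17 };
        int i, b;

        for (i = 0; i < 2; i++) {
                printf("dump_level %d excludes:", levels[i]);
                for (b = 0; b < 5; b++)
                        if (levels[i] & (1 << b))
                                printf("  %s", types[b]);
                printf("\n");
        }
        return 0;
}

which prints:

  dump_level 31 excludes:  zero page  cache page  cache private  user data  free page
  dump_level 17 excludes:  zero page  free page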

Dave
 



