[Crash-utility] is_page_ptr vs. x86_64_kvtop
Bruce Korb
bruce.korb at gmail.com
Mon Mar 18 18:19:09 UTC 2013
Hi,
On Mon, Mar 18, 2013 at 8:29 AM, Dave Anderson <anderson at redhat.com> wrote:
> By classification, do you mean which bit in the filtering option
> of makedumpfile?
Exactly.
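For readers unfamiliar with the "bit" being discussed: makedumpfile's dump level (the -d option) is a bitmask in which each bit excludes one class of page. The sketch below decodes such a value. The bit meanings follow the makedumpfile man page as I understand it; the function and table names are my own, purely illustrative.

```python
# Page classes excluded by each bit of makedumpfile's dump level (-d).
# Bit meanings as documented in the makedumpfile man page; names here
# are hypothetical helpers, not part of makedumpfile itself.
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "cache pages (non-private)",
    4: "cache pages (private)",
    8: "user process data",
    16: "free pages",
}


def excluded_page_types(dump_level):
    """Return the page classes that a given dump level filters out."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if dump_level & bit]


if __name__ == "__main__":
    for level in (0, 17, 31):
        print(level, excluded_page_types(level))
```

A page that falls into any excluded class is absent from the dump, which is why knowing the classification of the missing memory matters here.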
>> Per your request:
>>
>> > crash> struct page 0xffffea001cdad420
>> > struct page {
>> > flags = 0x200000000000000,
[...]
> OK, looks like a page struct (most likely)...
I was already pretty sure. Confirmed.
>> > crash> kmem -p | tail
>>
>> OK, here's mine, along with the closest page numbers:
>>
>> > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>> > [...]
>> > ffffea64e939b6f0 1cc4b7fff000 0 0 0 0
>> <<fin>>
>
> Wow, that system has physical memory installed at an unusually high
> physical address location, i.e., where 1cc4b7fff000 is up around
> 28 terabytes?
That seems large to me too, by about a factor of 10.
It _is_ a largish system.
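As a quick check of the "28 terabytes" figure, the arithmetic can be sketched as follows. It assumes 4 KiB pages (PAGE_SHIFT of 12), which is the usual x86_64 value but is not stated in the thread.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, the usual x86_64 setting


def describe_paddr(paddr):
    """Return (page frame number, size in TiB) for a physical address."""
    return paddr >> PAGE_SHIFT, paddr / 2**40


if __name__ == "__main__":
    # PHYSICAL column of the last "kmem -p" row quoted above
    pfn, tib = describe_paddr(0x1CC4B7FFF000)
    print(f"pfn={pfn} tib={tib:.2f}")
```

The resulting page frame number is in the same range as the PFN values of the highest-numbered sections in the "kmem -n" output below.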
> I'd be interested in seeing a dump of "kmem -n". In your case the output
> is probably huge, but the top part would reflect the physical memory layout,
NODE    SIZE       PGLIST_DATA     BOOTMEM_DATA     NODE_ZONES
  0    8912880  ffff88087fffb000       ----      ffff88087fffb000
                                                 ffff88087fffb980
                                                 ffff88087fffc300
                                                 ffff88087fffcc80
    MEM_MAP       START_PADDR  START_MAPNR
ffffea0000000380        10000           16
ZONE  NAME        SIZE       MEM_MAP        START_PADDR  START_MAPNR
  0   DMA         4080  ffffea0000000380          10000           16
  1   DMA32    1044480  ffffea0000038000        1000000         4096
  2   Normal   7864320  ffffea0003800000      100000000      1048576
  3   Movable        0                 0              0            0
-------------------------------------------------------------------
NODE    SIZE       PGLIST_DATA     BOOTMEM_DATA     NODE_ZONES
  1    8388608  ffff88107fffa040       ----      ffff88107fffa040
                                                 ffff88107fffa9c0
                                                 ffff88107fffb340
                                                 ffff88107fffbcc0
    MEM_MAP       START_PADDR  START_MAPNR
ffffffffffffffff    880000000      8912896
ZONE  NAME        SIZE       MEM_MAP        START_PADDR  START_MAPNR
  0   DMA            0                 0              0            0
  1   DMA32          0                 0              0            0
  2   Normal   8388608                 0      880000000      8912896
  3   Movable        0                 0              0            0
    NR      SECTION       CODED_MEM_MAP         MEM_MAP              PFN
     0  ffff88087fffa000  ffffea0000000000  ffffea0000000000           0
     1  ffff88087fffa020  ffffea0000000000  ffffea00001c0000       32768
     2  ffff88087fffa040  ffffea0000000000  ffffea0000380000       65536
[...]
   130  ffff88107fff9040  ffffea0000000000  ffffea000e380000     4259840
   131  ffff88107fff9060  ffffea0000000000  ffffea000e540000     4292608
132096  ffff880838574558  ffff881038105798  ffff8848a8105798  4328521728
132098  ffff880838574598  ffff880837ed2c00  ffff8840a8252c00  4328587264
[...]
237504  ffff8810369d2f40  ffff8810369d2f40  ffff8875af9d2f40  7782531072
237505  ffff8810369d2f60           1a48b64        657ac08b64  7782563840
237506  ffff8810369d2f80          3686dc30        65afbedc30  7782596608
237507  ffff8810369d2fa0  ffff881033219740  ffff8875ac759740  7782629376
kmem: page excluded: kernel virtual address: ffff8810369d3000 type: "memory section"
> So your target page structure should "fit" into one of the
> sections above, where the starting MEM_MAP address of each
> section should have a contiguous array of page structs that
> reference the array of physical pages starting at the "PFN"
> value. Those MEM_MAP addresses are typically increasing in
> value with each section, but I believe that I have seen cases
> where they are not. And they shouldn't have to be, each section
> has a base vmemmap address for some number of PFN/physical-pages.
OK. That's a bit confusing for me.
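To make the "fit into one of the sections" idea concrete, here is a rough model of the lookup. It is not crash's actual code. The rows are the well-formed ones quoted above (sections 0, 1, 2, 130 and 131). The struct page size of 56 bytes is inferred from the table (MEM_MAP advances by 0x1c0000 while PFN advances by 32768) and should be treated as an assumption, as should every name in the sketch.

```python
STRUCT_PAGE_SIZE = 56        # inferred: 0x1c0000 bytes per 32768 PFNs
PAGES_PER_SECTION = 32768    # PFN step between consecutive sections

# (section number, MEM_MAP, starting PFN) copied from the "kmem -n" rows
SECTIONS = [
    (0, 0xFFFFEA0000000000, 0),
    (1, 0xFFFFEA00001C0000, 32768),
    (2, 0xFFFFEA0000380000, 65536),
    (130, 0xFFFFEA000E380000, 4259840),
    (131, 0xFFFFEA000E540000, 4292608),
]


def find_section(page_addr, sections=SECTIONS):
    """Return (section nr, pfn) if page_addr is a page struct in a section."""
    span = PAGES_PER_SECTION * STRUCT_PAGE_SIZE
    for nr, mem_map, start_pfn in sections:
        if mem_map <= page_addr < mem_map + span:
            offset = page_addr - mem_map
            if offset % STRUCT_PAGE_SIZE:
                return None  # inside the array, but not on an element boundary
            return nr, start_pfn + offset // STRUCT_PAGE_SIZE
    return None  # no listed section covers this address


if __name__ == "__main__":
    print(find_section(0xFFFFEA0000380038))  # second page struct of section 2
    print(find_section(0xFFFFEA001CDAD420))  # the target page struct
```

With only the rows listed here, the target address 0xffffea001cdad420 matches no section, so the lookup returns None.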
> Anyway, it does look like a page structure, and the page structure pointer
> itself is translatable. The problem at hand is that the physical address
> that the page structure refers to is not being determined because the page
> structure address itself is not being recognized by is_page_ptr() as being
> part of the sparsemem infrastructure. The "if IS_SPARSEMEM()" section at
> the top of is_page_ptr() is returning FALSE.
>
> That being said, from your target page structure address and the "kmem -n"
> output, you could presumably calculate the associated physical address.
If the kmem -n output didn't seem to skip over the address of interest....
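For what it is worth, the calculation can be attempted by extrapolating from the well-formed rows. The sketch below assumes that the vmemmap is one linear array starting at ffffea0000000000 (the MEM_MAP of section 0), that pages are 4 KiB, and that struct page is 56 bytes as derived from two adjacent rows. None of this is confirmed for the section that actually holds the target page, so the result is a guess, not a translation.

```python
VMEMMAP_BASE = 0xFFFFEA0000000000  # MEM_MAP of section 0 in "kmem -n"
PAGE_SHIFT = 12                    # assumption: 4 KiB pages
PAGES_PER_SECTION = 32768          # PFN step between sections 0 and 1


def struct_page_size(mem_map_a, pfn_a, mem_map_b, pfn_b):
    """Derive sizeof(struct page) from two rows of the section table."""
    return (mem_map_b - mem_map_a) // (pfn_b - pfn_a)


def page_to_phys(page_addr, page_size, base=VMEMMAP_BASE):
    """Return (pfn, physical address, section nr) for a page struct address."""
    offset = page_addr - base
    if offset < 0 or offset % page_size:
        raise ValueError("address is not an element of the assumed array")
    pfn = offset // page_size
    return pfn, pfn << PAGE_SHIFT, pfn // PAGES_PER_SECTION


if __name__ == "__main__":
    size = struct_page_size(0xFFFFEA0000000000, 0, 0xFFFFEA00001C0000, 32768)
    pfn, phys, section = page_to_phys(0xFFFFEA001CDAD420, size)
    print(f"sizeof={size} pfn={pfn} phys={phys:#x} section={section}")
```

Under these assumptions the target lands at PFN 8644700, physical address 0x83e85c000, in section 263. That PFN is inside node 0's Normal zone (START_MAPNR 1048576, SIZE 7864320), and section 263 sits in the gap between rows 131 and 132096 of the listing.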
>> The memory in question is probably not in the dump, but I don't know how
>> to specify that it be added to the dump without knowing how the memory
>> is characterized.
>
> Whether the actual physical page that is referenced by your target page
> structure is in the dumpfile should not affect the is_page_ptr() function.
> That should work regardless.
I think it is a good guess that the data I really want are not in the dump:
# strings cdump-0c0s6n3 |grep -E 'Process (entered|leaving)'
# strings cdump-0c2s6n3 |grep -E 'Process (entered|leaving)'
# strings ../mrp752/sp1-fulldbg/dk.data | \
# strings ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 | \
> grep -E 'Process (entered|leaving)'|sort |uniq -c
311804 Process entered
1 Process enteredgot mutex:
2 Process enteredpage@
129991 Process leaving
[...]
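A rough Python equivalent of the pipeline above, for anyone who wants to repeat the count on another dump. It is only a sketch: unlike strings | grep | sort | uniq -c it counts the bare marker, so "Process enteredgot mutex:" is tallied as "Process entered". The function names are illustrative.

```python
import mmap
import re
from collections import Counter

MARKER = re.compile(rb"Process (entered|leaving)")


def count_markers(data):
    """Count marker strings in a bytes-like object (bytes or mmap)."""
    return Counter(m.group(0).decode("ascii") for m in MARKER.finditer(data))


def count_markers_in_file(path):
    """Scan a dump file through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return count_markers(view)


if __name__ == "__main__":
    sample = b"\x00Process entered\x00..Process leaving\x01Process enteredpage@"
    print(count_markers(sample))
```

A dump that yields an empty Counter here would match the empty output of the first two commands.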
The "cdump-0c0s6n3" and "cdump-0c2s6n3" files are from the release at issue,
and the ../mrp752/sp1-fulldbg/dump.c0-0c1s0n0 dump is from the SLES-11 SP1
release. As you can see, there should be many thousands of matching strings
in the dump files. Since there are not, ...
So: what physical pages are missing, and why are they missing?
With those two questions resolved, we can fix the dump specification
to include the missing pages.
Thank you again. - Bruce