[Crash-utility] [PATCH] Speed up "kmem -[sS]" by optimizing is_page_ptr()
Dave Anderson
anderson at redhat.com
Tue Feb 20 21:29:32 UTC 2018
----- Original Message -----
> Hi Dave,
>
> On 2/20/2018 11:32 AM, Dave Anderson wrote:
> ...
> >>>>> Another suggestion/question -- if is_page_ptr() is called with a NULL
> >>>>> phys
> >>>>> argument (as is done most of the time), could it skip the "if
> >>>>> IS_SPARSEMEM()"
> >>>>> section at the top, and still utilize the part at the bottom, where it
> >>>>> walks
> >>>>> through the vt->node_table[x] array? I'm not sure about the "ppend"
> >>>>> calculation
> >>>>> though -- even if there are holes in the node's address space, is it
> >>>>> still
> >>>>> a
> >>>>> contiguous chunk of page structure addresses per-node?
> >>>>
> >>>> I'm still investigating and not sure yet, but I think that because SPARSEMEM
> >>>> uses mem_section instead of node_mem_map, page structures could be
> >>>> non-contiguous per-node, depending on the architecture or conditions.
> >>>>
> >>>> typedef struct pglist_data {
> >>>> ...
> >>>> #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
> >>>> struct page *node_mem_map;
> >>>>
> >>>> I'll continue to check it.
> >>>
> >>> You are right, but in the case where pglist_data.node_mem_map does *not*
> >>> exist,
> >>> the crash utility initializes each vt->node_table[node].mem_map with the
> >>> node's
> >>> starting mem_map address by using the return value from phys_to_page() of
> >>> the
> >>> node's starting physical address -- which uses the sparsemem functions.
> >>>
> >>> The question is whether the current "ppend" calculation is correct for
> >>> the
> >>> last
> >>> physical page in a node. If it is not correct, then perhaps an
> >>> "mem_map_end" value
> >>> can be added to the node_table structure, initialized by using
> >>> phys_to_page() to get
> >>> the page address of the last physical address in the node. And then in
> >>> that case, the
> >>> question is whether the mem_map range of virtual addresses are contiguous
> >>> -- even if
> >>> there are holes in the mem_map virtual address range.
> >>
> >> "node_size" is set to pglist_data.node_spanned_pages, which includes
> >> holes.
> >> So I think that with VMEMMAP, where a page address is linear in its pfn,
> >> the current "ppend" calculation is correct for the last page in a node.
> >> But without VMEMMAP there is no guarantee of linearity, so the
> >> calculation could be incorrect.
> >>
> >> I found an example with RHEL5:
> >>
> >> crash> help -o
> >> ...
> >> size_table:
> >> page: 56
> >> ...
> >> crash> kmem -n
> >> NODE SIZE PGLIST_DATA BOOTMEM_DATA NODE_ZONES
> >> 0 524279 ffff810000014000 ffffffff804e1900 ffff810000014000
> >> ffff810000014b00
> >> ffff810000015600
> >> ffff810000016100
> >> MEM_MAP START_PADDR START_MAPNR
> >> ffff8100007da000 0 0
> >>
> >> ZONE NAME SIZE MEM_MAP START_PADDR START_MAPNR
> >> 0 DMA 4096 ffff8100007da000 0 0
> >> 1 DMA32 520183 ffff810000812000 1000000 4096
> >> 2 Normal 0 0 0 0
> >> 3 HighMem 0 0 0 0
> >>
> >> -------------------------------------------------------------------
> >>
> >> NR SECTION CODED_MEM_MAP MEM_MAP PFN
> >> 0 ffff810009000000 ffff8100007da000 ffff8100007da000 0
> >> 1 ffff810009000008 ffff8100007da000 ffff81000099a000 32768
> >> 2 ffff810009000010 ffff8100007da000 ffff810000b5a000 65536
> >> 3 ffff810009000018 ffff8100007da000 ffff810000d1a000 98304 <= there is a
> >> 4 ffff810009000020 ffff810008901000 ffff810009001000 131072 <= mem_map gap.
> >> 5 ffff810009000028 ffff810008901000 ffff8100091c1000 163840
> >> :
> >> 14 ffff810009000070 ffff810008901000 ffff81000a181000 458752
> >> 15 ffff810009000078 ffff810008901000 ffff81000a341000 491520
> >> crash>
> >>
> >> In this case, the "ppend" will be
> >>
> >> 0xffff8100007da000 + (524279 * 56)
> >> = 0xffff8100023d9e08
> >>
> >> but it looks like the actual value is around 0xffff81000a501000.
> >
> > Right, I understand that the current "ppend" calculation wouldn't work.
> >
> >> And also, we can see the gap between NR=3 and 4. This means that even if the
> >> correct "mem_map_end" is added to the node_table structure, it would
> >> not be enough to check whether an address is a page structure.
> >
> > Why? Wouldn't it still give us an ascending range of page structure
> > addresses
> > on a per-node basis? (even if there was a physical and/or virtual memory
> > hole?)
> > AFAICT, for each section NR, the MEM_MAP and PFN values always increment.
>
> Sorry if I misunderstood something..
> First, I assume that we are talking about the case of kernels with SPARSEMEM
> and using the vt->numnodes loop after skipping the IS_SPARSEMEM() section.
>
> The "mem_map_end" I mean here is the page address of the last physical
> address in the node, and the example system has only one node. So I think
> that the "kmem -n" output above suggests that it could return TRUE for an
> incoming "addr" between the end of NR=3 and the start of NR=4, but it's
> not a page address.
>
> NR MEM_MAP
> 0 +---------+ ffff8100007da000 = nt->mem_map
> : | pages.. | :
> 2 +---------+ ffff810000b5a000
> 3 +---------+ ffff810000d1a000
> +---------+ ffff810000eda000 = ffff810000d1a000 + (32768 * 56)
> | ??? | <-- for an "addr" here, it could return TRUE.
> 4 +---------+ ffff810009001000
> 5 +---------+ ffff8100091c1000
> : | pages.. | :
> 15 +---------+ ffff81000a341000
> +---------+ ffff81000a501000 = nt->mem_map_end
>
> Because of such mem_map holes in a node, I don't think that the vt->numnodes
> loop could be utilized for kernels with SPARSEMEM as it is.
> Is this "mem_map_end" different from the one you assumed?
No.
I understand that a page address in the "???" section above would return
true (unless a "phys" argument was passed in). Checking whether an incoming
address was between nt->mem_map and nt->mem_map_end would be slightly more
refined as compared to adding a new simple function that would check whether
the incoming address was between VMEMMAP_VADDR and VMEMMAP_END, which we
discussed earlier.
So I'm suggesting that a vmemmap page address could be checked for validity by:
(1) verifying that the incoming address is located in the vmemmap address range, and
(2) verifying that it is accessible()
Dave
>
> Thanks,
> Kazuhito Hagio
>
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
>