[Crash-utility] [PATCH] Speed up "kmem -[sS]" by optimizing is_page_ptr()

Dave Anderson anderson at redhat.com
Fri Feb 16 21:18:09 UTC 2018



----- Original Message -----
> Hi Dave,
> 
> On 2/15/2018 12:38 PM, Dave Anderson wrote:
> ...
> >>>> Because of your questions about ppc64, possible backwards-compatibility
> >>>> issues, or potential future changes to page.flags usage, this permanent
> >>>> change to the is_page_ptr() function solely for the purposes of SLUB's
> >>>> count_partial() function makes me nervous.
> >>>
> >>> Sorry, my explanation was not enough. Please let me supplement a little.
> >>> As far as I know, count_partial() is the function that benefits most from
> >>> this patch, but is_page_ptr() can also be called many times through other
> >>> functions, so this will improve them, too.  Moreover, in the worst case
> >>> the is_page_ptr() loop can iterate NR_MEM_SECTIONS() times, which is
> >>> 33554432 on x86_64 (5-level) if I understand correctly.
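> >>>
> >>> (As a sanity check of that number -- assuming MAX_PHYSMEM_BITS of 52 and
> >>> SECTION_SIZE_BITS of 27 for x86_64 with 5-level page tables:
> >>>
> >>>   NR_MEM_SECTIONS() = 1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> >>>                     = 1UL << (52 - 27) = 33554432.)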
> >>>
> >>> So I thought that ideally we should improve is_page_ptr() itself if
> >>> possible, rather than find the callers that may call it many times and
> >>> change them.  Also, I am willing to drop the definition of VMEMMAP_VADDR
> >>> for ppc64 this time.
> >>
> >> OK, I understand your point.  But what concerns me is that the function's
> >> purpose is to definitively identify whether the incoming address is a
> >> correct page structure address.  If an invalid address gets passed into
> >> is_page_ptr(), your patch would take the invalid address, calculate an
> >> invalid "nr", and continue from there, right?
> 
> Yes, if the invalid "nr" refers to a section that does not exist,
> valid_section_nr() will return 0.  Even if it happens to refer to an
> existing section, the invalid "addr" will either fall outside the range
> between mem_map and end_mem_map, or not be page-aligned -- because if it
> were within range and aligned, it would be a page structure address.
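> 
> For example, with the patch the check reduces to something like this (a
> minimal sketch of the vmemmap case, not the exact patch code):
> 
> 	nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
> 
> 	if ((sec_addr = valid_section_nr(nr))) {  /* 0 if no such section */
> 		coded_mem_map = section_mem_map_addr(sec_addr);
> 		mem_map = sparse_decode_mem_map(coded_mem_map, nr);
> 		end_mem_map = mem_map + (PAGES_PER_SECTION() * SIZE(page));
> 
> 		/* an invalid addr is either outside [mem_map, end_mem_map)
> 		 * or not aligned to a page struct boundary, so it is still
> 		 * rejected here */
> 		if ((addr >= mem_map) && (addr < end_mem_map) &&
> 		    !((addr - mem_map) % SIZE(page)))
> 			return TRUE;
> 	}
> 	return FALSE;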
> 
> Also, without this patch, when an invalid address comes in, the loop can
> try many invalid "nr"s up to NR_MEM_SECTIONS().
> 
> I hope this answers your concern.
> 
> > 
> > Another suggestion/question -- if is_page_ptr() is called with a NULL phys
> > argument (as is done most of the time), could it skip the "if
> > IS_SPARSEMEM()" section at the top, and still utilize the part at the
> > bottom, where it walks through the vt->node_table[x] array?  I'm not sure
> > about the "ppend" calculation though -- even if there are holes in the
> > node's address space, is it still a contiguous chunk of page structure
> > addresses per-node?
> 
> I'm still investigating and not sure yet, but I think that the fact that
> SPARSEMEM uses mem_section instead of node_mem_map means page structures
> could be non-contiguous per-node, depending on the architecture or
> configuration.
> 
>   typedef struct pglist_data {
>   ...
>   #ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
>           struct page *node_mem_map;
> 
> I'll continue to check it.
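> 
> For reference, the kernel-side pfn_to_page() variants differ roughly like
> this (simplified from include/asm-generic/memory_model.h, so please
> double-check against the actual kernel source):
> 
>   /* CONFIG_FLATMEM: one contiguous page struct array */
>   #define __pfn_to_page(pfn)  (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> 
>   /* CONFIG_SPARSEMEM_VMEMMAP: virtually contiguous via the vmemmap area */
>   #define __pfn_to_page(pfn)  (vmemmap + (pfn))
> 
>   /* plain CONFIG_SPARSEMEM: each section carries its own mem_map, so
>    * page structs need not be contiguous across section boundaries */
>   #define __pfn_to_page(pfn)  \
>   	(__section_mem_map_addr(__pfn_to_section(pfn)) + (pfn))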

You are right, but in the case where pglist_data.node_mem_map does *not* exist,
the crash utility initializes each vt->node_table[node].mem_map with the node's
starting mem_map address by using the return value from phys_to_page() of the
node's starting physical address -- which uses the sparsemem functions.

The question is whether the current "ppend" calculation is correct for the last
physical page in a node.  If it is not correct, then perhaps a "mem_map_end"
value can be added to the node_table structure, initialized by using
phys_to_page() to get the page address of the last physical address in the
node.  And then in that case, the question is whether the node's range of
mem_map virtual addresses is contiguous -- even if there are holes in the
node's physical address space.
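
Roughly, I'm picturing something like this -- an untested sketch, where the
"mem_map_end" member and the initialization details are hypothetical:

	struct node_table {
		int node_id;
		ulong pgdat;
		ulong mem_map;
		ulong mem_map_end;  /* new: address just past the node's last page struct */
		physaddr_t start_paddr;
		ulong size;
		...
	};

	/* at node_table initialization time, assuming nt->size is in pages */
	physaddr_t last_paddr;
	ulong page_addr;

	last_paddr = nt->start_paddr + (((physaddr_t)nt->size - 1) * PAGESIZE());
	if (phys_to_page(last_paddr, &page_addr))
		nt->mem_map_end = page_addr + SIZE(page);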

Thanks,
  Dave



> 
> Thanks,
> Kazuhito Hagio
> 
> > 
> >>
> >> Dave
> >>
> >>>
> >>>>
> >>>> There is really no compelling reason that count_partial() absolutely
> >>>> *must* use is_page_ptr(), and so I'm thinking that perhaps you could
> >>>> come up with a less heavy-handed method for simply testing whether a
> >>>> page.lru entry points to another vmemmap'd page.  Something along the
> >>>> lines of adding this for vmemmap-enabled kernels:
> >>>>
> >>>>   #define IN_VMEMMAP_RANGE(page) ((page >= VMEMMAP_VADDR) && (page <= VMEMMAP_END))
> >>>>
> >>>> and then have count_partial() replace the is_page_ptr() call with
> >>>> another slub function that does something like this for vmemmap-enabled
> >>>> kernels:
> >>>>
> >>>>    (IN_VMEMMAP_RANGE(next) && accessible(next))
> >>>>
> >>>> Or instead of accessible(), it could read "next" as a list_head with
> >>>> RETURN_ON_ERROR, and verify that next->prev points back to the current
> >>>> list_head.
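> >>>>
> >>>> For instance, something like this untested sketch (the helper name and
> >>>> details are just placeholders):
> >>>>
> >>>>	static int
> >>>>	vmemmap_page_ptr(ulong next, ulong current_lh)
> >>>>	{
> >>>>		struct kernel_list_head lh;
> >>>>
> >>>>		if (!IN_VMEMMAP_RANGE(next))
> >>>>			return FALSE;
> >>>>
> >>>>		/* read "next" as a list_head; a bogus pointer either fails
> >>>>		 * the read or has a prev that doesn't point back at us */
> >>>>		if (!readmem(next, KVADDR, &lh, sizeof(lh), "list_head",
> >>>>		    RETURN_ON_ERROR|QUIET))
> >>>>			return FALSE;
> >>>>
> >>>>		return ((ulong)lh.prev == current_lh);
> >>>>	}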
> >>>>
> >>>> Non-vmemmap-enabled kernels could still use is_page_ptr().
> >>>>
> >>>> What do you think of doing something like that?
> >>>
> >>> Given the possible compatibility issues you mentioned, I think that the
> >>> way you suggested might well be enough for now.  I'll try a method like
> >>> the above.
> >>>
> >>> Thanks,
> >>> Kazuhito Hagio
> >>>
> >>>>
> >>>> Dave
> >>>>
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> Hi,
> >>>>>
> >>>>> The "kmem -[sS]" commands can take several minutes to complete with
> >>>>> the following conditions:
> >>>>>   * The system has a lot of memory sections with CONFIG_SPARSEMEM.
> >>>>>   * The kernel uses SLUB and it has a very long partial slab list.
> >>>>>
> >>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
> >>>>>   10:18:34 CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL   SLABS  SSIZE
> >>>>>   10:19:41 ffff88017fc78a00 dentry                   192    9038949  10045728  239184     8k
> >>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
> >>>>>   334
> >>>>>
> >>>>> One of the causes is that is_page_ptr(), called from count_partial(),
> >>>>> checks whether a given SLUB page address is a page struct by searching
> >>>>> all memory sections linearly for the one which includes it.
> >>>>>
> >>>>>         nr_mem_sections = NR_MEM_SECTIONS();
> >>>>>         for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>>>                 if ((sec_addr = valid_section_nr(nr))) {
> >>>>>                         ...
> >>>>>
> >>>>> With CONFIG_SPARSEMEM{_VMEMMAP}, we can calculate the memory section
> >>>>> which includes a page struct from its page.flags, or from its address
> >>>>> and VMEMMAP_VADDR.  With this patch doing so, the amount of computation
> >>>>> can be reduced significantly in that case.
> >>>>>
> >>>>>   crash> kmem -s dentry | awk '{print strftime("%T"), $0}'
> >>>>>   10:34:55 CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL   SLABS  SSIZE
> >>>>>   10:34:55 ffff88017fc78a00 dentry                   192    9038949  10045728  239184     8k
> >>>>>   crash> kmem -S dentry | bash -c 'cat >/dev/null ; echo $SECONDS'
> >>>>>   2
> >>>>>
> >>>>> This patch uses VMEMMAP_VADDR.  It is not defined on PPC64, but it
> >>>>> looks like PPC64 supports the VMEMMAP flag and that
> >>>>> machdep->machspec->vmemmap_base serves that purpose, so this patch also
> >>>>> defines VMEMMAP_VADDR for PPC64.  This might need some help from the
> >>>>> PPC folks.
> >>>>>
> >>>>> Signed-off-by: Kazuhito Hagio <k-hagio at ab.jp.nec.com>
> >>>>> ---
> >>>>>  defs.h   |  2 ++
> >>>>>  memory.c | 15 +++++++++++++++
> >>>>>  2 files changed, 17 insertions(+)
> >>>>>
> >>>>> diff --git a/defs.h b/defs.h
> >>>>> index aa17792..84e68ca 100644
> >>>>> --- a/defs.h
> >>>>> +++ b/defs.h
> >>>>> @@ -3861,6 +3861,8 @@ struct efi_memory_desc_t {
> >>>>>  #define IS_VMALLOC_ADDR(X) machdep->machspec->is_vmaddr(X)
> >>>>>  #define KERNELBASE      machdep->pageoffset
> >>>>>  
> >>>>> +#define VMEMMAP_VADDR   (machdep->machspec->vmemmap_base)
> >>>>> +
> >>>>>  #define PGDIR_SHIFT     (machdep->pageshift + (machdep->pageshift -3) + (machdep->pageshift - 2))
> >>>>>  #define PMD_SHIFT       (machdep->pageshift + (machdep->pageshift - 3))
> >>>>>  
> >>>>> diff --git a/memory.c b/memory.c
> >>>>> index 0df8ecc..0696763 100644
> >>>>> --- a/memory.c
> >>>>> +++ b/memory.c
> >>>>> @@ -13348,10 +13348,25 @@ is_page_ptr(ulong addr, physaddr_t *phys)
> >>>>>  	ulong nr_mem_sections;
> >>>>>  	ulong coded_mem_map, mem_map, end_mem_map;
> >>>>>  	physaddr_t section_paddr;
> >>>>> +#ifdef VMEMMAP_VADDR
> >>>>> +	ulong flags;
> >>>>> +#endif
> >>>>>  
> >>>>>  	if (IS_SPARSEMEM()) {
> >>>>>  		nr_mem_sections = NR_MEM_SECTIONS();
> >>>>> +#ifdef VMEMMAP_VADDR
> >>>>> +		nr = nr_mem_sections;
> >>>>> +		if (machdep->flags & VMEMMAP)
> >>>>> +			nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
> >>>>> +		else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
> >>>>> +			sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET))
> >>>>> +			nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT())
> >>>>> +				& ((1UL << SECTIONS_SHIFT()) - 1));
> >>>>> +
> >>>>> +		if (nr < nr_mem_sections) {
> >>>>> +#else
> >>>>>  	        for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>>> +#endif
> >>>>>  	                if ((sec_addr = valid_section_nr(nr))) {
> >>>>>  	                        coded_mem_map = section_mem_map_addr(sec_addr);
> >>>>>  	                        mem_map = sparse_decode_mem_map(coded_mem_map, nr);
> >>>>> --
> >>>>> 1.8.3.1
> >>>>>