[Crash-utility] [PATCH] Speed up "kmem -[sS]" by optimizing is_page_ptr()

Dave Anderson anderson at redhat.com
Fri Feb 23 17:20:08 UTC 2018





 ----- Original Message -----
> Hi Dave,
> 
> On 2/21/2018 4:14 PM, Dave Anderson wrote:
> > 
> > 
> > ----- Original Message -----
> >> Hi Dave,
> >>
> >> Thank you so much for your review!
> >>
> >> On 2/21/2018 11:41 AM, Dave Anderson wrote:
> >>>
> >>> Hi Kasuhito,
> >>>
> >>> Just as a follow-up review of this part of your original patch:
> >>>
> >>>   +#ifdef VMEMMAP_VADDR
> >>>   +               nr = nr_mem_sections;
> >>>   +               if (machdep->flags & VMEMMAP)
> >>>   +                       nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) /
> >>>   SIZE(page));
> >>>   +               else if (readmem(addr + OFFSET(page_flags), KVADDR,
> >>>   &flags,
> >>>   +                       sizeof(ulong), "page.flags",
> >>>   RETURN_ON_ERROR|QUIET))
> >>>   +                       nr = (flags >> (SIZE(page_flags)*8 -
> >>>   SECTIONS_SHIFT())
> >>>   +                               & ((1UL << SECTIONS_SHIFT()) - 1));
> >>>   +
> >>>   +               if (nr < nr_mem_sections) {
> >>>   +#else
> >>>                   for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>>   +#endif
> >>>   
> >>> One of my original concerns was associated with the
> >>> backwards-compatiblity
> >>> of the non-VMEMMAP page->flags usage, primarily because it has changed
> >>> over the years.  Perhaps the SECTIONS_SHIFT part has remained the same,
> >>> but depending upon its future stability in this function still worries
> >>> me.
> >>
> >> Yes, I understand the concern.
> >>
> >> The SECTIONS_SHIFT part is the same as the function page_to_section() in
> >> include/linux/mm.h.  I thought that if the values related to
> >> SECTIONS_SHIFT
> >> in kernel change, the crash utility will have to follow it regardless of
> >> this patch, because they are used commonly in crash.  But possible changes
> >> could not be limited to such values.
> > 
> > It's true that crash should follow the upstream values, but in the past
> > there have been periods of times when the MAX_PHYSMEM_BITS and
> > SECTION_SIZE_BITS
> > for different architectures changed upstream, but were not immediately
> > updated in
> > the crash utility.  And that was because crash still worked OK because most
> > systems did not have large enough memory for the changes to cause things to
> > fail.
> 
> Thank you, I understood it more precisely.
> 
> [...]
> > So this goes to the question as to whether is_page_ptr() should return TRUE
> > if the incoming page address is accessible(), but it references a physical
> > address
> > that does not exist.  With the current code, it verifies that it's a page
> > address,
> > and that the page address points to an actual physical memory page.  I
> > suggested
> > using accessible() on the page structure address, but that would not
> > necessarily
> > be accurate because theoretically a vmemmap'd/instantiated page of page
> > structures
> > could contain page structure addresses that do not reference physical
> > memory.
> > (e.g., if a hole started at a page address that's in the  middle of a
> > vmemmap'd
> > page of page structures)
> 
> As to the last example (IIUC), I had confirmed that accessible() returned
> FALSE if a page address was in a hole like below on x86_64/RHEL7, so I was
> writing the prototype patch.
> 
>   crash> kmem -n
>   ...
>   NR      SECTION        CODED_MEM_MAP        MEM_MAP       PFN
>   ...
>   23  ffff88043ffd82e0  ffffea0000000000  ffffea0002e00000  753664  ## There
>   is a
>   32  ffff88043ffd8400  ffffea0000000000  ffffea0004000000  1048576 ## hole
>   here.
>   ...
>   crash> kmem ffffea0002ffffc0  ## The last page struct of NR=23
>   DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1
>   DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1  ## in is_page_ptr()
>   DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1
>   DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1
>         PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
>   ffffea0002ffffc0  bffff000                0        0  1 1fffff00000000
>   crash> kmem ffffea0003000000  ## A page address in a hole.
>   kmem: WARNING: cannot make virtual-to-physical translation:
>   ffffea0003000000
>   DEBUG: addr=0xffffea0003000000 nr=24 accessible=0
>   DEBUG: addr=0xffffea0003000000 nr=24 accessible=0  ## returned 0
>   DEBUG: addr=0xffffea0003000000 nr=24 accessible=0
>   ffffea0003000000: kernel virtual address not found in mem map
> 
> I'm not sure whether there is the case that a page address does not
> reference physical memory except for the above.  But originally,
> since accessible() is readmem(), which actually reads a dump file,
> it could return FALSE also due to some errors quietly, and this could
> leads to a wrong judgement.
> 
> And I had thought that if accessible() was valid for the page validity
> test, there was no need to calculate its section.  So could it remove
> the accessible() and continue to utilize the valid_section_nr() section
> for the test like this?:
> ---
> diff --git a/defs.h b/defs.h
> index aa17792..ab98cb7 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -3335,6 +3335,9 @@ struct arm64_stackframe {
>  #define VTOP(X)               x86_64_VTOP((ulong)(X))
>  #define IS_VMALLOC_ADDR(X)    x86_64_IS_VMALLOC_ADDR((ulong)(X))
>  
> +#define IS_VMEMMAP_PAGE_ADDR(X)  x86_64_IS_VMEMMAP_PAGE_ADDR((ulong)(X))
> +#define VMEMMAP_PAGE_TO_PFN(X)   (((ulong)(X) - VMEMMAP_VADDR) / SIZE(page))
> +
>  /*
>   * the default page table level for x86_64:
>   *    4 level page tables
> @@ -5646,6 +5649,7 @@ void x86_64_dump_machdep_table(ulong);
>  ulong x86_64_PTOV(ulong);
>  ulong x86_64_VTOP(ulong);
>  int x86_64_IS_VMALLOC_ADDR(ulong);
> +int x86_64_IS_VMEMMAP_PAGE_ADDR(ulong);
>  void x86_64_display_idt_table(void);
>  #define display_idt_table() x86_64_display_idt_table()
>  long x86_64_exception_frame(ulong, ulong, char *, struct bt_info *, FILE *);
> diff --git a/memory.c b/memory.c
> index 0df8ecc..928c3c2 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -13350,6 +13350,30 @@ is_page_ptr(ulong addr, physaddr_t *phys)
>  	physaddr_t section_paddr;
>  
>  	if (IS_SPARSEMEM()) {
> +#ifdef IS_VMEMMAP_PAGE_ADDR
> +		if (machdep->flags & VMEMMAP) {
> +			if (IS_VMEMMAP_PAGE_ADDR(addr)) {
> +				nr = pfn_to_section_nr(VMEMMAP_PAGE_TO_PFN(addr));
> +				if ((sec_addr = valid_section_nr(nr))) {
> +					coded_mem_map = section_mem_map_addr(sec_addr);
> +					mem_map = sparse_decode_mem_map(coded_mem_map, nr);
> +					end_mem_map = mem_map + (PAGES_PER_SECTION() * SIZE(page));
> +
> +					if ((addr >= mem_map) && (addr < end_mem_map)) {
> +						if ((addr - mem_map) % SIZE(page))
> +							return FALSE;
> +						if (phys) {
> +							section_paddr = PTOB(section_nr_to_pfn(nr));
> +							pgnum = (addr - mem_map) / SIZE(page);
> +							*phys = section_paddr + ((physaddr_t)pgnum * PAGESIZE());
> +						}
> +						return TRUE;
> +					}
> +				}
> +			}
> +			return FALSE;
> +		}
> +#endif
>  		nr_mem_sections = NR_MEM_SECTIONS();
>  	        for (nr = 0; nr < nr_mem_sections ; nr++) {
>  	                if ((sec_addr = valid_section_nr(nr))) {
> diff --git a/x86_64.c b/x86_64.c
> index 7449571..7240c5d 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -1570,6 +1570,14 @@ x86_64_IS_VMALLOC_ADDR(ulong vaddr)
>  		(vaddr >= VSYSCALL_START && vaddr < VSYSCALL_END));
>  }
>  
> +int
> +x86_64_IS_VMEMMAP_PAGE_ADDR(ulong vaddr)
> +{
> +	return ((machdep->flags & VMEMMAP) &&
> +		(vaddr >= VMEMMAP_VADDR && vaddr <= VMEMMAP_END) &&
> +		!((vaddr - VMEMMAP_VADDR) % SIZE(page)));
> +}
> +
>  static int
>  x86_64_is_module_addr(ulong vaddr)
>  {
> ---
> 
> > So if we don't want to change the functionality of is_page_ptr(), then maybe
> > the binary search would be a suitable compromise for both accuracy and speed
> > on extremely large systems?
> 
> I have not considered it enough yet, but if all ranges of mem_sections are
> ascending for section numbers and vmemmap holes like above are to be managed
> well, it might be good.  (I'm guessing that the binary search might need an
> auxiliary array or something due to the vmemmap holes..)
> 
> Thanks,
> Kazuhito Hagio


This "#ifdef IS_VMEMMAP_PAGE_ADDR" patch is getting really ugly.  I've been
playing around with this, and this is my latest counter-proposal.

First, the mem_section numbers are ascending.  They may not necessarily start
with 0, and there may be holes, but they are ascending.  That being the case,
there is no need for is_page_ptr() to walk through NR_MEM_SECTIONS() worth
of entries, because there will be an ending number that's typically much
smaller.  Even on a 256GB dumpfile I have on hand, which has a NR_MEM_SECTIONS()
value of 524288, the largest valid section number is 2055. (What is the smallest
and largest number that you see on whatever large-memory system that you are 
testing with?)

In any case, let's store the largest section number during initialization in 
the vm_table, and use it as a delimeter in is_page_ptr().

diff --git a/defs.h b/defs.h
index aa17792..8768fd5 100644
--- a/defs.h
+++ b/defs.h
@@ -2369,6 +2369,7 @@ struct vm_table {                /* kernel VM-related data */
 		ulong mask;
 		char *name;
 	} *pageflags_data;
+	ulong max_mem_section_nr;
 };
 
 #define NODES                       (0x1)
diff --git a/memory.c b/memory.c
index 0df8ecc..6770937 100644
--- a/memory.c
+++ b/memory.c
@@ -255,7 +255,7 @@ static void PG_reserved_flag_init(void);
 static void PG_slab_flag_init(void);
 static ulong nr_blockdev_pages(void);
 void sparse_mem_init(void);
-void dump_mem_sections(void);
+void dump_mem_sections(int);
 void list_mem_sections(void);
 ulong sparse_decode_mem_map(ulong, ulong);
 char *read_mem_section(ulong);
@@ -13350,7 +13350,7 @@ is_page_ptr(ulong addr, physaddr_t *phys)
 	physaddr_t section_paddr;
 
 	if (IS_SPARSEMEM()) {
-		nr_mem_sections = NR_MEM_SECTIONS();
+		nr_mem_sections = vt->max_mem_section_nr+1;
 	        for (nr = 0; nr < nr_mem_sections ; nr++) {
 	                if ((sec_addr = valid_section_nr(nr))) {
 	                        coded_mem_map = section_mem_map_addr(sec_addr);
@@ -13668,6 +13668,7 @@ dump_vm_table(int verbose)
 	fprintf(fp, "   swap_info_struct: %lx\n", (ulong)vt->swap_info_struct);
 	fprintf(fp, "            mem_sec: %lx\n", (ulong)vt->mem_sec);
 	fprintf(fp, "        mem_section: %lx\n", (ulong)vt->mem_section);
+	fprintf(fp, " max_mem_section_nr: %ld\n", vt->max_mem_section_nr);
 	fprintf(fp, "       ZONE_HIGHMEM: %d\n", vt->ZONE_HIGHMEM);
 	fprintf(fp, "node_online_map_len: %d\n", vt->node_online_map_len);
 	if (vt->node_online_map_len) {
@@ -16295,8 +16296,8 @@ dump_memory_nodes(int initialize)
 		vt->numnodes = n;
 	}
 
-	if (!initialize && IS_SPARSEMEM())
-		dump_mem_sections();
+	if (IS_SPARSEMEM())
+		dump_mem_sections(initialize);
 }
 
 /*
@@ -17128,9 +17129,9 @@ pfn_to_map(ulong pfn)
 }
 
 void 
-dump_mem_sections(void)
+dump_mem_sections(int initialize)
 {
-	ulong nr,addr;
+	ulong nr, max, addr;
 	ulong nr_mem_sections;
 	ulong coded_mem_map, mem_map, pfn;
 	char buf1[BUFSIZE];
@@ -17140,6 +17141,15 @@ dump_mem_sections(void)
 
 	nr_mem_sections = NR_MEM_SECTIONS();
 
+	if (initialize) {
+		for (nr = max = 0; nr < nr_mem_sections ; nr++) {
+			if (valid_section_nr(nr))
+				max = nr;
+		}
+		vt->max_mem_section_nr = max;
+		return;
+	}
+
 	fprintf(fp, "\n");
 	pad_line(fp, BITS32() ? 59 : 67, '-');
         fprintf(fp, "\n\nNR  %s  %s  %s  PFN\n",


Now, with respect to the architecture-specific, VMEMMAP-only, part
that is of most interest to you, let's do it with an architecture
specific callback.  You can post it for x86_64, and the other architecture
maintainers can write their own version.  For example, add a new
callback function to the machdep_table structure, i.e., like this:

  struct machdep_table {
          ulong flags;
          ulong kvbase;
  ...
          void (*get_irq_affinity)(int);
          void (*show_interrupts)(int, ulong *);
+         int is_page_ptr(ulong, physaddr_t *);
  };
  
Write the x86_64_is_page_ptr() function that works for VMEMMAP
kernels, and returns FALSE otherwise.  And add the call to the top 
of is_page_ptr() like this:

  int
  is_page_ptr(ulong addr, physaddr_t *phys)
  {
          int n;
          ulong ppstart, ppend;
          struct node_table *nt;
          ulong pgnum, node_size;
          ulong nr, sec_addr;
          ulong nr_mem_sections;
          ulong coded_mem_map, mem_map, end_mem_map;
          physaddr_t section_paddr;

+	  if (machdep->is_page_ptr(addr, phys))
+		  return TRUE;
  
          if (IS_SPARSEMEM()) {
  ...
  
To avoid having to check whether the machdep->is_page_ptr function
exists, write a generic_is_page_ptr() function that just returns
FALSE, and initialize machdep->is_page_ptr to generic_is_page_ptr in
the setup_environment() function.  Later on, individual architectures
can overwrite it when machdep_init(SETUP_ENV) is called.

How's that sound?

Dave




More information about the Crash-utility mailing list