[Crash-utility] [PATCH] Speed up "kmem -[sS]" by optimizing is_page_ptr()
Dave Anderson
anderson at redhat.com
Fri Feb 23 17:20:08 UTC 2018
----- Original Message -----
> Hi Dave,
>
> On 2/21/2018 4:14 PM, Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> >> Hi Dave,
> >>
> >> Thank you so much for your review!
> >>
> >> On 2/21/2018 11:41 AM, Dave Anderson wrote:
> >>>
> >>> Hi Kazuhito,
> >>>
> >>> Just as a follow-up review of this part of your original patch:
> >>>
> >>> +#ifdef VMEMMAP_VADDR
> >>> +	nr = nr_mem_sections;
> >>> +	if (machdep->flags & VMEMMAP)
> >>> +		nr = pfn_to_section_nr((addr - VMEMMAP_VADDR) / SIZE(page));
> >>> +	else if (readmem(addr + OFFSET(page_flags), KVADDR, &flags,
> >>> +	    sizeof(ulong), "page.flags", RETURN_ON_ERROR|QUIET))
> >>> +		nr = (flags >> (SIZE(page_flags)*8 - SECTIONS_SHIFT())
> >>> +		    & ((1UL << SECTIONS_SHIFT()) - 1));
> >>> +
> >>> +	if (nr < nr_mem_sections) {
> >>> +#else
> >>> 	for (nr = 0; nr < nr_mem_sections ; nr++) {
> >>> +#endif
> >>>
> >>> One of my original concerns was associated with the
> >>> backwards-compatibility of the non-VMEMMAP page->flags usage,
> >>> primarily because it has changed over the years. Perhaps the
> >>> SECTIONS_SHIFT part has remained the same, but its future stability
> >>> in this function still worries me.
> >>
> >> Yes, I understand the concern.
> >>
> >> The SECTIONS_SHIFT part is the same as the page_to_section() function in
> >> include/linux/mm.h. I thought that if the kernel values related to
> >> SECTIONS_SHIFT change, the crash utility will have to follow them
> >> regardless of this patch, because they are used throughout crash. But
> >> possible changes may not be limited to such values.
> >
> > It's true that crash should follow the upstream values, but in the past
> > there have been periods of time when the MAX_PHYSMEM_BITS and
> > SECTION_SIZE_BITS values for different architectures changed upstream
> > but were not immediately updated in the crash utility. That was because
> > crash still worked OK: most systems did not have enough memory for the
> > changes to cause things to fail.
>
> Thank you, I understood it more precisely.
>
> [...]
> > So this goes to the question as to whether is_page_ptr() should return TRUE
> > if the incoming page address is accessible(), but it references a physical
> > address that does not exist. With the current code, it verifies that it's
> > a page address, and that the page address points to an actual physical
> > memory page. I suggested using accessible() on the page structure address,
> > but that would not necessarily be accurate, because theoretically a
> > vmemmap'd/instantiated page of page structures could contain page
> > structure addresses that do not reference physical memory. (e.g., if a
> > hole started at a page address that's in the middle of a vmemmap'd page
> > of page structures)
>
> As to the last example (IIUC), I had confirmed that accessible() returned
> FALSE when a page address was in a hole like the one below on x86_64/RHEL7,
> which is why I wrote the prototype patch that way.
>
> crash> kmem -n
> ...
>   NR      SECTION        CODED_MEM_MAP        MEM_MAP       PFN
> ...
>   23 ffff88043ffd82e0 ffffea0000000000 ffffea0002e00000  753664 ## There is a
>   32 ffff88043ffd8400 ffffea0000000000 ffffea0004000000 1048576 ## hole here.
> ...
> crash> kmem ffffea0002ffffc0 ## The last page struct of NR=23
> DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1
> DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1 ## in is_page_ptr()
> DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1
> DEBUG: addr=0xffffea0002ffffc0 nr=23 accessible=1
> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> ffffea0002ffffc0 bffff000 0 0 1 1fffff00000000
> crash> kmem ffffea0003000000 ## A page address in a hole.
> kmem: WARNING: cannot make virtual-to-physical translation: ffffea0003000000
> DEBUG: addr=0xffffea0003000000 nr=24 accessible=0
> DEBUG: addr=0xffffea0003000000 nr=24 accessible=0 ## returned 0
> DEBUG: addr=0xffffea0003000000 nr=24 accessible=0
> ffffea0003000000: kernel virtual address not found in mem map
>
> I'm not sure whether there are cases other than the above where a page
> address does not reference physical memory. But fundamentally, since
> accessible() is readmem(), which actually reads a dump file, it could
> also quietly return FALSE due to some read error, and this could lead
> to a wrong judgement.
>
> And I had thought that if accessible() were valid for the page validity
> test, there would be no need to calculate its section. So could we drop
> the accessible() call and continue to use the valid_section_nr() section
> check for the test, like this?:
> ---
> diff --git a/defs.h b/defs.h
> index aa17792..ab98cb7 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -3335,6 +3335,9 @@ struct arm64_stackframe {
> #define VTOP(X) x86_64_VTOP((ulong)(X))
> #define IS_VMALLOC_ADDR(X) x86_64_IS_VMALLOC_ADDR((ulong)(X))
>
> +#define IS_VMEMMAP_PAGE_ADDR(X) x86_64_IS_VMEMMAP_PAGE_ADDR((ulong)(X))
> +#define VMEMMAP_PAGE_TO_PFN(X) (((ulong)(X) - VMEMMAP_VADDR) / SIZE(page))
> +
> /*
> * the default page table level for x86_64:
> * 4 level page tables
> @@ -5646,6 +5649,7 @@ void x86_64_dump_machdep_table(ulong);
> ulong x86_64_PTOV(ulong);
> ulong x86_64_VTOP(ulong);
> int x86_64_IS_VMALLOC_ADDR(ulong);
> +int x86_64_IS_VMEMMAP_PAGE_ADDR(ulong);
> void x86_64_display_idt_table(void);
> #define display_idt_table() x86_64_display_idt_table()
> long x86_64_exception_frame(ulong, ulong, char *, struct bt_info *, FILE *);
> diff --git a/memory.c b/memory.c
> index 0df8ecc..928c3c2 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -13350,6 +13350,30 @@ is_page_ptr(ulong addr, physaddr_t *phys)
> physaddr_t section_paddr;
>
> if (IS_SPARSEMEM()) {
> +#ifdef IS_VMEMMAP_PAGE_ADDR
> + if (machdep->flags & VMEMMAP) {
> + if (IS_VMEMMAP_PAGE_ADDR(addr)) {
> + nr = pfn_to_section_nr(VMEMMAP_PAGE_TO_PFN(addr));
> + if ((sec_addr = valid_section_nr(nr))) {
> + coded_mem_map = section_mem_map_addr(sec_addr);
> + mem_map = sparse_decode_mem_map(coded_mem_map, nr);
> + end_mem_map = mem_map + (PAGES_PER_SECTION() * SIZE(page));
> +
> + if ((addr >= mem_map) && (addr < end_mem_map)) {
> + if ((addr - mem_map) % SIZE(page))
> + return FALSE;
> + if (phys) {
> + section_paddr = PTOB(section_nr_to_pfn(nr));
> + pgnum = (addr - mem_map) / SIZE(page);
> + *phys = section_paddr + ((physaddr_t)pgnum * PAGESIZE());
> + }
> + return TRUE;
> + }
> + }
> + }
> + return FALSE;
> + }
> +#endif
> nr_mem_sections = NR_MEM_SECTIONS();
> for (nr = 0; nr < nr_mem_sections ; nr++) {
> if ((sec_addr = valid_section_nr(nr))) {
> diff --git a/x86_64.c b/x86_64.c
> index 7449571..7240c5d 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -1570,6 +1570,14 @@ x86_64_IS_VMALLOC_ADDR(ulong vaddr)
> (vaddr >= VSYSCALL_START && vaddr < VSYSCALL_END));
> }
>
> +int
> +x86_64_IS_VMEMMAP_PAGE_ADDR(ulong vaddr)
> +{
> + return ((machdep->flags & VMEMMAP) &&
> + (vaddr >= VMEMMAP_VADDR && vaddr <= VMEMMAP_END) &&
> + !((vaddr - VMEMMAP_VADDR) % SIZE(page)));
> +}
> +
> static int
> x86_64_is_module_addr(ulong vaddr)
> {
> ---
>
> > So if we don't want to change the functionality of is_page_ptr(), then maybe
> > the binary search would be a suitable compromise for both accuracy and speed
> > on extremely large systems?
>
> I have not considered it enough yet, but if all mem_section ranges are
> ascending by section number and vmemmap holes like the above can be
> handled well, it might be good. (I'm guessing that the binary search
> might need an auxiliary array or something because of the vmemmap
> holes...)
>
> Thanks,
> Kazuhito Hagio
This "#ifdef IS_VMEMMAP_PAGE_ADDR" patch is getting really ugly. I've been
playing around with this, and this is my latest counter-proposal.
First, the mem_section numbers are ascending. They may not necessarily start
with 0, and there may be holes, but they are ascending. That being the case,
there is no need for is_page_ptr() to walk through NR_MEM_SECTIONS() worth
of entries, because there will be an ending number that's typically much
smaller. Even on a 256GB dumpfile I have on hand, which has a NR_MEM_SECTIONS()
value of 524288, the largest valid section number is 2055. (What is the smallest
and largest number that you see on whatever large-memory system that you are
testing with?)
In any case, let's store the largest section number in the vm_table during
initialization, and use it as a delimiter in is_page_ptr().
diff --git a/defs.h b/defs.h
index aa17792..8768fd5 100644
--- a/defs.h
+++ b/defs.h
@@ -2369,6 +2369,7 @@ struct vm_table { /* kernel VM-related data */
ulong mask;
char *name;
} *pageflags_data;
+ ulong max_mem_section_nr;
};
#define NODES (0x1)
diff --git a/memory.c b/memory.c
index 0df8ecc..6770937 100644
--- a/memory.c
+++ b/memory.c
@@ -255,7 +255,7 @@ static void PG_reserved_flag_init(void);
static void PG_slab_flag_init(void);
static ulong nr_blockdev_pages(void);
void sparse_mem_init(void);
-void dump_mem_sections(void);
+void dump_mem_sections(int);
void list_mem_sections(void);
ulong sparse_decode_mem_map(ulong, ulong);
char *read_mem_section(ulong);
@@ -13350,7 +13350,7 @@ is_page_ptr(ulong addr, physaddr_t *phys)
physaddr_t section_paddr;
if (IS_SPARSEMEM()) {
- nr_mem_sections = NR_MEM_SECTIONS();
+ nr_mem_sections = vt->max_mem_section_nr+1;
for (nr = 0; nr < nr_mem_sections ; nr++) {
if ((sec_addr = valid_section_nr(nr))) {
coded_mem_map = section_mem_map_addr(sec_addr);
@@ -13668,6 +13668,7 @@ dump_vm_table(int verbose)
fprintf(fp, " swap_info_struct: %lx\n", (ulong)vt->swap_info_struct);
fprintf(fp, " mem_sec: %lx\n", (ulong)vt->mem_sec);
fprintf(fp, " mem_section: %lx\n", (ulong)vt->mem_section);
+ fprintf(fp, " max_mem_section_nr: %ld\n", vt->max_mem_section_nr);
fprintf(fp, " ZONE_HIGHMEM: %d\n", vt->ZONE_HIGHMEM);
fprintf(fp, "node_online_map_len: %d\n", vt->node_online_map_len);
if (vt->node_online_map_len) {
@@ -16295,8 +16296,8 @@ dump_memory_nodes(int initialize)
vt->numnodes = n;
}
- if (!initialize && IS_SPARSEMEM())
- dump_mem_sections();
+ if (IS_SPARSEMEM())
+ dump_mem_sections(initialize);
}
/*
@@ -17128,9 +17129,9 @@ pfn_to_map(ulong pfn)
}
void
-dump_mem_sections(void)
+dump_mem_sections(int initialize)
{
- ulong nr,addr;
+ ulong nr, max, addr;
ulong nr_mem_sections;
ulong coded_mem_map, mem_map, pfn;
char buf1[BUFSIZE];
@@ -17140,6 +17141,15 @@ dump_mem_sections(void)
nr_mem_sections = NR_MEM_SECTIONS();
+ if (initialize) {
+ for (nr = max = 0; nr < nr_mem_sections ; nr++) {
+ if (valid_section_nr(nr))
+ max = nr;
+ }
+ vt->max_mem_section_nr = max;
+ return;
+ }
+
fprintf(fp, "\n");
pad_line(fp, BITS32() ? 59 : 67, '-');
fprintf(fp, "\n\nNR %s %s %s PFN\n",
Now, with respect to the architecture-specific, VMEMMAP-only, part
that is of most interest to you, let's do it with an architecture
specific callback. You can post it for x86_64, and the other architecture
maintainers can write their own version. For example, add a new
callback function to the machdep_table structure, i.e., like this:
struct machdep_table {
ulong flags;
ulong kvbase;
...
void (*get_irq_affinity)(int);
void (*show_interrupts)(int, ulong *);
+	int (*is_page_ptr)(ulong, physaddr_t *);
};
Write the x86_64_is_page_ptr() function that works for VMEMMAP
kernels, and returns FALSE otherwise. And add the call to the top
of is_page_ptr() like this:
int
is_page_ptr(ulong addr, physaddr_t *phys)
{
int n;
ulong ppstart, ppend;
struct node_table *nt;
ulong pgnum, node_size;
ulong nr, sec_addr;
ulong nr_mem_sections;
ulong coded_mem_map, mem_map, end_mem_map;
physaddr_t section_paddr;
+ if (machdep->is_page_ptr(addr, phys))
+ return TRUE;
if (IS_SPARSEMEM()) {
...
To avoid having to check whether the machdep->is_page_ptr function
exists, write a generic_is_page_ptr() function that just returns
FALSE, and initialize machdep->is_page_ptr to generic_is_page_ptr in
the setup_environment() function. Later on, individual architectures
can overwrite it when machdep_init(SETUP_ENV) is called.
How's that sound?
Dave