[Crash-utility] [PATCH v2] arm64: fix kernel memory map handling for kaslr-enabled kernel

AKASHI Takahiro takahiro.akashi at linaro.org
Wed May 25 08:35:48 UTC 2016


On Tue, May 24, 2016 at 01:59:06PM -0400, Dave Anderson wrote:
> 
> ----- Original Message -----
> > Yet some issues, but ...
> >
> 
> Hi Takahiro,
> 
> Here are my general comments on my testing of the v2 patch, followed 
> by a few comments in the patch itself. 

Thank you for your quick review/testing, again.

> First, the combination of the new memory map layout and KASLR is somewhat
> confusing.  I am testing your patch on a 4.6.0-0.rc7.git2.1.fc25 kernel
> that has this configuration:

Right. I use "kaslr kernel" or "kaslr-enabled kernel" with two
meanings (almost interchangeably):
   - a kernel with the new memory map, i.e. v4.6 or later
   - a kernel capable of KASLR (configured with CONFIG_RANDOMIZE_BASE)

When we talked offline, you mentioned the possibility of backporting
KASLR to older kernels, so I hesitated to use "v4.6 or later".
But I now think we'd better use "v4.6 or later", for clarity.

>   config-arm64:# CONFIG_RANDOMIZE_BASE is not set
> 
> So KASLR doesn't really enter into the picture.  But when bringing
> up the crash session, it shows the "kaslr kernel" WARNING:
> 
>   # ./crash
>   
>   crash 7.1.5++
>   Copyright (C) 2002-2016  Red Hat, Inc.
>   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>   Copyright (C) 1999-2006  Hewlett-Packard Co
>   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>   Copyright (C) 2005, 2011  NEC Corporation
>   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>   This program is free software, covered by the GNU General Public License,
>   and you are welcome to change it and/or distribute copies of it under
>   certain conditions.  Enter "help copying" to see the conditions.
>   This program has absolutely no warranty.  Enter "help warranty" for details.
>    
>   WARNING: kimage_voffset not identified for kaslr kernel
>   GNU gdb (GDB) 7.6
>   Copyright (C) 2013 Free Software Foundation, Inc.
>   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>   This is free software: you are free to change and redistribute it.
>   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>   and "show warranty" for details.
>   This GDB was configured as "aarch64-unknown-linux-gnu"...
>   
>         KERNEL: /usr/lib/debug/lib/modules/4.6.0-0.rc7.git2.1.fc25.aarch64/vmlinux
>       DUMPFILE: /dev/crash
>           CPUS: 8
>           DATE: Tue May 24 10:08:08 2016
>         UPTIME: 11 days, 18:32:41
>   LOAD AVERAGE: 0.17, 0.09, 0.12
>          TASKS: 197
>       NODENAME: apm-mustang-ev3-36.khw.lab.eng.bos.redhat.com
>        RELEASE: 4.6.0-0.rc7.git2.1.fc25.aarch64
>        VERSION: #1 SMP Thu May 12 13:28:43 UTC 2016
>        MACHINE: aarch64  (unknown Mhz)
>         MEMORY: 16 GB
>            PID: 7556
>        COMMAND: "crash"
>           TASK: fffffe00beb45400  [THREAD_INFO: fffffe00beb98000]
>            CPU: 7
>          STATE: TASK_RUNNING (ACTIVE)
>   
>   crash>
>   
> Why show that WARNING in this case?  I understand that it's not
> stashed during early initialization:

If we don't know kimage_voffset, we can't analyze the contents of
memory (live or kdump).
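
Just to illustrate, here is a worked example of the v4.6 memory.h rule
(quoted in my patch description below), using the numbers from this very
session ("help -m" plus the kimage_voffset you read above).  This is only
arithmetic, not crash code:

    #include <stdio.h>

    /* Sample values from this session: "help -m" and "px kimage_voffset". */
    #define PAGE_OFFSET    0xfffffe0000000000UL
    #define PHYS_OFFSET    0x4000000000UL
    #define KIMAGE_VOFFSET 0xfffffbc008000000UL

    /* v4.6 rule: linear-map addresses have bit (VA_BITS - 1) set and use
     * PHYS_OFFSET; kernel-image addresses use kimage_voffset instead. */
    static unsigned long vtop(unsigned long vaddr)
    {
            if (vaddr & (1UL << 41))        /* BIT(VA_BITS - 1), VA_BITS = 42 */
                    return (vaddr & ~PAGE_OFFSET) + PHYS_OFFSET;
            return vaddr - KIMAGE_VOFFSET;
    }

    int main(void)
    {
            printf("%lx\n", vtop(0xfffffe0000000000UL)); /* page_offset -> 4000000000 */
            printf("%lx\n", vtop(0xfffffc0008080000UL)); /* kimage_text -> 4000080000 */
            return 0;
    }

Without kimage_voffset the second branch has nothing to subtract, which is
why crash cannot translate kernel-image addresses in that case.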

>   crash> help -m
>                  flags: 104000c5 (KSYMS_START|VM_L2_64K|VMEMMAP|IRQ_STACKS|MACHDEP_BT_TEXT|NEW_VMEMMAP)
>   ... [ cut ] ...
>        memory map layout: new      <--- BTW, this is redundant/not-a-member, and we've got NEW_VMEMMAP
>                  VA_BITS: 42
>            userspace_top: 0000040000000000
>              page_offset: fffffe0000000000
>       vmalloc_start_addr: fffffc0008000000
>              vmalloc_end: fffffdff5ffeffff
>            modules_vaddr: fffffc0000000000
>              modules_end: fffffc0007ffffff
>            vmemmap_vaddr: fffffdff80000000
>              vmemmap_end: fffffdffffffffff
>              kimage_text: fffffc0008080000
>               kimage_end: fffffc0009070000
>           kimage_voffset: 0000000000000000    <-- available if kernel is not randomized
>              phys_offset: 4000000000
>   ...
>   
> But it can be read:
>   
>   crash> px kimage_voffset
>   kimage_voffset = $1 = 0xfffffbc008000000
>   crash>
>
> So because it wasn't determined during session initialization,
> it falls into the "no randomness" section of arm64_VTOP():

Exactly.

>   ulong
>   arm64_VTOP(ulong addr)
>   {
>           if (!(machdep->flags & NEW_VMEMMAP) ||
>               (addr >= machdep->machspec->page_offset)) {
>                   return machdep->machspec->phys_offset
>                           + (addr - machdep->machspec->page_offset);
>           } else {
>                   if (machdep->machspec->kimage_voffset)
>                           return addr - machdep->machspec->kimage_voffset;
>                   else /* no randomness */
>                           return machdep->machspec->phys_offset
>                                   + (addr - machdep->machspec->vmalloc_start_addr);
>           }
>   }
>   
> That works, but if "kimage_voffset" can be read normally later on, perhaps it
> should update "machdep->machspec->kimage_voffset" at that time?

It is possible, but the check in arm64_VTOP() would still be necessary.
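
If we follow your suggestion, something like the following could be hooked
in once symbol reads work; this is just a rough, untested sketch against the
machspec fields added in this patch:

    /* Rough sketch: refresh ms->kimage_voffset once readmem() works,
     * e.g. after session initialization.  The "no randomness" fallback
     * in arm64_VTOP() is still needed for reads done before this point. */
    static void
    arm64_refresh_kimage_voffset(void)
    {
            struct machine_specific *ms = machdep->machspec;
            struct syment *sp;
            ulong value;

            if (ms->kimage_voffset)
                    return;

            if ((sp = kernel_symbol_search("kimage_voffset")) &&
                readmem(sp->value, KVADDR, (char *)&value, sizeof(value),
                        "kimage_voffset", QUIET|RETURN_ON_ERROR))
                    ms->kimage_voffset = value;
    }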

> There are some discrepancies with respect to the calculated addresses
> and what is seen by looking at the reported memory map in the kernel log:
> 
>   [    0.000000] Virtual kernel memory layout:
>   [    0.000000]     modules : 0xfffffc0000000000 - 0xfffffc0008000000   (   128 MB)
>   [    0.000000]     vmalloc : 0xfffffc0008000000 - 0xfffffdfedfff0000   (  2043 GB)
>   [    0.000000]       .text : 0xfffffc0008080000 - 0xfffffc0008890000   (  8256 KB)
>                      .rodata : 0xfffffc0008890000 - 0xfffffc0008c10000   (  3584 KB)
>                        .init : 0xfffffc0008c10000 - 0xfffffc0008d50000   (  1280 KB)
>                        .data : 0xfffffc0008d50000 - 0xfffffc0008eaac00   (  1387 KB)
>   [    0.000000]     vmemmap : 0xfffffdfee0000000 - 0xfffffdffe0000000   (     4 GB maximum)
>                                0xfffffdfee0000000 - 0xfffffdfee1000000   (    16 MB actual)
>   [    0.000000]     fixed   : 0xfffffdfffe7d0000 - 0xfffffdfffec00000   (  4288 KB)
>   [    0.000000]     PCI I/O : 0xfffffdfffee00000 - 0xfffffdffffe00000   (    16 MB)
>   [    0.000000]     memory  : 0xfffffe0000000000 - 0xfffffe0400000000   ( 16384 MB)
> 
> Comparing it to the calculated values:
> 
>                VA_BITS: 42
>          userspace_top: 0000040000000000
>            page_offset: fffffe0000000000    OK
>     vmalloc_start_addr: fffffc0008000000    OK 
>            vmalloc_end: fffffdff5ffeffff    ?  (seems wrong)
>          modules_vaddr: fffffc0000000000    OK
>            modules_end: fffffc0007ffffff    OK (-1)
>          vmemmap_vaddr: fffffdff80000000    ?  (seems wrong)
>            vmemmap_end: fffffdffffffffff    ?  (seems wrong)
>            kimage_text: fffffc0008080000    OK
>             kimage_end: fffffc0009070000    OK
>         kimage_voffset: 0000000000000000
>            phys_offset: 4000000000
> 
> Starting with vmalloc_start_addr/vmalloc_end, if I run "kmem -v" to dump the 
> vmalloc allocations, there are some allocations that are below the 
> > 0xfffffc0008000000 start value shown in the log and by your calculation.
> I'm not clear on what that means?:
> 
>   crash> kmem -v
>      VMAP_AREA         VM_STRUCT                 ADDRESS RANGE                SIZE
>   fffffe03daefd900  fffffe03daefd880  fffffc0000c10000 - fffffc0000c30000   131072
>   fffffe00bf2c3a00  fffffe00bf2c3980  fffffc0000c90000 - fffffc0000cb0000   131072
>   fffffe00bf2c7400  fffffe00bf2c7380  fffffc0000d50000 - fffffc0000d70000   131072
>   fffffe03dc76aa00  fffffe03dc76a980  fffffc0000d90000 - fffffc0000db0000   131072
>   fffffe03dc76be00  fffffe03dc76bd80  fffffc0000dd0000 - fffffc0000ee0000  1114112
>   fffffe03d90e7900  fffffe03d90e7b80  fffffc0000f40000 - fffffc0000fb0000   458752
>   fffffe00bec12f00  fffffe00bec12b00  fffffc0000fe0000 - fffffc0001000000   131072
>   fffffe00bf3a1100  fffffe00bf3a1080  fffffc0001020000 - fffffc0001050000   196608
>   fffffe00bf3a3280  fffffe00bf3a3200  fffffc0001070000 - fffffc0001090000   131072
>   fffffe03ddf7fc00  fffffe03ddf7a880  fffffc0001230000 - fffffc0001250000   131072
>   fffffe03fd508900  fffffe03fd508880  fffffc0008000000 - fffffc0008020000   131072
>   fffffe03fd509580  fffffe03fd508d00  fffffc0008020000 - fffffc0008040000   131072
>   fffffe03fd508a00  fffffe03fd508980  fffffc0008040000 - fffffc0008070000   196608
>   ... [ cut ] ...
>   fffffe03d7235000  fffffe03d7233480  fffffc0014b30000 - fffffc0014b50000   131072
>   fffffe00b921e700  fffffe00b921df00  fffffc0014b70000 - fffffc0014b90000   131072
>   fffffe00b851eb00  fffffe00b8512900  fffffdfedfcf0000 - fffffdfedfe70000  1572864
>   fffffe03dca4ef00  fffffe03dca4ef80  fffffdfedfe70000 - fffffdfedfff0000  1572864
>   crash> 
> 
> Although the end value of fffffdfedfff0000 does match up with the value in the
> kernel log, your calculation is 2GB higher at fffffdff5ffeffff:
> 
>   crash> eval fffffdff5ffeffff - 0xfffffdfedfff0000
>   hexadecimal: 7fffffff  
>       decimal: 2147483647  
>         octal: 17777777777
>        binary: 0000000000000000000000000000000001111111111111111111111111111111
>   crash> eval 7fffffff + 1
>   hexadecimal: 80000000  (2GB)
>       decimal: 2147483648  
>         octal: 20000000000
>        binary: 0000000000000000000000000000000010000000000000000000000000000000
>   crash> 
> 
> With respect to the vmemmap range, the vmemmap address in the log file is the one
> that is displayed by "kmem -p":
> 
>   crash> kmem -p
>         PAGE               PHYSICAL      MAPPING       INDEX CNT FLAGS
>   fffffdfee0000000       4000000000 fffffe03fd441950       2f  1 1002c referenced,uptodate,lru,mappedtodisk
>   fffffdfee0000040       4000010000 fffffe03fd441950       30  1 1002c referenced,uptodate,lru,mappedtodisk
>   fffffdfee0000080       4000020000 fffffe03fd441950       31  1 1002c referenced,uptodate,lru,mappedtodisk
>   fffffdfee00000c0       4000030000 fffffe03fd441950       32  1 1002c referenced,uptodate,lru,mappedtodisk
>   ...
>   fffffdfee0ffff00       43fffc0000                0        0  1 4000000000000400 reserved
>   fffffdfee0ffff40       43fffd0000                0        0  1 4000000000000400 reserved
>   fffffdfee0ffff80       43fffe0000                0        0  1 4000000000000400 reserved
>   fffffdfee0ffffc0       43ffff0000                0        0  1 4000000000000400 reserved
>   crash> 
>   
> And "kmem" alone recognizes it:
>   
>   crash> kmem fffffdfee0000000
>         PAGE               PHYSICAL      MAPPING       INDEX CNT FLAGS
>   fffffdfee0000000       4000000000 fffffe03fd441950       2f  1 1002c referenced,uptodate,lru,mappedtodisk
>   crash> 
>   
> But your calculated vmemmap_vaddr above doesn't seem right:
>   
>   crash> kmem fffffdff80000000
>   kmem: WARNING: cannot make virtual-to-physical translation: fffffdff80000000
>   fffffdff80000000: kernel virtual address not found in mem map
>   crash>
> 
> Do your numbers match up on a dump that actually has CONFIG_RANDOMIZE_BASE?

In my environment, the numbers above match up on a dump whether
CONFIG_RANDOMIZE_BASE is enabled or not.

Are you using a mainline kernel (final v4.6, not -rcX)?
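
For reference, the numbers crash reports here follow directly from the
NEW_VMEMMAP formulas in arm64_calc_virtual_memory_ranges() further down in
this patch.  A quick stand-alone check for VA_BITS=42 / 64KB pages (the
512MB PUD_SIZE is my assumption for this configuration):

    #include <stdio.h>

    int main(void)
    {
            unsigned long va_bits = 42, pageshift = 16;   /* 64KB pages */
            unsigned long page_offset = 0xfffffe0000000000UL;
            unsigned long pud_size = 0x20000000UL;        /* assumed: 512MB */
            unsigned long sz_64k = 0x10000UL;
            unsigned long struct_page_max_shift = 6;

            /* NEW_VMEMMAP branch of arm64_calc_virtual_memory_ranges() */
            unsigned long vmemmap_size =
                    1UL << (va_bits - pageshift - 1 + struct_page_max_shift);
            unsigned long vmalloc_end =
                    page_offset - pud_size - vmemmap_size - sz_64k;
            unsigned long vmemmap_start = page_offset - vmemmap_size;

            printf("vmemmap_size : %lx\n", vmemmap_size);     /* 80000000 (2GB) */
            printf("vmalloc_end  : %lx\n", vmalloc_end - 1);  /* fffffdff5ffeffff */
            printf("vmemmap_vaddr: %lx\n", vmemmap_start);    /* fffffdff80000000 */
            return 0;
    }

The 2GB you computed with "eval" is exactly this vmemmap_size, so the open
question is whether the kernel you are running sizes its vmemmap region
with the same shift.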

> Anyway, onto the patch...
> 
> > 
> > changes in v2:
> > * Fixed build warnings
> > * Moved ARM64_NEW_VMEMMAP to machdep->flags
> > * Show additional kaslr-related parameters in arm64_dump_machdep_table()
> > * Handle a VMCOREINFO, "NUMBER(kimage_voffset)"
> > 
> > ===8<===
> > From 080a54ec232ac48ef2d8123cbedcf0f1fe27e391 Mon Sep 17 00:00:00 2001
> > From: AKASHI Takahiro <takahiro.akashi at linaro.org>
> > Date: Mon, 16 May 2016 17:31:55 +0900
> > Subject: [PATCH v2] arm64: fix kernel memory map handling for kaslr-enabled
> >  kernel
> > 
> > In kernel v4.6, Kernel ASLR (KASLR) is supported on arm64, and the start
> > address of the kernel image can be randomized if CONFIG_RANDOMIZE_BASE is
> > enabled.
> > Even worse, the kernel image is no longer mapped in the linear mapping, but
> > in the vmalloc area (i.e. below PAGE_OFFSET).
> > 
> > Now, according to the kernel's memory.h, converting a virtual address to
> > a physical address should be done like below:
> > 
> > 	phys_addr_t __x = (phys_addr_t)(x);                             \
> > 	__x & BIT(VA_BITS - 1) ? (__x & ~PAGE_OFFSET) + PHYS_OFFSET :   \
> > 				 (__x - kimage_voffset); })
> > 
> > Please note that PHYS_OFFSET is no longer equal to the start address of
> > the first usable memory block in system RAM, due to the fact mentioned
> > above.
> > 
> > This patch addresses this change and allows the crash utility to access
> > memory contents with correct addresses.
> > *However*, it is still rough-edged and we need more refinements.
> > 
> > 1-1) On a live system, crash with this patch won't work because
> >    we currently have no way to know kimage_voffset.
> > 
> > 1-2) For a core dump file, we can simply do:
> >        $ crash <vmlinux> <vmcore>
> >    as long as the file has "NUMBER(kimage_voffset)"
> >    (RELOC_AUTO|KASLR is automatically set.)
> > 
> >    I'm planning to add this enhancement in my next version of kexec/kdump
> >    patch, i.e. v17.
> > 
> > 2) My current change breaks backward compatibility.
> >    Crash won't work with v4.5 or earlier kernels, partly because I changed
> >    the calculation of VA_BITS. (I think that the existing code is wrong.)
> >    So far I haven't had enough time to look into this issue, as I'm focusing
> >    on KASLR support.
> 
> OK, but you still haven't shown how the current VA_BITS determination is
> not a correct representation of CONFIG_ARM64_VA_BITS.

At least, the current VA_BITS doesn't work with my calculations, which
were derived directly from v4.6's "asm/memory.h".
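
Concretely, what I derived from v4.6's memory.h is the following split,
shown here as a worked example for VA_BITS=42 (matching the ARM64_VA_START
and ARM64_PAGE_OFFSET macros later in this patch):

    /* Worked example only; VA_BITS = 42 as in your session. */
    #define VA_BITS      42UL
    #define VA_START     (0xffffffffffffffffUL << VA_BITS)        /* fffffc0000000000 */
    #define PAGE_OFFSET  (0xffffffffffffffffUL << (VA_BITS - 1))  /* fffffe0000000000 */

    /* Modules, vmalloc and the kernel image now live in
     * [VA_START, PAGE_OFFSET); the linear mapping starts at PAGE_OFFSET. */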

> > 3) Backtracing a 'panic'ed task fails:
> >  crash> bt
> >  PID: 999    TASK: ffffffe960081800  CPU: 0   COMMAND: "sh"
> >  bt: WARNING: corrupt prstatus? pstate=0x80000000, but no user frame found
> >  bt: WARNING: cannot determine starting stack frame for task ffffffe960081800
> > 
> >   frame->pc indicates that it is a kernel address, so obviously something
> >   is wrong. I'm digging into it.
> 
> Can you add a debug statement that displays frame->pc, frame->sp, and frame->fp?

I've already identified the cause:
pt_regs->pstate was mistakenly set from "spsr_el1" in panic().
But I'm having difficulty fixing it, as we have no direct way
to retrieve the value of the *current* PSTATE.

> 
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi at linaro.org>
> > ---
> >  arm64.c   | 217 ++++++++++++++++++++++++++++++++++++++++++++++++--------------
> >  defs.h    |  24 +++++--
> >  main.c    |   7 +-
> >  symbols.c |  10 +--
> >  4 files changed, 196 insertions(+), 62 deletions(-)
> > 
> > diff --git a/arm64.c b/arm64.c
> > index 34c8c59..6b97093 100644
> > --- a/arm64.c
> > +++ b/arm64.c
> > @@ -72,6 +72,21 @@ static int arm64_get_crash_notes(void);
> >  static void arm64_calc_VA_BITS(void);
> >  static int arm64_is_uvaddr(ulong, struct task_context *);
> >  
> > +ulong
> > +arm64_VTOP(ulong addr)
> > +{
> > +	if (!(machdep->flags & NEW_VMEMMAP) ||
> > +            (addr >= machdep->machspec->page_offset)) {
> > +		return machdep->machspec->phys_offset
> > +			+ (addr - machdep->machspec->page_offset);
> > +	} else {
> > +		if (machdep->machspec->kimage_voffset)
> > +			return addr - machdep->machspec->kimage_voffset;
> > +		else /* no randomness */
> > +			return machdep->machspec->phys_offset
> > +				+ (addr - machdep->machspec->vmalloc_start_addr);
> > +	}
> > +}
> >  
> >  /*
> >   * Do all necessary machine-specific setup here. This is called several times
> > @@ -81,6 +96,7 @@ void
> >  arm64_init(int when)
> >  {
> >  	ulong value;
> > +	char *string;
> >  	struct machine_specific *ms;
> >  
> >  #if defined(__x86_64__)
> > @@ -102,9 +118,34 @@ arm64_init(int when)
> >  		if (machdep->cmdline_args[0])
> >  			arm64_parse_cmdline_args();
> >  		machdep->flags |= MACHDEP_BT_TEXT;
> > +
> > +		ms = machdep->machspec;
> > +		if (!ms->kimage_voffset &&
> > +		    (string = pc->read_vmcoreinfo("NUMBER(kimage_voffset)"))) {
> > +			ms->kimage_voffset = htol(string, QUIET, NULL);
> > +			free(string);
> > +		}
> 
> It doesn't make sense to call pc->read_vmcoreinfo() if !DUMPFILE().

Do you think so?
By default, pc->read_vmcoreinfo is set to no_vmcoreinfo.
So it doesn't hurt anything.
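
If you'd prefer to make the intent explicit anyway, a guarded variant would
be trivial.  A sketch only, assuming the default no_vmcoreinfo callback just
returns NULL on a live system, so behavior there is unchanged:

    ms = machdep->machspec;
    if (DUMPFILE() && !ms->kimage_voffset &&
        (string = pc->read_vmcoreinfo("NUMBER(kimage_voffset)"))) {
            ms->kimage_voffset = htol(string, QUIET, NULL);
            free(string);
    }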

> > +
> > +		if (ms->kimage_voffset) {
> > +			machdep->flags |= NEW_VMEMMAP;
> > +
> > +			/* if not specified in command line */
> > +			if (!kt->relocate && !(kt->flags2 & (RELOC_AUTO|KASLR)))
> > +				kt->flags2 |= (RELOC_AUTO|KASLR);
> > +		}
> > +
> 
> If CONFIG_RANDOMIZE_BASE is NOT set, should RELOC_AUTO|KASLR be set?  They
> don't get set on my live system.

Not necessarily, but the current implementation of derive_kaslr_offset()
just works even in this case.
(This is the reason I don't think we need a VMCOREINFO entry for
CONFIG_RANDOMIZE_BASE.)

> >  		break;
> >  
> >  	case PRE_GDB:
> > +		/* This check is somewhat redundant */
> > +		if (kernel_symbol_exists("kimage_voffset")) {
> > +			machdep->flags |= NEW_VMEMMAP;
> > +
> > +			if (!machdep->machspec->kimage_voffset)
> > +				error(WARNING,
> > +				"kimage_voffset not identified for kaslr kernel\n");
> > +		}
> > +
> 
> As mentioned in the general discussion above, the WARNING above makes
> no sense if CONFIG_RANDOMIZE_BASE is not configured.

What if I change the message to say "for v4.6 or later kernel"?

> I'm now thinking that CONFIG_RANDOMIZE_BASE (and maybe others like KASAN?) 
> should be passed in the dumpfile with VMCOREINFO_CONFIG().

Regarding KASAN, we only need to check for the existence of the "kasan_init"
symbol, as you recommended before :)

> 
> >  		if (!machdep->pagesize) {
> >  			/*
> >  			 * Kerneldoc Documentation/arm64/booting.txt describes
> > @@ -160,16 +201,35 @@ arm64_init(int when)
> >  		machdep->pagemask = ~((ulonglong)machdep->pageoffset);
> >  
> >  		arm64_calc_VA_BITS();
> > -		machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> > +		ms = machdep->machspec;
> > +		ms->page_offset = ARM64_PAGE_OFFSET;
> > +		/* FIXME: idmap for NEW_VMEMMAP */
> >  		machdep->identity_map_base = ARM64_PAGE_OFFSET;
> > -		machdep->machspec->userspace_top = ARM64_USERSPACE_TOP;
> > -		machdep->machspec->modules_vaddr = ARM64_MODULES_VADDR;
> > -		machdep->machspec->modules_end = ARM64_MODULES_END;
> > -		machdep->machspec->vmalloc_start_addr = ARM64_VMALLOC_START;
> > -		machdep->machspec->vmalloc_end = ARM64_VMALLOC_END;
> > -		machdep->kvbase = ARM64_VMALLOC_START;
> > -		machdep->machspec->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > -		machdep->machspec->vmemmap_end = ARM64_VMEMMAP_END;
> > +		machdep->kvbase = ARM64_VA_START;
> > +		ms->userspace_top = ARM64_USERSPACE_TOP;
> > +		if (machdep->flags & NEW_VMEMMAP) {
> > +			struct syment *sp;
> > +
> > +			sp = kernel_symbol_search("_text");
> > +			ms->kimage_text = (sp ? sp->value : 0);
> > +			sp = kernel_symbol_search("_end");
> > +			ms->kimage_end = (sp ? sp->value : 0);
> > +
> > +			ms->modules_vaddr = ARM64_VA_START;
> > +			if (kernel_symbol_exists("kasan_init"))
> > +				ms->modules_vaddr += ARM64_KASAN_SHADOW_SIZE;
> > +			ms->modules_end = ms->modules_vaddr
> > +						+ ARM64_MODULES_VSIZE -1;
> > +
> > +			ms->vmalloc_start_addr = ms->modules_end + 1;
> > +		} else {
> > +			ms->modules_vaddr = ARM64_PAGE_OFFSET - MEGABYTES(64);
> > +			ms->modules_end = ARM64_PAGE_OFFSET - 1;
> > +			ms->vmalloc_start_addr = ARM64_VA_START;
> > +		}
> > +		ms->vmalloc_end = ARM64_VMALLOC_END;
> > +		ms->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > +		ms->vmemmap_end = ARM64_VMEMMAP_END;
> >  
> >  		switch (machdep->pagesize)
> >  		{
> > @@ -232,8 +292,6 @@ arm64_init(int when)
> >  		machdep->stacksize = ARM64_STACK_SIZE;
> >  		machdep->flags |= VMEMMAP;
> >  
> > -		arm64_calc_phys_offset();
> > -
> >  		machdep->uvtop = arm64_uvtop;
> >  		machdep->kvtop = arm64_kvtop;
> >  		machdep->is_kvaddr = generic_is_kvaddr;
> > @@ -262,6 +320,10 @@ arm64_init(int when)
> >  		machdep->dumpfile_init = NULL;
> >  		machdep->verify_line_number = NULL;
> >  		machdep->init_kernel_pgd = arm64_init_kernel_pgd;
> > +
> > +		/* use machdep parameters */
> > +		arm64_calc_phys_offset();
> > +
> 
> This doesn't make any sense to me.  I understand you've done it because
> you are now doing a readmem() in arm64_calc_phys_offset().  But readmem()
> calls are never done this early.  And in fact, on the live system the
> readmem() in arm64_calc_phys_offset() fails like so (running "crash -d4"):
>  
>   <readmem: fffffc0008d70bb0, KVADDR, "memstart_addr", 8, (ROE|Q), 3ffcc2842c0>
>   <read_memory_device: addr: fffffc0008d70bb0 paddr: d70bb0 cnt: 8>
>   crash: read error: kernel virtual address: fffffc0008d70bb0  type: "memstart_addr"
> 
> It fails above because it needs the phys_base in order to do the readmem().  During
> runtime it has it:

This is because we don't know kimage_voffset on a live system
(which is why I mentioned this issue in my commit log).

As far as the kdump case is concerned, readmem() at this point just works
for any address in the kernel image. That use is intentional.
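
To spell out the ordering for the kdump case with the numbers above:

    /* Sketch of why the early readmem() works on a dump:
     * &memstart_addr is a kernel-image address, so arm64_VTOP() only needs
     * kimage_voffset (already known from NUMBER(kimage_voffset)); the value
     * read there then becomes phys_offset, and only after that do
     * linear-map (page_offset-based) translations work. */
    unsigned long memstart_addr_sym = 0xfffffc0008d70bb0UL;   /* from your trace */
    unsigned long kimage_voffset    = 0xfffffbc008000000UL;
    unsigned long paddr = memstart_addr_sym - kimage_voffset; /* 4000d70bb0 */

which matches the paddr in your runtime trace.  On a live system that very
first step already fails, hence the limitation noted in my commit log.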

>   crash> set debug 4
>   debug: 4
>   crash> rd memstart_addr
>   <addr: fffffc0008d70bb0 count: 1 flag: 490 (KVADDR)>
>   <readmem: fffffc0008d70bb0, KVADDR, "64-bit KVADDR", 8, (FOE), 3ffcbfbfc28>
>   <read_memory_device: addr: fffffc0008d70bb0 paddr: 4000d70bb0 cnt: 8>
>   fffffc0008d70bb0:  0000004000000000                    ....@...
>   crash> 
> 
> So without kimage_voffset, how would that readmem() possibly work?
> You need the *contents* of memstart_addr in order to *read* memstart_addr.
> 
> Now, if kimage_voffset were passed into the vmcore, then I guess you could
> read it.  But why do it that way?  That's what diskdump_get_phys_base() and
> kdump_get_phys_base() are supposed to do.  

Because I believe that this is the *common* place to do it for either
diskdump or kdump (or even a live system, if we know kimage_voffset).

> 
> >  		break;
> >  
> >  	case POST_GDB:
> > @@ -409,6 +471,8 @@ arm64_dump_machdep_table(ulong arg)
> >  		fprintf(fp, "%sIRQ_STACKS", others++ ? "|" : "");
> >  	if (machdep->flags & MACHDEP_BT_TEXT)
> >  		fprintf(fp, "%sMACHDEP_BT_TEXT", others++ ? "|" : "");
> > +	if (machdep->flags & NEW_VMEMMAP)
> > +		fprintf(fp, "%sNEW_VMEMMAP", others++ ? "|" : "");
> >  	fprintf(fp, ")\n");
> >  
> >  	fprintf(fp, "              kvbase: %lx\n", machdep->kvbase);
> > @@ -494,6 +558,8 @@ arm64_dump_machdep_table(ulong arg)
> >  	ms = machdep->machspec;
> >  
> >  	fprintf(fp, "            machspec: %lx\n", (ulong)ms);
> > +	fprintf(fp, "     memory map layout: %s\n",
> > +				(machdep->flags & NEW_VMEMMAP) ? "new" : "old");
> 
> As before, these internal "help" structure dumps just show existing structure fields,
> and NEW_VMMEMAP is already shown in the flags.

I put it there as useful information for people who might be puzzled by
those odd numbers/the kernel layout.
But if you don't like it, I will remove it.

> 
> >  	fprintf(fp, "               VA_BITS: %ld\n", ms->VA_BITS);
> >  	fprintf(fp, "         userspace_top: %016lx\n", ms->userspace_top);
> >  	fprintf(fp, "           page_offset: %016lx\n", ms->page_offset);
> > @@ -503,6 +569,11 @@ arm64_dump_machdep_table(ulong arg)
> >  	fprintf(fp, "           modules_end: %016lx\n", ms->modules_end);
> >  	fprintf(fp, "         vmemmap_vaddr: %016lx\n", ms->vmemmap_vaddr);
> >  	fprintf(fp, "           vmemmap_end: %016lx\n", ms->vmemmap_end);
> > +	if (machdep->flags & NEW_VMEMMAP) {
> > +		fprintf(fp, "           kimage_text: %016lx\n", ms->kimage_text);
> > +		fprintf(fp, "            kimage_end: %016lx\n", ms->kimage_end);
> > +		fprintf(fp, "        kimage_voffset: %016lx\n", ms->kimage_voffset);
> > +	}
> >  	fprintf(fp, "           phys_offset: %lx\n", ms->phys_offset);
> >  	fprintf(fp, "__exception_text_start: %lx\n", ms->__exception_text_start);
> >  	fprintf(fp, "  __exception_text_end: %lx\n", ms->__exception_text_end);
> > @@ -543,6 +614,42 @@ arm64_dump_machdep_table(ulong arg)
> >  	}
> >  }
> >  
> > +static int
> > +arm64_parse_machdep_arg_l(char *argstring, char *param, ulong *value)
> > +{
> > +	int len;
> > +	int megabytes = FALSE;
> > +	char *p;
> > +
> > +	len = strlen(param);
> > +	if (!STRNEQ(argstring, param) || (argstring[len] != '='))
> > +		return FALSE;
> > +
> > +	if ((LASTCHAR(argstring) == 'm') ||
> > +	    (LASTCHAR(argstring) == 'M')) {
> > +		LASTCHAR(argstring) = NULLCHAR;
> > +		megabytes = TRUE;
> > +	}
> > +
> > +	p = argstring + len + 1;
> > +	if (strlen(p)) {
> > +		int flags = RETURN_ON_ERROR | QUIET;
> > +		int err = 0;
> > +
> > +		if (megabytes) {
> > +			*value = dtol(p, flags, &err);
> > +			if (!err)
> > +				*value = MEGABYTES(*value);
> > +		} else {
> > +			*value = htol(p, flags, &err);
> > +		}
> > +
> > +		if (!err)
> > +			return TRUE;
> > +	}
> > +
> > +	return FALSE;
> > +}
> >  
> >  /*
> >   * Parse machine dependent command line arguments.
> > @@ -554,11 +661,10 @@ arm64_dump_machdep_table(ulong arg)
> >  static void
> >  arm64_parse_cmdline_args(void)
> >  {
> > -	int index, i, c, err;
> > +	int index, i, c;
> >  	char *arglist[MAXARGS];
> >  	char buf[BUFSIZE];
> >  	char *p;
> > -	ulong value = 0;
> >  
> >  	for (index = 0; index < MAX_MACHDEP_ARGS; index++) {
> >  		if (!machdep->cmdline_args[index])
> > @@ -580,39 +686,23 @@ arm64_parse_cmdline_args(void)
> >  		c = parse_line(buf, arglist);
> >  
> >  		for (i = 0; i < c; i++) {
> > -			err = 0;
> > -
> > -			if (STRNEQ(arglist[i], "phys_offset=")) {
> > -				int megabytes = FALSE;
> > -				int flags = RETURN_ON_ERROR | QUIET;
> > -
> > -				if ((LASTCHAR(arglist[i]) == 'm') ||
> > -				    (LASTCHAR(arglist[i]) == 'M')) {
> > -					LASTCHAR(arglist[i]) = NULLCHAR;
> > -					megabytes = TRUE;
> > -				}
> > -
> > -				p = arglist[i] + strlen("phys_offset=");
> > -				if (strlen(p)) {
> > -					if (megabytes)
> > -						value = dtol(p, flags, &err);
> > -					else
> > -						value = htol(p, flags, &err);
> > -				}
> > -
> > -				if (!err) {
> > -					if (megabytes)
> > -						value = MEGABYTES(value);
> > -
> > -					machdep->machspec->phys_offset = value;
> > -
> > -					error(NOTE,
> > -					    "setting phys_offset to: 0x%lx\n\n",
> > -						machdep->machspec->phys_offset);
> > +			if (arm64_parse_machdep_arg_l(arglist[i],
> > +					"phys_offset",
> > +					&machdep->machspec->phys_offset)) {
> > +				error(NOTE,
> > +					"setting phys_offset to: 0x%lx\n\n",
> > +					machdep->machspec->phys_offset);
> > +
> > +				machdep->flags |= PHYS_OFFSET;
> > +				continue;
> > +			} else if (arm64_parse_machdep_arg_l(arglist[i],
> > +					"kimage_voffset",
> > +					&machdep->machspec->kimage_voffset)) {
> > +				error(NOTE,
> > +					"setting kimage_voffset to: 0x%lx\n\n",
> > +					machdep->machspec->kimage_voffset);
> >  
> > -					machdep->flags |= PHYS_OFFSET;
> > -					continue;
> > -				}
> > +				continue;
> >  			}
> >  
> >  			error(WARNING, "ignoring --machdep option: %s\n",
> > @@ -627,10 +717,20 @@ arm64_calc_phys_offset(void)
> >  {
> >  	struct machine_specific *ms = machdep->machspec;
> >  	ulong phys_offset;
> > +	struct syment *sp;
> > +	ulong value;
> >  
> >  	if (machdep->flags & PHYS_OFFSET) /* --machdep override */
> >  		return;
> >  
> > +	sp = kernel_symbol_search("memstart_addr");
> > +	if (sp && readmem(sp->value, KVADDR, (char *)&value, sizeof(value),
> > +				"memstart_addr", QUIET|RETURN_ON_ERROR)) {
> > +		ms->phys_offset = value;
> > +		machdep->flags |= PHYS_OFFSET;
> > +		return;
> > +	}
> > +
> 
> Also, I may be missing something, but the PHYS_OFFSET flag means that it was 
> passed in via --machdep, and is only used in this function to avoid any further
> attempts to figure out what it is.  Here you are setting the flag for no particular 
> reason that I can see.  Why?

Well, I don't remember exactly. If it's not necessary, I will remove it.

> But again, doing a readmem() call this early is setting a bad precedent -- they have
> *always* been delayed until after gdb_session_init() in main_loop().  And the work
> of determining phys_base should be done by the dumpfile call-out functions that
> already exist.  
>  
> 
> >  	/*
> >  	 * Next determine suitable value for phys_offset. User can override this
> >  	 * by passing valid '--machdep phys_offset=<addr>' option.
> > @@ -2377,6 +2477,11 @@ arm64_IS_VMALLOC_ADDR(ulong vaddr)
> >  {
> >  	struct machine_specific *ms = machdep->machspec;
> >  	
> > +	if ((machdep->flags & NEW_VMEMMAP) &&
> > +	    (vaddr >= machdep->machspec->kimage_text) &&
> > +	    (vaddr <= machdep->machspec->kimage_end))
> > +		return FALSE;
> > +
> >          return ((vaddr >= ms->vmalloc_start_addr && vaddr <= ms->vmalloc_end) ||
> >                  ((machdep->flags & VMEMMAP) &&
> >                   (vaddr >= ms->vmemmap_vaddr && vaddr <= ms->vmemmap_end)) ||
> > @@ -2407,7 +2512,11 @@ arm64_calc_VA_BITS(void)
> >  
> >  	for (bitval = highest_bit_long(value); bitval; bitval--) {
> >  		if ((value & (1UL << bitval)) == 0) {
> > +#if 1
> > +			machdep->machspec->VA_BITS = bitval + 1;
> > +#else /* FIXME: bug? */
> >  			machdep->machspec->VA_BITS = bitval + 2;
> > +#endif
> >  			break;
> >  		}
> >  	}
> 
> Again, a non-starter until the above is resolved -- but you know that already...

Just a reminder for now.

> 
> > @@ -2459,10 +2568,22 @@ arm64_calc_virtual_memory_ranges(void)
> >  		break;
> >          }
> >  
> > -	vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) * SIZE(page), PUD_SIZE);
> > +	if (machdep->flags & NEW_VMEMMAP)
> > +#define STRUCT_PAGE_MAX_SHIFT   6
> > +		vmemmap_size = 1UL << (ms->VA_BITS - machdep->pageshift - 1
> > +						+ STRUCT_PAGE_MAX_SHIFT);
> > +	else
> > +		vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) * SIZE(page), PUD_SIZE);
> > +
> >  	vmalloc_end = (ms->page_offset - PUD_SIZE - vmemmap_size - SZ_64K);
> > -	vmemmap_start = vmalloc_end + SZ_64K;
> > -	vmemmap_end = vmemmap_start + vmemmap_size;
> > +
> > +	if (machdep->flags & NEW_VMEMMAP) {
> > +		vmemmap_start = ms->page_offset - vmemmap_size;
> > +		vmemmap_end = ms->page_offset;
> > +	} else {
> > +		vmemmap_start = vmalloc_end + SZ_64K;
> > +		vmemmap_end = vmemmap_start + vmemmap_size;
> > +	}
> >  
> >  	ms->vmalloc_end = vmalloc_end - 1;
> >  	ms->vmemmap_vaddr = vmemmap_start;
> > diff --git a/defs.h b/defs.h
> > index a09fa9a..2bbb55b 100644
> > --- a/defs.h
> > +++ b/defs.h
> > @@ -2844,8 +2844,8 @@ typedef u64 pte_t;
> >  
> >  #define PTOV(X) \
> >  	((unsigned long)(X)-(machdep->machspec->phys_offset)+(machdep->machspec->page_offset))
> > -#define VTOP(X) \
> > -	((unsigned long)(X)-(machdep->machspec->page_offset)+(machdep->machspec->phys_offset))
> > +
> > +#define VTOP(X)               arm64_VTOP((ulong)(X))
> >  
> >  #define USERSPACE_TOP   (machdep->machspec->userspace_top)
> >  #define PAGE_OFFSET     (machdep->machspec->page_offset)
> > @@ -2938,18 +2938,23 @@ typedef signed int s32;
> >  #define VM_L3_4K      (0x10)
> >  #define KDUMP_ENABLED (0x20)
> >  #define IRQ_STACKS    (0x40)
> > +#define NEW_VMEMMAP   (0x80)
> >  
> >  /*
> >   * sources: Documentation/arm64/memory.txt
> >   *          arch/arm64/include/asm/memory.h
> >   *          arch/arm64/include/asm/pgtable.h
> >   */
> > -
> > -#define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) << (machdep->machspec->VA_BITS - 1))
> > +#define ARM64_VA_START       ((0xffffffffffffffffUL) \
> > +					<< machdep->machspec->VA_BITS)
> > +#define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
> > +					<< (machdep->machspec->VA_BITS - 1))
> >  #define ARM64_USERSPACE_TOP  ((1UL) << machdep->machspec->VA_BITS)
> > -#define ARM64_MODULES_VADDR  (ARM64_PAGE_OFFSET - MEGABYTES(64))
> > -#define ARM64_MODULES_END    (ARM64_PAGE_OFFSET - 1)
> > -#define ARM64_VMALLOC_START  ((0xffffffffffffffffUL) << machdep->machspec->VA_BITS)
> > +
> > +/* only used for v4.6 or later */
> > +#define ARM64_MODULES_VSIZE     MEGABYTES(128)
> > +#define ARM64_KASAN_SHADOW_SIZE (1UL << (machdep->machspec->VA_BITS - 3))
> > +
> >  /*
> >   * The following 3 definitions are the original values, but are obsolete
> >   * for 3.17 and later kernels because they are now build-time calculations.
> > @@ -3028,6 +3033,10 @@ struct machine_specific {
> >  	ulong kernel_flags;
> >  	ulong irq_stack_size;
> >  	ulong *irq_stacks;
> > +	/* only needed for kaslr-enabled kernel */
> > +	ulong kimage_voffset;
> > +	ulong kimage_text;
> > +	ulong kimage_end;
> >  };
> 
> The comment is wrong -- the kimage_text and kimage_end values are needed for
> NEW_VMEMMAP kernels that are NOT configured with CONFIG_RANDOMIZE_BASE.
> Unless you consider my live system "kaslr-enabled"?

I will change it to "for v4.6 or later".

I have limited (or no) time to investigate the live-system case, so if
you have any suggestions or your own patch, it would be very much appreciated.

Thanks,
-Takahiro AKASHI

> >  
> >  struct arm64_stackframe {
> > @@ -5385,6 +5394,7 @@ void unwind_backtrace(struct bt_info *);
> >  #ifdef ARM64
> >  void arm64_init(int);
> >  void arm64_dump_machdep_table(ulong);
> > +ulong arm64_VTOP(ulong);
> >  int arm64_IS_VMALLOC_ADDR(ulong);
> >  ulong arm64_swp_type(ulong);
> >  ulong arm64_swp_offset(ulong);
> > diff --git a/main.c b/main.c
> > index 821bb4e..3f24908 100644
> > --- a/main.c
> > +++ b/main.c
> > @@ -227,9 +227,10 @@ main(int argc, char **argv)
> >  						optarg);
> >  				}
> >  			} else if (STREQ(long_options[option_index].name, "kaslr")) {
> > -				if (!machine_type("X86_64"))
> > -					error(INFO, "--kaslr only valid "
> > -						"with X86_64 machine type.\n");
> > +				if (!machine_type("X86_64") &&
> > +				    !machine_type("ARM64"))
> > +					error(INFO, "--kaslr not valid "
> > +						"with this machine type.\n");
> >  				else if (STREQ(optarg, "auto"))
> >  					kt->flags2 |= (RELOC_AUTO|KASLR);
> >  				else {
> > diff --git a/symbols.c b/symbols.c
> > index a8d3563..df27793 100644
> > --- a/symbols.c
> > +++ b/symbols.c
> > @@ -593,7 +593,8 @@ kaslr_init(void)
> >  {
> >  	char *string;
> >  
> > -	if (!machine_type("X86_64") || (kt->flags & RELOC_SET))
> > +	if ((!machine_type("X86_64") && !machine_type("ARM64")) ||
> > +	    (kt->flags & RELOC_SET))
> >  		return;
> >  
> >  	/*
> > @@ -712,7 +713,7 @@ store_symbols(bfd *abfd, int dynamic, void *minisyms, long symcount,
> >  	if (machine_type("X86")) {
> >  		if (!(kt->flags & RELOC_SET))
> >  			kt->flags |= RELOC_FORCE;
> > -	} else if (machine_type("X86_64")) {
> > +	} else if (machine_type("X86_64") || machine_type("ARM64")) {
> >  		if ((kt->flags2 & RELOC_AUTO) && !(kt->flags & RELOC_SET))
> >  			derive_kaslr_offset(abfd, dynamic, from,
> >  				fromend, size, store);
> > @@ -783,7 +784,8 @@ store_sysmap_symbols(void)
> >                  error(FATAL, "symbol table namespace malloc: %s\n",
> >                          strerror(errno));
> >  
> > -	if (!machine_type("X86") && !machine_type("X86_64"))
> > +	if (!machine_type("X86") && !machine_type("X86_64") &&
> > +	    !machine_type("ARM64"))
> >  		kt->flags &= ~RELOC_SET;
> >  
> >  	first = 0;
> > @@ -4681,7 +4683,7 @@ value_search(ulong value, ulong *offset)
> >  	if ((sp = machdep->value_to_symbol(value, offset)))
> >  		return sp;
> >  
> > -	if (IS_VMALLOC_ADDR(value))
> > +	if (IS_VMALLOC_ADDR(value))
> >  		goto check_modules;
> >  
> >  	if ((sp = symval_hash_search(value)) == NULL)
> > --
> > 2.8.1
> > 
> > --
> 
> Looking better, still a ways to go...
> 
> Thanks,
>   Dave
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility

-- 
Thanks,
-Takahiro AKASHI
