[Crash-utility] [PATCH v2] arm64: fix kernel memory map handling for kaslr-enabled kernel

Dave Anderson anderson at redhat.com
Wed May 25 14:28:10 UTC 2016


----- Original Message -----
> On Tue, May 24, 2016 at 01:59:06PM -0400, Dave Anderson wrote:
> > 
> > ----- Original Message -----
> > > Yet some issues, but ...
> > >
> > 
> > Hi Takahiro,
> > 
> > Here are my general comments on my testing of the v2 patch, followed
> > by a few comments in the patch itself.
> 
> Thank you for your quick review/testing, again.
> 
> > First, the combination of the new memory map layout and KASLR is somewhat
> > confusing.  I am testing your patch on a 4.6.0-0.rc7.git2.1.fc25 kernel
> > that has this configuration:
> 
> Right. I use the "kaslr kernel" or "kaslr-enabled kernel" in two
> meanings (almost interchangeably):
>    - kernel with a new memory map, that is v4.6 or later
>    - kernel that has ability of KASLR (configured with CONFIG_RANDOMIZE_RAM)
> 
> When we talked offline, you mentioned the possibility of backporting
> KASLR to older kernel, so I hesitated to use "v4.6 or later".
> But I think that we'd better use "v4.6 or later" for a clarification
> for now.
 
Actually I was thinking more along the lines of the new vmemmap stuff
being backported.  But it looks like that would be impossible without
also bringing along "kimage_voffset".  


> >   config-arm64:# CONFIG_RANDOMIZE_BASE is not set
> > 
> > So KASLR doesn't really enter into the picture.  But when bringing
> > up the crash session, it shows the "kaslr kernel" WARNING:
> > 
> >   # ./crash
> >   
> >   crash 7.1.5++
> >   Copyright (C) 2002-2016  Red Hat, Inc.
> >   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
> >   Copyright (C) 1999-2006  Hewlett-Packard Co
> >   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
> >   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> >   Copyright (C) 2005, 2011  NEC Corporation
> >   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> >   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> >   This program is free software, covered by the GNU General Public License,
> >   and you are welcome to change it and/or distribute copies of it under
> >   certain conditions.  Enter "help copying" to see the conditions.
> >   This program has absolutely no warranty.  Enter "help warranty" for
> >   details.
> >    
> >   WARNING: kimage_voffset not identified for kaslr kernel
> >   GNU gdb (GDB) 7.6
> >   Copyright (C) 2013 Free Software Foundation, Inc.
> >   License GPLv3+: GNU GPL version 3 or later
> >   <http://gnu.org/licenses/gpl.html>
> >   This is free software: you are free to change and redistribute it.
> >   There is NO WARRANTY, to the extent permitted by law.  Type "show
> >   copying"
> >   and "show warranty" for details.
> >   This GDB was configured as "aarch64-unknown-linux-gnu"...
> >   
> >         KERNEL:
> >         /usr/lib/debug/lib/modules/4.6.0-0.rc7.git2.1.fc25.aarch64/vmlinux
> >       DUMPFILE: /dev/crash
> >           CPUS: 8
> >           DATE: Tue May 24 10:08:08 2016
> >         UPTIME: 11 days, 18:32:41
> >   LOAD AVERAGE: 0.17, 0.09, 0.12
> >          TASKS: 197
> >       NODENAME: apm-mustang-ev3-36.khw.lab.eng.bos.redhat.com
> >        RELEASE: 4.6.0-0.rc7.git2.1.fc25.aarch64
> >        VERSION: #1 SMP Thu May 12 13:28:43 UTC 2016
> >        MACHINE: aarch64  (unknown Mhz)
> >         MEMORY: 16 GB
> >            PID: 7556
> >        COMMAND: "crash"
> >           TASK: fffffe00beb45400  [THREAD_INFO: fffffe00beb98000]
> >            CPU: 7
> >          STATE: TASK_RUNNING (ACTIVE)
> >   
> >   crash>
> >   
> > Why show that WARNING in this case?  I understant that it's not
> > stashed during early initialization:
> 
> If we don't know kimage_voffset, we can't analyze the contents of
> memory (live or kdump).

I understand.  But clearly that's not the case with my system above that
does not have CONFIG_RANDOMIZE_BASE configured, so the warning message
is nonsensical/confusing.  That's why I was speculating about perhaps
having kdump export a VMCOREINFO_CONFIG(RANDOMIZE_BASE) so that it is 
readily apparent.

> 
> >   crash> help -m
> >                  flags: 104000c5
> >                  (KSYMS_START|VM_L2_64K|VMEMMAP|IRQ_STACKS|MACHDEP_BT_TEXT|NEW_VMEMMAP)
> >   ... [ cut ] ...
> >        memory map layout: new      <--- BTW, this is
> >        redundant/not-a-member, and we've got NEW_VMEMMAP
> >                  VA_BITS: 42
> >            userspace_top: 0000040000000000
> >              page_offset: fffffe0000000000
> >       vmalloc_start_addr: fffffc0008000000
> >              vmalloc_end: fffffdff5ffeffff
> >            modules_vaddr: fffffc0000000000
> >              modules_end: fffffc0007ffffff
> >            vmemmap_vaddr: fffffdff80000000
> >              vmemmap_end: fffffdffffffffff
> >              kimage_text: fffffc0008080000
> >               kimage_end: fffffc0009070000
> >           kimage_voffset: 0000000000000000    <-- available if kernel is
> >           not randomized
> >              phys_offset: 4000000000
> >   ...
> >   
> > But it can be read:
> >   
> >   crash> px kimage_voffset
> >   kimage_voffset = $1 = 0xfffffbc008000000
> >   crash>
> >
> > SO because it wasn't determined during session initialization,
> > it falls into the "no randomness" section of arm64_VTOP():
> 
> Exactly.
> 
> >   ulong
> >   arm64_VTOP(ulong addr)
> >   {
> >           if (!(machdep->flags & NEW_VMEMMAP) ||
> >               (addr >= machdep->machspec->page_offset)) {
> >                   return machdep->machspec->phys_offset
> >                           + (addr - machdep->machspec->page_offset);
> >           } else {
> >                   if (machdep->machspec->kimage_voffset)
> >                           return addr - machdep->machspec->kimage_voffset;
> >                   else /* no randomness */
> >                           return machdep->machspec->phys_offset
> >                                   + (addr -
> >                                   machdep->machspec->vmalloc_start_addr);
> >           }
> >   }
> >   
> > That works, but if "kimage_offset" can be read normally later on, perhaps it
> > should update "machdep->machspec->kimage_voffset" at that time?
> 
> It is possible, but the check in arm64_VTOP() would be still necessary.

Agreed...

> 
> > There are some discrepancies with respect to the calculated addresses
> > and what is seen by looking at the reported memory map in the kernel log:
> > 
> >   [    0.000000] Virtual kernel memory layout:
> >   [    0.000000]     modules : 0xfffffc0000000000 - 0xfffffc0008000000   (
> >   128 MB)
> >   [    0.000000]     vmalloc : 0xfffffc0008000000 - 0xfffffdfedfff0000   (
> >   2043 GB)
> >   [    0.000000]       .text : 0xfffffc0008080000 - 0xfffffc0008890000   (
> >   8256 KB)
> >                      .rodata : 0xfffffc0008890000 - 0xfffffc0008c10000   (
> >                      3584 KB)
> >                        .init : 0xfffffc0008c10000 - 0xfffffc0008d50000   (
> >                        1280 KB)
> >                        .data : 0xfffffc0008d50000 - 0xfffffc0008eaac00   (
> >                        1387 KB)
> >   [    0.000000]     vmemmap : 0xfffffdfee0000000 - 0xfffffdffe0000000   (
> >   4 GB maximum)
> >                                0xfffffdfee0000000 - 0xfffffdfee1000000   (
> >                                16 MB actual)
> >   [    0.000000]     fixed   : 0xfffffdfffe7d0000 - 0xfffffdfffec00000   (
> >   4288 KB)
> >   [    0.000000]     PCI I/O : 0xfffffdfffee00000 - 0xfffffdffffe00000   (
> >   16 MB)
> >   [    0.000000]     memory  : 0xfffffe0000000000 - 0xfffffe0400000000   (
> >   16384 MB)
> > 
> > Comparing it to the calculated values:
> > 
> >                VA_BITS: 42
> >          userspace_top: 0000040000000000
> >            page_offset: fffffe0000000000    OK
> >     vmalloc_start_addr: fffffc0008000000    OK
> >            vmalloc_end: fffffdff5ffeffff    ?  (seems wrong)
> >          modules_vaddr: fffffc0000000000    OK
> >            modules_end: fffffc0007ffffff    OK (-1)
> >          vmemmap_vaddr: fffffdff80000000    ?  (seems wrong)
> >            vmemmap_end: fffffdffffffffff    ?  (seems wrong)
> >            kimage_text: fffffc0008080000    OK
> >             kimage_end: fffffc0009070000    OK
> >         kimage_voffset: 0000000000000000
> >            phys_offset: 4000000000
> > 
> > Starting with vmalloc_start_addr/vmalloc_end, if I run "kmem -v" to dump
> > the
> > vmalloc allocations, there are some allocations that are below the
> > 0xfffffc0008000000 start value shown in the log and by your calcuation.
> > I'm not clear on what that means?:
> > 
> >   crash> kmem -v
> >      VMAP_AREA         VM_STRUCT                 ADDRESS RANGE
> >      SIZE
> >   fffffe03daefd900  fffffe03daefd880  fffffc0000c10000 - fffffc0000c30000
> >   131072
> >   fffffe00bf2c3a00  fffffe00bf2c3980  fffffc0000c90000 - fffffc0000cb0000
> >   131072
> >   fffffe00bf2c7400  fffffe00bf2c7380  fffffc0000d50000 - fffffc0000d70000
> >   131072
> >   fffffe03dc76aa00  fffffe03dc76a980  fffffc0000d90000 - fffffc0000db0000
> >   131072
> >   fffffe03dc76be00  fffffe03dc76bd80  fffffc0000dd0000 - fffffc0000ee0000
> >   1114112
> >   fffffe03d90e7900  fffffe03d90e7b80  fffffc0000f40000 - fffffc0000fb0000
> >   458752
> >   fffffe00bec12f00  fffffe00bec12b00  fffffc0000fe0000 - fffffc0001000000
> >   131072
> >   fffffe00bf3a1100  fffffe00bf3a1080  fffffc0001020000 - fffffc0001050000
> >   196608
> >   fffffe00bf3a3280  fffffe00bf3a3200  fffffc0001070000 - fffffc0001090000
> >   131072
> >   fffffe03ddf7fc00  fffffe03ddf7a880  fffffc0001230000 - fffffc0001250000
> >   131072
> >   fffffe03fd508900  fffffe03fd508880  fffffc0008000000 - fffffc0008020000
> >   131072
> >   fffffe03fd509580  fffffe03fd508d00  fffffc0008020000 - fffffc0008040000
> >   131072
> >   fffffe03fd508a00  fffffe03fd508980  fffffc0008040000 - fffffc0008070000
> >   196608
> >   ... [ cut ] ...
> >   fffffe03d7235000  fffffe03d7233480  fffffc0014b30000 - fffffc0014b50000
> >   131072
> >   fffffe00b921e700  fffffe00b921df00  fffffc0014b70000 - fffffc0014b90000
> >   131072
> >   fffffe00b851eb00  fffffe00b8512900  fffffdfedfcf0000 - fffffdfedfe70000
> >   1572864
> >   fffffe03dca4ef00  fffffe03dca4ef80  fffffdfedfe70000 - fffffdfedfff0000
> >   1572864
> >   crash>
> > 
> > Although the end value of fffffdfedfff0000 does match up with the value in
> > the
> > kernel log, your calculation is 2GB higher at fffffdff5ffeffff:
> > 
> >   crash> eval fffffdff5ffeffff - 0xfffffdfedfff0000
> >   hexadecimal: 7fffffff
> >       decimal: 2147483647
> >         octal: 17777777777
> >        binary:
> >        0000000000000000000000000000000001111111111111111111111111111111
> >   crash> eval 7fffffff + 1
> >   hexadecimal: 80000000  (2GB)
> >       decimal: 2147483648
> >         octal: 20000000000
> >        binary:
> >        0000000000000000000000000000000010000000000000000000000000000000
> >   crash>
> > 
> > With respect to the vmemmap range, vmmemap address in the log file is the
> > one
> > that is displayed by "kmem -p":
> > 
> >   crash> kmem -p
> >         PAGE               PHYSICAL      MAPPING       INDEX CNT FLAGS
> >   fffffdfee0000000       4000000000 fffffe03fd441950       2f  1 1002c
> >   referenced,uptodate,lru,mappedtodisk
> >   fffffdfee0000040       4000010000 fffffe03fd441950       30  1 1002c
> >   referenced,uptodate,lru,mappedtodisk
> >   fffffdfee0000080       4000020000 fffffe03fd441950       31  1 1002c
> >   referenced,uptodate,lru,mappedtodisk
> >   fffffdfee00000c0       4000030000 fffffe03fd441950       32  1 1002c
> >   referenced,uptodate,lru,mappedtodisk
> >   ...
> >   fffffdfee0ffff00       43fffc0000                0        0  1
> >   4000000000000400 reserved
> >   fffffdfee0ffff40       43fffd0000                0        0  1
> >   4000000000000400 reserved
> >   fffffdfee0ffff80       43fffe0000                0        0  1
> >   4000000000000400 reserved
> >   fffffdfee0ffffc0       43ffff0000                0        0  1
> >   4000000000000400 reserved
> >   crash>
> >   
> > And "kmem" alone recognizes it:
> >   
> >   crash> kmem fffffdfee0000000
> >         PAGE               PHYSICAL      MAPPING       INDEX CNT FLAGS
> >   fffffdfee0000000       4000000000 fffffe03fd441950       2f  1 1002c
> >   referenced,uptodate,lru,mappedtodisk
> >   crash>
> >   
> > But your calculated vmemmap_vaddr above doesn't seem right:
> >   
> >   crash> kmem fffffdff80000000
> >   kmem: WARNING: cannot make virtual-to-physical translation:
> >   fffffdff80000000
> >   fffffdff80000000: kernel virtual address not found in mem map
> >   crash>
> > 
> > Do your numbers match up on a dump that actually has CONFIG_RANDOMIZE_BASE?
> 
> In my environment, the numbers above match up on a dump whether
> CONFIG_RANDOMIZE_BASE is enabled or not.
> 
> Are you using a mainline kernel (final v4.6, not -rcX)?

Good point.  When I configured my arm64 test system, I installed the latest
Fedora kernel available at the time (4.6.0-0.rc7.git2.1.fc25), but it is based
upon linux-4.5.tar.xz.  I see now that there is a kernel-4.6.0-1.fc25 available
that is based upon linux-4.6.tar.xz.  I'll update the system.

 
> > Anyway, onto the patch...
> > 
> > > 
> > > changes in v2:
> > > * Fixed build warnings
> > > * Moved ARM64_NEW_VMEMMAP to machdep->flags
> > > * Show additional kaslr-related parameters in arm64_dump_machdep_table()
> > > * Handle a VMCOREINFO, "NUMBER(kimage_voffset)"
> > > 
> > > ===8<===
> > > >From 080a54ec232ac48ef2d8123cbedcf0f1fe27e391 Mon Sep 17 00:00:00 2001
> > > From: AKASHI Takahiro <takahiro.akashi at linaro.org>
> > > Date: Mon, 16 May 2016 17:31:55 +0900
> > > Subject: [PATCH v2] arm64: fix kernel memory map handling for
> > > kaslr-enabled
> > >  kernel
> > > 
> > > In kernel v4.6, Kernel ASLR (KASLR) is supported on arm64, and the start
> > > address of the kernel image can be randomized if CONFIG_RANDOMIZE_BASE is
> > > enabled.
> > > Even worse, the kernel image is no more mapped in the linear mapping, but
> > > in vmalloc area (i.e. below PAGE_OFFSET).
> > > 
> > > Now, according to the kernel's memory.h, converting a virtual address to
> > > a physical address should be done like below:
> > > 
> > > 	phys_addr_t __x = (phys_addr_t)(x);                             \
> > > 	__x & BIT(VA_BITS - 1) ? (__x & ~PAGE_OFFSET) + PHYS_OFFSET :   \
> > > 				 (__x - kimage_voffset); })
> > > 
> > > Please note that PHYS_OFFSET is no more equal to the start address of
> > > the first usable memory block in SYSTEM RAM due to the fact mentioned
> > > above.
> > > 
> > > This patch addresses this change and allows the crash utility to access
> > > memory contents with correct addresses.
> > > *However*, it is still rough-edged and we need more refinements.
> > > 
> > > 1-1) On a live system, crash with this patch won't work because
> > >    we currently have no way to know kimage_voffset.
> > > 
> > > 1-2) For a core dump file, we can do simply:
> > >        $ crash <vmlinux> <vmcore>
> > >    as long as the file has "NUMBER(kimage_voffset)"
> > >    (RELOC_AUTO|KASLR is automatically set.)
> > > 
> > >    I'm planning to add this enhancement in my next version of kexec/kdump
> > >    patch, i.e. v17.
> > > 
> > > 2) My current change breaks the backward compatibility.
> > >    Crash won't work with v4.5 or early kernel partly because I changed
> > >    the calculation of VA_BITS. (I think that the existing code is wrong.)
> > >    So far I don't have enough time to look into this issue as I'm
> > >    focusing
> > >    on KASLR support.
> > 
> > OK, but you still haven't shown how the current VA_BITS determination is
> > not a correct representation of CONFIG_ARM64_VA_BITS.
> 
> At least, the current VA_BITS doesn't work with my calculations, which
> were derived directly from v4.6's "asm/memory.h".

What are your kernel's PAGE_OFFSET value and CONFIG_ARM64_VA_BITS value?

> 
> > > 3) Backtracing a 'panic'ed task fails:
> > >  crash> bt
> > >  PID: 999    TASK: ffffffe960081800  CPU: 0   COMMAND: "sh"
> > >  bt: WARNING: corrupt prstatus? pstate=0x80000000, but no user frame
> > >  found
> > >  bt: WARNING: cannot determine starting stack frame for task
> > >  ffffffe960081800
> > > 
> > >   frame->pc indicates that it is a kernel address, so obviously something
> > >   is wrong. I'm diggin into it.
> > 
> > Can you add a debug statement that displays frame->pc, frame->sp, and
> > frame->fp?
> 
> I've already identified the cause.
> pt_regs->pstate was set mistakenly from "sprsr_el1" in panic().
> But I have a difficulty here to fix the problem as we have no direct way
> to retrieve a value of the *current* PSTATE.
 
Interesting.  I don't know what you mean by "sprsr_el1" in panic(), but
I just got a report from my github "issues", where the user injected a panic()
call into a kernel module and got the same error as above because of the pstate
value.  I asked him to put this hack into arm64.c:

--- a/arm64.c
+++ b/arm64.c
@@ -1711,11 +1711,14 @@ arm64_get_dumpfile_stackframe(struct bt_info *bt, struct arm64_stackframe *frame
                        error(WARNING,
                            "corrupt prstatus? pstate=0x%lx, but no user frame found\n",
                                ptregs->pstate);
+                       if (INSTACK(frame->sp, bt) && INSTACK(frame->fp, bt))
+                               goto try_kernel;
                        bt->flags |= BT_REGS_NOT_FOUND;
                        return FALSE;
                }
                bt->flags |= BT_USER_SPACE;
        } else {
+try_kernel:
                frame->sp = ptregs->sp;
                frame->fp = ptregs->regs[29];
        }

Can you give that a shot?

> > 
> > > Signed-off-by: AKASHI Takahiro <takahiro.akashi at linaro.org>
> > > ---
> > >  arm64.c   | 217
> > >  ++++++++++++++++++++++++++++++++++++++++++++++++--------------
> > >  defs.h    |  24 +++++--
> > >  main.c    |   7 +-
> > >  symbols.c |  10 +--
> > >  4 files changed, 196 insertions(+), 62 deletions(-)
> > > 
> > > diff --git a/arm64.c b/arm64.c
> > > index 34c8c59..6b97093 100644
> > > --- a/arm64.c
> > > +++ b/arm64.c
> > > @@ -72,6 +72,21 @@ static int arm64_get_crash_notes(void);
> > >  static void arm64_calc_VA_BITS(void);
> > >  static int arm64_is_uvaddr(ulong, struct task_context *);
> > >  
> > > +ulong
> > > +arm64_VTOP(ulong addr)
> > > +{
> > > +	if (!(machdep->flags & NEW_VMEMMAP) ||
> > > +            (addr >= machdep->machspec->page_offset)) {
> > > +		return machdep->machspec->phys_offset
> > > +			+ (addr - machdep->machspec->page_offset);
> > > +	} else {
> > > +		if (machdep->machspec->kimage_voffset)
> > > +			return addr - machdep->machspec->kimage_voffset;
> > > +		else /* no randomness */
> > > +			return machdep->machspec->phys_offset
> > > +				+ (addr - machdep->machspec->vmalloc_start_addr);
> > > +	}
> > > +}
> > >  
> > >  /*
> > >   * Do all necessary machine-specific setup here. This is called several
> > >   times
> > > @@ -81,6 +96,7 @@ void
> > >  arm64_init(int when)
> > >  {
> > >  	ulong value;
> > > +	char *string;
> > >  	struct machine_specific *ms;
> > >  
> > >  #if defined(__x86_64__)
> > > @@ -102,9 +118,34 @@ arm64_init(int when)
> > >  		if (machdep->cmdline_args[0])
> > >  			arm64_parse_cmdline_args();
> > >  		machdep->flags |= MACHDEP_BT_TEXT;
> > > +
> > > +		ms = machdep->machspec;
> > > +		if (!ms->kimage_voffset &&
> > > +		    (string = pc->read_vmcoreinfo("NUMBER(kimage_voffset)"))) {
> > > +			ms->kimage_voffset = htol(string, QUIET, NULL);
> > > +			free(string);
> > > +		}
> > 
> > It doesn't make sense to call pc->read_vmcoreinfo() if !DUMPFILE().
> 
> Do you think so?
> By default, pc->read_vmcoreinfo is set to no_vmcoreinfo.
> So it doesn't hurt anything.

Right, just a suggestion...
 
> > > +
> > > +		if (ms->kimage_voffset) {
> > > +			machdep->flags |= NEW_VMEMMAP;
> > > +
> > > +			/* if not specified in command line */
> > > +			if (!kt->relocate && !(kt->flags2 & (RELOC_AUTO|KASLR)))
> > > +				kt->flags2 |= (RELOC_AUTO|KASLR);
> > > +		}
> > > +
> > 
> > If CONFIG_RANDOMIZE_BASE is NOT set, should RELOC_AUTO|KASLR be set?  They
> > don't get set on my live system.
> 
> Not necessarily, but the current implementation of derive_kaslr_offset()
> just works even in this case.

Ahah -- cool!

> (This is the reason that I don't think we need a VMCOREINFO for
> CONFIG_RANDOMIZE_BASE.)

Well, as long as there's a way to avoid that unnecessary/confusing
warning message for non-KASLR new-vmemmap kernels.

I also wonder whether makedumpfile could use it?

> > >  		break;
> > >  
> > >  	case PRE_GDB:
> > > +		/* This check is somewhat redundant */
> > > +		if (kernel_symbol_exists("kimage_voffset")) {
> > > +			machdep->flags |= NEW_VMEMMAP;
> > > +
> > > +			if (!machdep->machspec->kimage_voffset)
> > > +				error(WARNING,
> > > +				"kimage_voffset not identified for kaslr kernel\n");
> > > +		}
> > > +
> > 
> > As mentioned in the general discussion above, the WARNING above makes
> > no sense if CONFIG_RANDOMIZE_BASE is not configured.
> 
> What about if I change the message "for v4.6 or later kernel"?
 
But again, I do not want the message at all if it's not applicable.  

Worse case, you could set a flag, and then if/when the crash session 
ultimately craps out if it can't figure things out, then the message
could be displayed post-mortem. 

> > I'm now thinking that CONFIG_RANDOMIZE_BASE (and maybe others like KASAN?)
> > should be passed in the dumpfile with VMCOREINFO_CONFIG().
> 
> Regarding KASAN, we only need to check for an existence of "kasan_init"
> symbol as you recommended before :)
 
Nice.

> > 
> > >  		if (!machdep->pagesize) {
> > >  			/*
> > >  			 * Kerneldoc Documentation/arm64/booting.txt describes
> > > @@ -160,16 +201,35 @@ arm64_init(int when)
> > >  		machdep->pagemask = ~((ulonglong)machdep->pageoffset);
> > >  
> > >  		arm64_calc_VA_BITS();
> > > -		machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> > > +		ms = machdep->machspec;
> > > +		ms->page_offset = ARM64_PAGE_OFFSET;
> > > +		/* FIXME: idmap for NEW_VMEMMAP */
> > >  		machdep->identity_map_base = ARM64_PAGE_OFFSET;
> > > -		machdep->machspec->userspace_top = ARM64_USERSPACE_TOP;
> > > -		machdep->machspec->modules_vaddr = ARM64_MODULES_VADDR;
> > > -		machdep->machspec->modules_end = ARM64_MODULES_END;
> > > -		machdep->machspec->vmalloc_start_addr = ARM64_VMALLOC_START;
> > > -		machdep->machspec->vmalloc_end = ARM64_VMALLOC_END;
> > > -		machdep->kvbase = ARM64_VMALLOC_START;
> > > -		machdep->machspec->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > > -		machdep->machspec->vmemmap_end = ARM64_VMEMMAP_END;
> > > +		machdep->kvbase = ARM64_VA_START;
> > > +		ms->userspace_top = ARM64_USERSPACE_TOP;
> > > +		if (machdep->flags & NEW_VMEMMAP) {
> > > +			struct syment *sp;
> > > +
> > > +			sp = kernel_symbol_search("_text");
> > > +			ms->kimage_text = (sp ? sp->value : 0);
> > > +			sp = kernel_symbol_search("_end");
> > > +			ms->kimage_end = (sp ? sp->value : 0);
> > > +
> > > +			ms->modules_vaddr = ARM64_VA_START;
> > > +			if (kernel_symbol_exists("kasan_init"))
> > > +				ms->modules_vaddr += ARM64_KASAN_SHADOW_SIZE;
> > > +			ms->modules_end = ms->modules_vaddr
> > > +						+ ARM64_MODULES_VSIZE -1;
> > > +
> > > +			ms->vmalloc_start_addr = ms->modules_end + 1;
> > > +		} else {
> > > +			ms->modules_vaddr = ARM64_PAGE_OFFSET - MEGABYTES(64);
> > > +			ms->modules_end = ARM64_PAGE_OFFSET - 1;
> > > +			ms->vmalloc_start_addr = ARM64_VA_START;
> > > +		}
> > > +		ms->vmalloc_end = ARM64_VMALLOC_END;
> > > +		ms->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> > > +		ms->vmemmap_end = ARM64_VMEMMAP_END;
> > >  
> > >  		switch (machdep->pagesize)
> > >  		{
> > > @@ -232,8 +292,6 @@ arm64_init(int when)
> > >  		machdep->stacksize = ARM64_STACK_SIZE;
> > >  		machdep->flags |= VMEMMAP;
> > >  
> > > -		arm64_calc_phys_offset();
> > > -
> > >  		machdep->uvtop = arm64_uvtop;
> > >  		machdep->kvtop = arm64_kvtop;
> > >  		machdep->is_kvaddr = generic_is_kvaddr;
> > > @@ -262,6 +320,10 @@ arm64_init(int when)
> > >  		machdep->dumpfile_init = NULL;
> > >  		machdep->verify_line_number = NULL;
> > >  		machdep->init_kernel_pgd = arm64_init_kernel_pgd;
> > > +
> > > +		/* use machdep parameters */
> > > +		arm64_calc_phys_offset();
> > > +
> > 
> > This doesn't make any sense to me.  I understand you've done it because
> > you are now doing a readmem() in arm64_calc_phys_offset().  But readmem()
> > calls are never done this early.  And in fact, on the live system the
> > readmem() in arm64_calc_phys_offset() fails like so (running "crash -d4"):
> >  
> >   <readmem: fffffc0008d70bb0, KVADDR, "memstart_addr", 8, (ROE|Q),
> >   3ffcc2842c0>
> >   <read_memory_device: addr: fffffc0008d70bb0 paddr: d70bb0 cnt: 8>
> >   crash: read error: kernel virtual address: fffffc0008d70bb0  type:
> >   "memstart_addr"
> > 
> > It fails above because it needs the phys_base in order to do the readmem().
> > During
> > runtime it has it:
> 
> This is because we don't know kimage_voffset on a live system.
> (and so I mentioned this issue in my commit log.)

Right -- which is a good reason for NOT using readmem() at that point
in time.  I can also envision a scenario where readmem() may inadvertently
"work" (i.e. depending upon where phys_base really is), and then return 
something bogus from an unintended location.
 
> As far as kdump case is concerned, readmem() at this point just works
> for any address in kernel image. This is my intentional use of this function.

I understand, but you're subverting the subsequent dumpfile-specific
calls later in the function.  It is precisely *their* responsibility
to return the value of phys_offset.
 
> >   crash> set debug 4
> >   debug: 4
> >   crash> rd memstart_addr
> >   <addr: fffffc0008d70bb0 count: 1 flag: 490 (KVADDR)>
> >   <readmem: fffffc0008d70bb0, KVADDR, "64-bit KVADDR", 8, (FOE),
> >   3ffcbfbfc28>
> >   <read_memory_device: addr: fffffc0008d70bb0 paddr: 4000d70bb0 cnt: 8>
> >   fffffc0008d70bb0:  0000004000000000                    .... at ...
> >   crash>
> > 
> > So without kimage_voffset, that readmem(), how would that readmem()
> > possibly work?
> > You need the *contents* of memstart_addr in order to *read* memstart_addr.
> > 
> > Now, if it kimage_voffset were passed into the vmcore, then I guess you
> > could
> > read it.  But why do it that way?  That's what diskdump_get_phys_base() and
> > kdump_get_phys_base() are supposed to do.
> 
> Because I believe that this is the *common* place to do it either for
> diskdump, or kdump ( or even a live system if we know kimage_voffset.)

Well, as I have shown, it won't work on my live system because of the
chicken-and-egg scenario.  Let's worry about live system analysis after
we get kdump straightened out.

 
> > 
> > >  		break;
> > >  
> > >  	case POST_GDB:
> > > @@ -409,6 +471,8 @@ arm64_dump_machdep_table(ulong arg)
> > >  		fprintf(fp, "%sIRQ_STACKS", others++ ? "|" : "");
> > >  	if (machdep->flags & MACHDEP_BT_TEXT)
> > >  		fprintf(fp, "%sMACHDEP_BT_TEXT", others++ ? "|" : "");
> > > +	if (machdep->flags & NEW_VMEMMAP)
> > > +		fprintf(fp, "%sNEW_VMEMMAP", others++ ? "|" : "");
> > >  	fprintf(fp, ")\n");
> > >  
> > >  	fprintf(fp, "              kvbase: %lx\n", machdep->kvbase);
> > > @@ -494,6 +558,8 @@ arm64_dump_machdep_table(ulong arg)
> > >  	ms = machdep->machspec;
> > >  
> > >  	fprintf(fp, "            machspec: %lx\n", (ulong)ms);
> > > +	fprintf(fp, "     memory map layout: %s\n",
> > > +				(machdep->flags & NEW_VMEMMAP) ? "new" : "old");
> > 
> > As before, these internal "help" structure dumps just show existing
> > structure fields,
> > and NEW_VMMEMAP is already shown in the flags.
> 
> I put it as useful information for those people who might have got
> any concerns about those odd numbers/kernel layout.
> But if you don't like it, I will remove it.
> 
> > 
> > >  	fprintf(fp, "               VA_BITS: %ld\n", ms->VA_BITS);
> > >  	fprintf(fp, "         userspace_top: %016lx\n", ms->userspace_top);
> > >  	fprintf(fp, "           page_offset: %016lx\n", ms->page_offset);
> > > @@ -503,6 +569,11 @@ arm64_dump_machdep_table(ulong arg)
> > >  	fprintf(fp, "           modules_end: %016lx\n", ms->modules_end);
> > >  	fprintf(fp, "         vmemmap_vaddr: %016lx\n", ms->vmemmap_vaddr);
> > >  	fprintf(fp, "           vmemmap_end: %016lx\n", ms->vmemmap_end);
> > > +	if (machdep->flags & NEW_VMEMMAP) {
> > > +		fprintf(fp, "           kimage_text: %016lx\n", ms->kimage_text);
> > > +		fprintf(fp, "            kimage_end: %016lx\n", ms->kimage_end);
> > > +		fprintf(fp, "        kimage_voffset: %016lx\n", ms->kimage_voffset);
> > > +	}
> > >  	fprintf(fp, "           phys_offset: %lx\n", ms->phys_offset);
> > >  	fprintf(fp, "__exception_text_start: %lx\n",
> > >  	ms->__exception_text_start);
> > >  	fprintf(fp, "  __exception_text_end: %lx\n", ms->__exception_text_end);
> > > @@ -543,6 +614,42 @@ arm64_dump_machdep_table(ulong arg)
> > >  	}
> > >  }
> > >  
> > > +static int
> > > +arm64_parse_machdep_arg_l(char *argstring, char *param, ulong *value)
> > > +{
> > > +	int len;
> > > +	int megabytes = FALSE;
> > > +	char *p;
> > > +
> > > +	len = strlen(param);
> > > +	if (!STRNEQ(argstring, param) || (argstring[len] != '='))
> > > +		return FALSE;
> > > +
> > > +	if ((LASTCHAR(argstring) == 'm') ||
> > > +	    (LASTCHAR(argstring) == 'M')) {
> > > +		LASTCHAR(argstring) = NULLCHAR;
> > > +		megabytes = TRUE;
> > > +	}
> > > +
> > > +	p = argstring + len + 1;
> > > +	if (strlen(p)) {
> > > +		int flags = RETURN_ON_ERROR | QUIET;
> > > +		int err = 0;
> > > +
> > > +		if (megabytes) {
> > > +			*value = dtol(p, flags, &err);
> > > +			if (!err)
> > > +				*value = MEGABYTES(*value);
> > > +		} else {
> > > +			*value = htol(p, flags, &err);
> > > +		}
> > > +
> > > +		if (!err)
> > > +			return TRUE;
> > > +	}
> > > +
> > > +	return FALSE;
> > > +}
> > >  
> > >  /*
> > >   * Parse machine dependent command line arguments.
> > > @@ -554,11 +661,10 @@ arm64_dump_machdep_table(ulong arg)
> > >  static void
> > >  arm64_parse_cmdline_args(void)
> > >  {
> > > -	int index, i, c, err;
> > > +	int index, i, c;
> > >  	char *arglist[MAXARGS];
> > >  	char buf[BUFSIZE];
> > >  	char *p;
> > > -	ulong value = 0;
> > >  
> > >  	for (index = 0; index < MAX_MACHDEP_ARGS; index++) {
> > >  		if (!machdep->cmdline_args[index])
> > > @@ -580,39 +686,23 @@ arm64_parse_cmdline_args(void)
> > >  		c = parse_line(buf, arglist);
> > >  
> > >  		for (i = 0; i < c; i++) {
> > > -			err = 0;
> > > -
> > > -			if (STRNEQ(arglist[i], "phys_offset=")) {
> > > -				int megabytes = FALSE;
> > > -				int flags = RETURN_ON_ERROR | QUIET;
> > > -
> > > -				if ((LASTCHAR(arglist[i]) == 'm') ||
> > > -				    (LASTCHAR(arglist[i]) == 'M')) {
> > > -					LASTCHAR(arglist[i]) = NULLCHAR;
> > > -					megabytes = TRUE;
> > > -				}
> > > -
> > > -				p = arglist[i] + strlen("phys_offset=");
> > > -				if (strlen(p)) {
> > > -					if (megabytes)
> > > -						value = dtol(p, flags, &err);
> > > -					else
> > > -						value = htol(p, flags, &err);
> > > -				}
> > > -
> > > -				if (!err) {
> > > -					if (megabytes)
> > > -						value = MEGABYTES(value);
> > > -
> > > -					machdep->machspec->phys_offset = value;
> > > -
> > > -					error(NOTE,
> > > -					    "setting phys_offset to: 0x%lx\n\n",
> > > -						machdep->machspec->phys_offset);
> > > +			if (arm64_parse_machdep_arg_l(arglist[i],
> > > +					"phys_offset",
> > > +					&machdep->machspec->phys_offset)) {
> > > +				error(NOTE,
> > > +					"setting phys_offset to: 0x%lx\n\n",
> > > +					machdep->machspec->phys_offset);
> > > +
> > > +				machdep->flags |= PHYS_OFFSET;
> > > +				continue;
> > > +			} else if (arm64_parse_machdep_arg_l(arglist[i],
> > > +					"kimage_voffset",
> > > +					&machdep->machspec->kimage_voffset)) {
> > > +				error(NOTE,
> > > +					"setting kimage_voffset to: 0x%lx\n\n",
> > > +					machdep->machspec->kimage_voffset);
> > >  
> > > -					machdep->flags |= PHYS_OFFSET;
> > > -					continue;
> > > -				}
> > > +				continue;
> > >  			}
> > >  
> > >  			error(WARNING, "ignoring --machdep option: %s\n",
> > > @@ -627,10 +717,20 @@ arm64_calc_phys_offset(void)
> > >  {
> > >  	struct machine_specific *ms = machdep->machspec;
> > >  	ulong phys_offset;
> > > +	struct syment *sp;
> > > +	ulong value;
> > >  
> > >  	if (machdep->flags & PHYS_OFFSET) /* --machdep override */
> > >  		return;
> > >  
> > > +	sp = kernel_symbol_search("memstart_addr");
> > > +	if (sp && readmem(sp->value, KVADDR, (char *)&value, sizeof(value),
> > > +				"memstart_addr", QUIET|RETURN_ON_ERROR)) {
> > > +		ms->phys_offset = value;
> > > +		machdep->flags |= PHYS_OFFSET;
> > > +		return;
> > > +	}
> > > +
> > 
> > Also, I may be missing something, but the PHYS_OFFSET flag means that it
> > was
> > passed in via --machdep, and is only used in this function to avoid any
> > further
> > attempts to figure out what it is.  Here you are setting the flag for no
> > particular
> > reason that I can see.  Why?
> 
> Well, I don't remember very well. If not necessary, I will remove it.

OK, I guarantee it's useless in this case...
 
> > But again, doing a readmem() call this early is setting a bad precedent --
> > they have
> > *always* been delayed until after gdb_session_init() in main_loop().  And
> > the work
> > of determining phys_base should be done by the dumpfile call-out functions
> > that
> > already exist.
> >  
> > 
> > >  	/*
> > >  	 * Next determine suitable value for phys_offset. User can override
> > >  	 this
> > >  	 * by passing valid '--machdep phys_offset=<addr>' option.
> > > @@ -2377,6 +2477,11 @@ arm64_IS_VMALLOC_ADDR(ulong vaddr)
> > >  {
> > >  	struct machine_specific *ms = machdep->machspec;
> > >  	
> > > +	if ((machdep->flags & NEW_VMEMMAP) &&
> > > +	    (vaddr >= machdep->machspec->kimage_text) &&
> > > +	    (vaddr <= machdep->machspec->kimage_end))
> > > +		return FALSE;
> > > +
> > >          return ((vaddr >= ms->vmalloc_start_addr && vaddr <=
> > >          ms->vmalloc_end) ||
> > >                  ((machdep->flags & VMEMMAP) &&
> > >                   (vaddr >= ms->vmemmap_vaddr && vaddr <=
> > >                   ms->vmemmap_end))
> > >                   ||
> > > @@ -2407,7 +2512,11 @@ arm64_calc_VA_BITS(void)
> > >  
> > >  	for (bitval = highest_bit_long(value); bitval; bitval--) {
> > >  		if ((value & (1UL << bitval)) == 0) {
> > > +#if 1
> > > +			machdep->machspec->VA_BITS = bitval + 1;
> > > +#else /* FIXME: bug? */
> > >  			machdep->machspec->VA_BITS = bitval + 2;
> > > +#endif
> > >  			break;
> > >  		}
> > >  	}
> > 
> > Again, a non-starter until the above is resolved -- but you know that
> > already...
> 
> Just a reminder for now.
> 
> > 
> > > @@ -2459,10 +2568,22 @@ arm64_calc_virtual_memory_ranges(void)
> > >  		break;
> > >          }
> > >  
> > > -	vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) *
> > > SIZE(page), PUD_SIZE);
> > > +	if (machdep->flags & NEW_VMEMMAP)
> > > +#define STRUCT_PAGE_MAX_SHIFT   6
> > > +		vmemmap_size = 1UL << (ms->VA_BITS - machdep->pageshift - 1
> > > +						+ STRUCT_PAGE_MAX_SHIFT);
> > > +	else
> > > +		vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) *
> > > SIZE(page), PUD_SIZE);
> > > +
> > >  	vmalloc_end = (ms->page_offset - PUD_SIZE - vmemmap_size - SZ_64K);
> > > -	vmemmap_start = vmalloc_end + SZ_64K;
> > > -	vmemmap_end = vmemmap_start + vmemmap_size;
> > > +
> > > +	if (machdep->flags & NEW_VMEMMAP) {
> > > +		vmemmap_start = ms->page_offset - vmemmap_size;
> > > +		vmemmap_end = ms->page_offset;
> > > +	} else {
> > > +		vmemmap_start = vmalloc_end + SZ_64K;
> > > +		vmemmap_end = vmemmap_start + vmemmap_size;
> > > +	}
> > >  
> > >  	ms->vmalloc_end = vmalloc_end - 1;
> > >  	ms->vmemmap_vaddr = vmemmap_start;
> > > diff --git a/defs.h b/defs.h
> > > index a09fa9a..2bbb55b 100644
> > > --- a/defs.h
> > > +++ b/defs.h
> > > @@ -2844,8 +2844,8 @@ typedef u64 pte_t;
> > >  
> > >  #define PTOV(X) \
> > >  	((unsigned
> > >  	long)(X)-(machdep->machspec->phys_offset)+(machdep->machspec->page_offset))
> > > -#define VTOP(X) \
> > > -	((unsigned
> > > long)(X)-(machdep->machspec->page_offset)+(machdep->machspec->phys_offset))
> > > +
> > > +#define VTOP(X)               arm64_VTOP((ulong)(X))
> > >  
> > >  #define USERSPACE_TOP   (machdep->machspec->userspace_top)
> > >  #define PAGE_OFFSET     (machdep->machspec->page_offset)
> > > @@ -2938,18 +2938,23 @@ typedef signed int s32;
> > >  #define VM_L3_4K      (0x10)
> > >  #define KDUMP_ENABLED (0x20)
> > >  #define IRQ_STACKS    (0x40)
> > > +#define NEW_VMEMMAP   (0x80)
> > >  
> > >  /*
> > >   * sources: Documentation/arm64/memory.txt
> > >   *          arch/arm64/include/asm/memory.h
> > >   *          arch/arm64/include/asm/pgtable.h
> > >   */
> > > -
> > > -#define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) <<
> > > (machdep->machspec->VA_BITS - 1))
> > > +#define ARM64_VA_START       ((0xffffffffffffffffUL) \
> > > +					<< machdep->machspec->VA_BITS)
> > > +#define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
> > > +					<< (machdep->machspec->VA_BITS - 1))
> > >  #define ARM64_USERSPACE_TOP  ((1UL) << machdep->machspec->VA_BITS)
> > > -#define ARM64_MODULES_VADDR  (ARM64_PAGE_OFFSET - MEGABYTES(64))
> > > -#define ARM64_MODULES_END    (ARM64_PAGE_OFFSET - 1)
> > > -#define ARM64_VMALLOC_START  ((0xffffffffffffffffUL) <<
> > > machdep->machspec->VA_BITS)
> > > +
> > > +/* only used for v4.6 or later */
> > > +#define ARM64_MODULES_VSIZE     MEGABYTES(128)
> > > +#define ARM64_KASAN_SHADOW_SIZE (1UL << (machdep->machspec->VA_BITS -
> > > 3))
> > > +
> > >  /*
> > >   * The following 3 definitions are the original values, but are obsolete
> > >   * for 3.17 and later kernels because they are now build-time
> > >   calculations.
> > > @@ -3028,6 +3033,10 @@ struct machine_specific {
> > >  	ulong kernel_flags;
> > >  	ulong irq_stack_size;
> > >  	ulong *irq_stacks;
> > > +	/* only needed for kaslr-enabled kernel */
> > > +	ulong kimage_voffset;
> > > +	ulong kimage_text;
> > > +	ulong kimage_end;
> > >  };
> > 
> > The comment is wrong -- the kimage_text and kimage_end values are needed
> > for
> > NEW_VMEMMAP kernels that are NOT configured with CONFIG_RANDOMIZE_BASE.
> > Unless you consider my live system "kaslr-enabled"?
> 
> I will change it to "for v4.6 or later".
> 
> I have limited (or no) time to investigate a live-system case, and so if
> you have any suggestions or your own patch, it is very much appreciated.
 
Yep...  but I will build upon your solid foundation.

If I see anything interesting/different after my live system upgrade, I'll let 
you know.

Thanks,
  Dave




More information about the Crash-utility mailing list