[Crash-utility] [PATCH v1] arm64: fix kernel memory map handling for kaslr-enabled

Dave Anderson anderson at redhat.com
Fri May 20 19:06:39 UTC 2016



Hi Takahiro,

Welcome to the mailing list -- you are a most valuable addition.

To others on the list, Takahiro and I have been communicating offline
for a couple weeks, and I convinced him to join us.  He works on both
kexec-tools and the crash utility for Linaro on the ARM64 architecture.
If you are unaware, in Linux 4.6 there was a major change in the ARM64 
virtual memory layout, complicated by the layering of KASLR on top of it. 
The new VM layout has completely broken crash utility support, and Takahiro
is tackling both fronts.

My comments and questions on the v1 patch follow...

----- Original Message -----
> Hi,
> 
> This patch is still rough-edged, but please review it; any comments
> are very welcome.
> I will try to fix the known issues before I submit a new
> version of kexec/kdump patch for v4.7 merge window.
> 
> Thanks,
> -Takahiro AKASHI
> 
> ===8<===
> From fdc7c881d98ef00ed1ff38a058b4913a1d5bcda6 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi at linaro.org>
> Date: Mon, 16 May 2016 17:31:55 +0900
> Subject: [PATCH v1] arm64: fix kernel memory map handling for kaslr-enabled
>  kernel
> 
> In kernel v4.6, Kernel ASLR (KASLR) is supported on arm64, and the start
> address of the kernel image can be randomized if CONFIG_RANDOMIZE_BASE is enabled.
> Even worse, the kernel image is no longer mapped in the linear mapping, but
> in the vmalloc area (i.e. below PAGE_OFFSET).
> 
> Now, according to the kernel's memory.h, converting a virtual address to
> a physical address should be done like below:
> 
> 	phys_addr_t __x = (phys_addr_t)(x);                             \
> 	__x & BIT(VA_BITS - 1) ? (__x & ~PAGE_OFFSET) + PHYS_OFFSET :   \
> 				 (__x - kimage_voffset); })
> 
> Please note that PHYS_OFFSET is no longer equal to the start address of
> the first usable memory block in System RAM due to the change mentioned
> above.
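
For anyone following along: with VA_BITS=48 that test amounts to checking
bit 47 of the virtual address.  A quick illustration (the addresses below
are made up, not taken from a real dump):

	/* 0xffff800000001000 -- bit 47 set: linear map, so
	 *     phys = (vaddr & ~PAGE_OFFSET) + PHYS_OFFSET
	 *
	 * 0xffff000008080000 -- bit 47 clear: kernel image, so
	 *     phys = vaddr - kimage_voffset
	 */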

So it is no longer possible to determine PHYS_OFFSET from /proc/iomem if
KASLR is enabled on a live system?  That being the case, we need a way for
root to determine its value for live system analysis.

> 
> This patch addresses this change and allows the crash utility to access
> memory contents with correct addresses.
> *However*, it is still rough-edged and we need more refinements.
> 
> 1) Currently we have to specify the following command line:
>    $ crash --kaslr auto
>            --machdep phys_offset=XXX
>            --machdep kimage_voffset=YYY
>            <vmlinux> <vmcore>

I presume that live system analysis requires the same extra arguments at this point?


>    To remove these extra options, I'm planning to enhance my kdump patch
>    so that /proc/vmcore holds the necessary information for analyzing the
>    contents of kernel memory.
>
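That sounds like the right long-term fix.  Presumably the kernel side would
look something like this -- just a sketch using the standard vmcoreinfo
mechanism, not something this patch contains, and the exact set of values
is up for discussion:

	void arch_crash_save_vmcoreinfo(void)
	{
		/* export what crash needs to do VTOP on a kaslr'd kernel */
		vmcoreinfo_append_str("NUMBER(VA_BITS)=%d\n", VA_BITS);
		vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
				      kimage_voffset);
		vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
				      PHYS_OFFSET);
	}

Crash could then pick those values up from the vmcoreinfo note instead of
requiring the --machdep options above.
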
> 2) My current change breaks backward compatibility.
>    Crash won't work with v4.5 or earlier kernels, partly because I changed
>    the calculation of VA_BITS. (I think that the existing code is wrong.)
>    So far I haven't had enough time to look into this issue as I'm focusing
>    on KASLR support.

I don't understand why you think the existing code is wrong.  It is
simply trying to determine what the value of CONFIG_ARM64_VA_BITS is.
For example, we (Red Hat) are currently using a 4.5.0-based kernel, which
has this configuration:

  config-arm64:CONFIG_ARM64_VA_BITS=48

And that's what crash-7.1.5 calculates:

  crash> sys | grep RELEASE
       RELEASE: 4.5.0-0.38.el7.aarch64
  crash> help -m
  ... [ cut ] ...
                 VA_BITS: 48
           userspace_top: 0001000000000000
             page_offset: ffff800000000000
      vmalloc_start_addr: ffff000000000000
             vmalloc_end: ffff7fbfdffeffff
           modules_vaddr: ffff7ffffc000000
             modules_end: ffff7fffffffffff
           vmemmap_vaddr: ffff7fbfe0000000
             vmemmap_end: ffff7fffdfffffff
             phys_offset: 4000000000
  ...

I'm also presuming that your VA_BITS-related change is what breaks
backward compatibility, because if I run "crash -d1" with your patch
applied, I see this debug output on the same kernel as above:

VA_BITS: 47

On the other hand, if I run your patch on a Fedora 4.6.0-0.rc7.git2.1.fc25
live kernel, it comes up OK, and shows this:

               VA_BITS: 42
         userspace_top: 0000040000000000
           page_offset: fffffe0000000000
    vmalloc_start_addr: fffffc0008000000
           vmalloc_end: fffffdff5ffeffff
         modules_vaddr: fffffc0000000000
           modules_end: fffffc0007ffffff
         vmemmap_vaddr: fffffdff80000000
           vmemmap_end: fffffdffffffffff
           phys_offset: 4000000000

KASLR is NOT configured, so perhaps that figures into why it works?

 
> 3) Backtracing a 'panic'ed task fails:
>  crash> bt
>  PID: 999    TASK: ffffffe960081800  CPU: 0   COMMAND: "sh"
>  bt: WARNING: corrupt prstatus? pstate=0x80000000, but no user frame found
>  bt: WARNING: cannot determine starting stack frame for task ffffffe960081800
> 
>   frame->pc indicates that it is a kernel address, so obviously something
>   is wrong. I'm digging into it.
 
OK...

First, before posting, please kick off a "make warn" compilation in order to clean up
things like this:

$ make warn
... [ cut ] ...
cc -c -g -DARM64 -DLZO -DSNAPPY -DGDB_7_6  arm64.c -Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -fstack-protector -Wformat-security
arm64.c:78:1: warning: no previous prototype for 'arm64_VTOP' [-Wmissing-prototypes]
 arm64_VTOP(ulong addr)
 ^
arm64.c: In function 'arm64_parse_cmdline_args':
arm64.c:635:8: warning: unused variable 'value' [-Wunused-variable]
  ulong value = 0;
        ^
arm64.c:631:19: warning: unused variable 'err' [-Wunused-variable]
  int index, i, c, err;
                   ^
arm64.c: In function 'arm64_init':
arm64.c:2539:17: warning: 'vmalloc_end' may be used uninitialized in this function [-Wmaybe-uninitialized]
   vmemmap_start = vmalloc_end + SZ_64K;
                 ^
arm64.c:2510:8: note: 'vmalloc_end' was declared here
  ulong vmalloc_end;
...

As I mentioned in one of our earlier conversations, "vmalloc_end" is most definitely
being used uninitialized.
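
One way to clean that up is to settle vmalloc_end before either branch reads
it -- this is just a sketch, keeping your arithmetic exactly as-is and only
reordering it (STRUCT_PAGE_MAX_SHIFT assumed to be defined earlier):

	if (kt->flags2 & ARM64_NEW_VMEMMAP)
		vmemmap_size = 1UL << (ms->VA_BITS - machdep->pageshift - 1
					+ STRUCT_PAGE_MAX_SHIFT);
	else
		vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) *
					SIZE(page), PUD_SIZE);

	/* assign vmalloc_end before the pre-4.6 branch reads it */
	vmalloc_end = ms->page_offset - PUD_SIZE - vmemmap_size - SZ_64K;

	if (kt->flags2 & ARM64_NEW_VMEMMAP) {
		vmemmap_start = ms->page_offset - vmemmap_size;
		vmemmap_end = ms->page_offset;
	} else {
		vmemmap_start = vmalloc_end + SZ_64K;
		vmemmap_end = vmemmap_start + vmemmap_size;
	}

That keeps the vmalloc_end expression shared by both layouts, which appears
to be what your patch intends, while ensuring it is initialized before use.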

Patch comments are intermingled below:


> Signed-off-by: AKASHI Takahiro <takahiro.akashi at linaro.org>
> ---
>  arm64.c   | 165 +++++++++++++++++++++++++++++++++++++++++++++-----------------
>  defs.h    |  22 ++++++---
>  main.c    |   7 +--
>  symbols.c |  10 ++--
>  4 files changed, 146 insertions(+), 58 deletions(-)
> 
> diff --git a/arm64.c b/arm64.c
> index 34c8c59..22ddade 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -22,6 +22,8 @@
>  #include <endian.h>
>  
>  #define NOT_IMPLEMENTED(X) error((X), "%s: function not implemented\n", __func__)
> +/* FIXME: temporarily used in kt->flags2 */
> +#define ARM64_NEW_VMEMMAP 0x80000000

Since ARM64_NEW_VMEMMAP is only used in arm64.c, please put it
in the machdep->flags field (under IRQ_STACKS).


>  
>  static struct machine_specific arm64_machine_specific = { 0 };
>  static int arm64_verify_symbol(const char *, ulong, char);
> @@ -72,6 +74,21 @@ static int arm64_get_crash_notes(void);
>  static void arm64_calc_VA_BITS(void);
>  static int arm64_is_uvaddr(ulong, struct task_context *);
>  
> +ulong
> +arm64_VTOP(ulong addr)
> +{
> +	if (!(kt->flags2 & ARM64_NEW_VMEMMAP) ||
> +            (addr >= machdep->machspec->page_offset)) {
> +		return machdep->machspec->phys_offset
> +			+ (addr - machdep->machspec->page_offset);
> +	} else {
> +		if (machdep->machspec->kimage_voffset)
> +			return addr - machdep->machspec->kimage_voffset;
> +		else /* no randomness */
> +			return machdep->machspec->phys_offset
> +				+ (addr - machdep->machspec->vmalloc_start_addr);
> +	}
> +}
>  
>  /*
>   * Do all necessary machine-specific setup here. This is called several times
> @@ -105,6 +122,10 @@ arm64_init(int when)
>  		break;
>  
>  	case PRE_GDB:
> +		/* FIXME: Is kimage_voffset appropriate? */
> +		if (kernel_symbol_exists("kimage_voffset"))
> +			kt->flags2 |= ARM64_NEW_VMEMMAP;
> +

As far as I can tell, "kimage_voffset" is perfect.


>  		if (!machdep->pagesize) {
>  			/*
>  			 * Kerneldoc Documentation/arm64/booting.txt describes
> @@ -160,16 +181,33 @@ arm64_init(int when)
>  		machdep->pagemask = ~((ulonglong)machdep->pageoffset);
>  
>  		arm64_calc_VA_BITS();
> -		machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> +		ms = machdep->machspec;
> +		ms->page_offset = ARM64_PAGE_OFFSET;
>  		machdep->identity_map_base = ARM64_PAGE_OFFSET;
> -		machdep->machspec->userspace_top = ARM64_USERSPACE_TOP;
> -		machdep->machspec->modules_vaddr = ARM64_MODULES_VADDR;
> -		machdep->machspec->modules_end = ARM64_MODULES_END;
> -		machdep->machspec->vmalloc_start_addr = ARM64_VMALLOC_START;
> -		machdep->machspec->vmalloc_end = ARM64_VMALLOC_END;
> -		machdep->kvbase = ARM64_VMALLOC_START;
> -		machdep->machspec->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> -		machdep->machspec->vmemmap_end = ARM64_VMEMMAP_END;
> +		ms->userspace_top = ARM64_USERSPACE_TOP;
> +		if (kt->flags2 & ARM64_NEW_VMEMMAP) {
> +			struct syment *sp;
> +
> +			sp = kernel_symbol_search("_text");
> +			ms->kimage_text = (sp ? sp->value : 0);
> +			sp = kernel_symbol_search("_end");
> +			ms->kimage_end = (sp ? sp->value : 0);
> +
> +			ms->modules_vaddr = ARM64_VA_START;
> +			if (kernel_symbol_exists("kasan_init"))
> +				ms->modules_vaddr += ARM64_KASAN_SHADOW_SIZE;
> +			ms->modules_end = ms->modules_vaddr
> +						+ ARM64_MODULES_VSIZE -1;
> +
> +			ms->vmalloc_start_addr = ms->modules_end + 1;
> +		} else {
> +			ms->modules_vaddr = ARM64_PAGE_OFFSET - MEGABYTES(64);
> +			ms->modules_end = ARM64_PAGE_OFFSET - 1;
> +			ms->vmalloc_start_addr = ARM64_VA_START;
> +		}
> +		ms->vmalloc_end = ARM64_VMALLOC_END;
> +		ms->vmemmap_vaddr = ARM64_VMEMMAP_VADDR;
> +		ms->vmemmap_end = ARM64_VMEMMAP_END;
>  
>  		switch (machdep->pagesize)
>  		{
> @@ -543,6 +581,42 @@ arm64_dump_machdep_table(ulong arg)
>  	}
>  }

Although you haven't changed arm64_dump_machdep_table(), this reminds
me: can you please display the 3 new machine_specific fields in
arm64_dump_machdep_table()?  And also the ARM64_NEW_VMEMMAP bit, after
you move it into the flags field?  ("help -m" is your friend...)
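
For the three machine_specific fields, something along these lines in
arm64_dump_machdep_table() should do -- a sketch only, so adjust the field
widths to line up with the existing output:

	fprintf(fp, "         kimage_voffset: %lx\n",
		machdep->machspec->kimage_voffset);
	fprintf(fp, "            kimage_text: %lx\n",
		machdep->machspec->kimage_text);
	fprintf(fp, "             kimage_end: %lx\n",
		machdep->machspec->kimage_end);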


> +static int
> +arm64_parse_machdep_arg_l(char *argstring, char *param, ulong *value)
> +{
> +	int len;
> +	int megabytes = FALSE;
> +	char *p;
> +
> +	len = strlen(param);
> +	if (!STRNEQ(argstring, param) || (argstring[len] != '='))
> +		return FALSE;
> +
> +	if ((LASTCHAR(argstring) == 'm') ||
> +	    (LASTCHAR(argstring) == 'M')) {
> +		LASTCHAR(argstring) = NULLCHAR;
> +		megabytes = TRUE;
> +	}
> +
> +	p = argstring + len + 1;
> +	if (strlen(p)) {
> +		int flags = RETURN_ON_ERROR | QUIET;
> +		int err = 0;
> +
> +		if (megabytes) {
> +			*value = dtol(p, flags, &err);
> +			if (!err)
> +				*value = MEGABYTES(*value);
> +		} else {
> +			*value = htol(p, flags, &err);
> +		}
> +
> +		if (!err)
> +			return TRUE;
> +	}
> +
> +	return FALSE;
> +}
>  
>  /*
>   * Parse machine dependent command line arguments.
> @@ -580,39 +654,23 @@ arm64_parse_cmdline_args(void)
>  		c = parse_line(buf, arglist);
>  
>  		for (i = 0; i < c; i++) {
> -			err = 0;
> -
> -			if (STRNEQ(arglist[i], "phys_offset=")) {
> -				int megabytes = FALSE;
> -				int flags = RETURN_ON_ERROR | QUIET;
> -
> -				if ((LASTCHAR(arglist[i]) == 'm') ||
> -				    (LASTCHAR(arglist[i]) == 'M')) {
> -					LASTCHAR(arglist[i]) = NULLCHAR;
> -					megabytes = TRUE;
> -				}
> -
> -				p = arglist[i] + strlen("phys_offset=");
> -				if (strlen(p)) {
> -					if (megabytes)
> -						value = dtol(p, flags, &err);
> -					else
> -						value = htol(p, flags, &err);
> -				}
> -
> -				if (!err) {
> -					if (megabytes)
> -						value = MEGABYTES(value);
> -
> -					machdep->machspec->phys_offset = value;
> -
> -					error(NOTE,
> -					    "setting phys_offset to: 0x%lx\n\n",
> -						machdep->machspec->phys_offset);
> +			if (arm64_parse_machdep_arg_l(arglist[i],
> +					"phys_offset",
> +					&machdep->machspec->phys_offset)) {
> +				error(NOTE,
> +					"setting phys_offset to: 0x%lx\n\n",
> +					machdep->machspec->phys_offset);
> +
> +				machdep->flags |= PHYS_OFFSET;
> +				continue;
> +			} else if (arm64_parse_machdep_arg_l(arglist[i],
> +					"kimage_voffset",
> +					&machdep->machspec->kimage_voffset)) {
> +				error(NOTE,
> +					"setting kimage_voffset to: 0x%lx\n\n",
> +					machdep->machspec->kimage_voffset);
>  
> -					machdep->flags |= PHYS_OFFSET;
> -					continue;
> -				}
> +				continue;
>  			}
>  
>  			error(WARNING, "ignoring --machdep option: %s\n",
> @@ -2377,6 +2435,11 @@ arm64_IS_VMALLOC_ADDR(ulong vaddr)
>  {
>  	struct machine_specific *ms = machdep->machspec;
>  	
> +	if ((kt->flags2 & ARM64_NEW_VMEMMAP) &&
> +	    (vaddr >= machdep->machspec->kimage_text) &&
> +	    (vaddr <= machdep->machspec->kimage_end))
> +		return FALSE;
> +
>          return ((vaddr >= ms->vmalloc_start_addr && vaddr <= ms->vmalloc_end) ||
>                  ((machdep->flags & VMEMMAP) &&
>                   (vaddr >= ms->vmemmap_vaddr && vaddr <= ms->vmemmap_end)) ||
> @@ -2407,7 +2470,11 @@ arm64_calc_VA_BITS(void)
>  
>  	for (bitval = highest_bit_long(value); bitval; bitval--) {
>  		if ((value & (1UL << bitval)) == 0) {
> +#if 1
> +			machdep->machspec->VA_BITS = bitval + 1;
> +#else /* FIXME: bug? */
>  			machdep->machspec->VA_BITS = bitval + 2;
> +#endif

Yeah, the new scheme needs further investigation...
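
For what it's worth, here is my guess at why the two versions disagree --
this is only an illustration, and the addresses are made up:

	/* pre-4.6 layout, VA_BITS=48: a linear-map symbol such as
	 *     0xffff800000080000  -- bits 63..47 set, highest clear bit is 46,
	 * so VA_BITS = 46 + 2.
	 *
	 * 4.6 layout, VA_BITS=48: a kernel-image symbol such as
	 *     0xffff000008080000  -- bits 63..48 set, highest clear bit is 47,
	 * so VA_BITS = 47 + 1.
	 */

If that's right, the "+1" form only holds for the new kimage-based layout,
which would also explain the "VA_BITS: 47" I see on our 4.5.0 kernel above.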


>  			break;
>  		}
>  	}
> @@ -2459,10 +2526,20 @@ arm64_calc_virtual_memory_ranges(void)
>  		break;
>          }
>  
> -	vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) * SIZE(page), PUD_SIZE);
> +	if (kt->flags2 & ARM64_NEW_VMEMMAP) {
> +#define STRUCT_PAGE_MAX_SHIFT   6
> +		vmemmap_size = 1UL << (ms->VA_BITS - machdep->pageshift - 1
> +						+ STRUCT_PAGE_MAX_SHIFT);
> +
> +		vmemmap_start = ms->page_offset - vmemmap_size;
> +		vmemmap_end = ms->page_offset;
> +	} else {
> +		vmemmap_size = ALIGN((1UL << (ms->VA_BITS - machdep->pageshift)) * SIZE(page), PUD_SIZE);
> +
> +		vmemmap_start = vmalloc_end + SZ_64K;
> +		vmemmap_end = vmemmap_start + vmemmap_size;
> +	}
>  	vmalloc_end = (ms->page_offset - PUD_SIZE - vmemmap_size - SZ_64K);
> -	vmemmap_start = vmalloc_end + SZ_64K;
> -	vmemmap_end = vmemmap_start + vmemmap_size;
>  
>  	ms->vmalloc_end = vmalloc_end - 1;
>  	ms->vmemmap_vaddr = vmemmap_start;
> diff --git a/defs.h b/defs.h
> index a09fa9a..40e02fc 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -2844,8 +2844,8 @@ typedef u64 pte_t;
>  
>  #define PTOV(X) \
>  	((unsigned long)(X)-(machdep->machspec->phys_offset)+(machdep->machspec->page_offset))
> -#define VTOP(X) \
> -	((unsigned long)(X)-(machdep->machspec->page_offset)+(machdep->machspec->phys_offset))
> +
> +#define VTOP(X)               arm64_VTOP((ulong)(X))
>  
>  #define USERSPACE_TOP   (machdep->machspec->userspace_top)
>  #define PAGE_OFFSET     (machdep->machspec->page_offset)
> @@ -2944,12 +2944,16 @@ typedef signed int s32;
>   *          arch/arm64/include/asm/memory.h
>   *          arch/arm64/include/asm/pgtable.h
>   */
> -
> -#define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) << (machdep->machspec->VA_BITS - 1))
> +#define ARM64_VA_START       ((0xffffffffffffffffUL) \
> +					<< machdep->machspec->VA_BITS)
> +#define ARM64_PAGE_OFFSET    ((0xffffffffffffffffUL) \
> +					<< (machdep->machspec->VA_BITS - 1))
>  #define ARM64_USERSPACE_TOP  ((1UL) << machdep->machspec->VA_BITS)
> -#define ARM64_MODULES_VADDR  (ARM64_PAGE_OFFSET - MEGABYTES(64))
> -#define ARM64_MODULES_END    (ARM64_PAGE_OFFSET - 1)
> -#define ARM64_VMALLOC_START  ((0xffffffffffffffffUL) << machdep->machspec->VA_BITS)
> +
> +/* only used for v4.6 or later */
> +#define ARM64_MODULES_VSIZE     MEGABYTES(128)
> +#define ARM64_KASAN_SHADOW_SIZE (1UL << (machdep->machspec->VA_BITS - 3))
> +
>  /*
>   * The following 3 definitions are the original values, but are obsolete
>   * for 3.17 and later kernels because they are now build-time calculations.
> @@ -3028,6 +3032,10 @@ struct machine_specific {
>  	ulong kernel_flags;
>  	ulong irq_stack_size;
>  	ulong *irq_stacks;
> +	/* only needed for kaslr-enabled kernel */
> +	ulong kimage_voffset;
> +	ulong kimage_text;
> +	ulong kimage_end;
>  };

Don't forget -- please display these in arm64_dump_machdep_table()!   ;-)

>  
>  struct arm64_stackframe {
> diff --git a/main.c b/main.c
> index 821bb4e..3f24908 100644
> --- a/main.c
> +++ b/main.c
> @@ -227,9 +227,10 @@ main(int argc, char **argv)
>  						optarg);
>  				}
>  			} else if (STREQ(long_options[option_index].name, "kaslr")) {
> -				if (!machine_type("X86_64"))
> -					error(INFO, "--kaslr only valid "
> -						"with X86_64 machine type.\n");
> +				if (!machine_type("X86_64") &&
> +				    !machine_type("ARM64"))
> +					error(INFO, "--kaslr not valid "
> +						"with this machine type.\n");
>  				else if (STREQ(optarg, "auto"))
>  					kt->flags2 |= (RELOC_AUTO|KASLR);
>  				else {
> diff --git a/symbols.c b/symbols.c
> index a8d3563..df27793 100644
> --- a/symbols.c
> +++ b/symbols.c
> @@ -593,7 +593,8 @@ kaslr_init(void)
>  {
>  	char *string;
>  
> -	if (!machine_type("X86_64") || (kt->flags & RELOC_SET))
> +	if ((!machine_type("X86_64") && !machine_type("ARM64")) ||
> +	    (kt->flags & RELOC_SET))
>  		return;
>  
>  	/*
> @@ -712,7 +713,7 @@ store_symbols(bfd *abfd, int dynamic, void *minisyms, long symcount,
>  	if (machine_type("X86")) {
>  		if (!(kt->flags & RELOC_SET))
>  			kt->flags |= RELOC_FORCE;
> -	} else if (machine_type("X86_64")) {
> +	} else if (machine_type("X86_64") || machine_type("ARM64")) {
>  		if ((kt->flags2 & RELOC_AUTO) && !(kt->flags & RELOC_SET))
>  			derive_kaslr_offset(abfd, dynamic, from,
>  				fromend, size, store);
> @@ -783,7 +784,8 @@ store_sysmap_symbols(void)
>                  error(FATAL, "symbol table namespace malloc: %s\n",
>                          strerror(errno));
>  
> -	if (!machine_type("X86") && !machine_type("X86_64"))
> +	if (!machine_type("X86") && !machine_type("X86_64") &&
> +	    !machine_type("ARM64"))
>  		kt->flags &= ~RELOC_SET;
>  
>  	first = 0;
> @@ -4681,7 +4683,7 @@ value_search(ulong value, ulong *offset)
>  	if ((sp = machdep->value_to_symbol(value, offset)))
>  		return sp;
>  
> -	if (IS_VMALLOC_ADDR(value))
> +	if (IS_VMALLOC_ADDR(value))
>  		goto check_modules;
>  
>  	if ((sp = symval_hash_search(value)) == NULL)
> --
> 2.8.1
 

That's all I've got for this pass.

Thanks again,
  Dave



