[Crash-utility] [PATH v4 1/2] arm64: fix kernel memory map handling for kaslr-enabled kernel

Wed Jun 1 15:41:04 UTC 2016

----- Original Message -----

> > >  
> > >  	case PRE_GDB:
> > > +		/* This check is somewhat redundant */
> > > +		if (kernel_symbol_exists("kimage_voffset"))
> > > +			machdep->flags |= NEW_VMEMMAP;
> > > +
> > 
> > Certainly not redundant on a live system without CONFIG_RANDOMIZE_BASE.
> 
> Yeah, but for crashdump case, it's redundant.
> 

I understand, but if you're going to add a comment, it should clarify
rather than mystify...   ;-)

> > >  		arm64_calc_VA_BITS();
> > > -		machdep->machspec->page_offset = ARM64_PAGE_OFFSET;
> > > +		ms = machdep->machspec;
> > > +		ms->page_offset = ARM64_PAGE_OFFSET;
> > > +		/* FIXME: idmap for NEW_VMEMMAP */
> > 
> > What's the FIXME here?
> 
> Identity-mapping no longer starts from PAGE_OFFSET if KASLR.
> Since machdep->identity_map_base is not used anywhere in crash,
> I wonder how I should handle it.
> 
> > >  		machdep->identity_map_base = ARM64_PAGE_OFFSET;

You're right, machdep->identity_map_base is (currently) only used in 
architecture-specific files, although arm64.c does not use it.  
But what you have done above is correct.

> > > @@ -631,6 +718,19 @@ arm64_calc_phys_offset(void)
> > >  	if (machdep->flags & PHYS_OFFSET) /* --machdep override */
> > >  		return;
> > >  
> > > +	if (machdep->flags & NEW_VMEMMAP) {
> > > +		struct syment *sp;
> > > +		ulong value;
> > > +
> > > +		sp = kernel_symbol_search("memstart_addr");
> > > +		if (sp && readmem(sp->value, KVADDR, (char *)&value,
> > > +				sizeof(value), "memstart_addr",
> > > +				QUIET|RETURN_ON_ERROR)) {
> > > +			ms->phys_offset = value;
> > > +			return;
> > > +		}
> > > +	}
> > > +
> > 
> > As we've discussed before, I cannot accept the chicken-and-egg snippet
> > above.
> > 
> > If machdep->machspec->kimage_voffset has not been determined, then the arm64_VTOP()
> > call made by readmem() will utilize machdep->machspec->phys_offset -- which is what
> > you're trying to initialize here.
> 
> Right, I was a bit confused because my patch basically had an assumption
> that we have some sort of way to know kimage_voffset.
> 
> Meanwhile, if this hunk be moved after "if ACTIVE())" like,
>     ms->phys_offset = 0;
>     if (ACTIVE()) {
>         read("/proc/iomem");
>         ...
>     } else {
>         if (machdep->flags & NEW_VMEMMAP) {
>             readmem("memstart_addr")
>             ...
>         } else if (DISKDUMP_DUMPFILE() ...) {
>         } else if (KDUMP_DUMPFILE() ...) {
>         }
>     }
> the crash command would still work for a live system if !KASLR.

Again, the readmem() is not necessary.

> 
> > On my live system that has a phys_offset of 0x40000000000, the readmem() 
> > will fail quietly, but that's not a acceptable usage of the RETURN_ON_ERROR
> > failure mode, because "memstart_addr" is a legitimate virtual  address that
> > should never fail.  Also, because the actual kernel phys_offset can be
> > negative, it would seem to be entirely within the realm of possibility that
> > the readmem() could mistakenly return successfully, but have read the wrong
> > location.
> > 
> > So what it boils down to is that readmem() should NEVER be called until ALL of the
> > pieces required by arm64_VTOP() have all been properly initialized, however
> > that might be accomplished.  Not to mention that calling it this early sets  a
> > dangerous precedent.
> 
> If you think so, we can add a specialized function like:
>     arm64_readmem_kernel_symbol(char *, ...)

I don't think so -- the point is to avoid making a readmem() call this early.

> > And in the case of kdump's ELF vmcore and compressed vmcore formats, there
> > is an existing API between kdump and the crash utility to pass back the
> > phys_base.  In the kexec-tool's makedumpfile.c file, there is the get_phys_base_arm64()
> > function that
> 
> I can't find makedumpfile.c in kexec-tools.

Hold on, clarification below...

> 
> > currently calculates the offset by using the PT_LOAD segments, and presumably will
> > have to be modified to use new VMCOREINFO data.  But regardless of how it's one,
> > the architecture-neutral write_kdump_header() laster copies that offset value into
> > the kdump_sub_header.phys_base field for the crash utility to access.  Trying to do
> > a readmem() this early in time is essentially breaking that API.
> 
> You may already notice that "phys_offset" may have two different meanings:
>   * the start address of System RAM
>     (PHYS_OFFSET on v4.5 or early, and __PHYS_OFFSET on v4.6 or later)
>   * the value used in phys_to_virt()/virt_to_phys() for any address
>     in linear-mapping
>     (PHYS_OFFSET on v4.5 or early, and PHYS_OFFSET on v4.6 or later)
> 
> I think that current crash utility needs only the latter.
> Please correct me if I misunderstand something.

Right, the __PHYS_OFFSET value seems to be the kernel entry point address, 
but that is not used.  Crash needs the PHYS_OFFSET value to mimic the kernel's
arm64 __virt_to_phys() function.  

Sorry about my reference to "kexec-tool's makedumpfile.c file", as I was
talking about the Red Hat kexec-tools package, which also contains
the makedumpfile facility.

Makedumpfile is upstream in a couple places, where the most recently released 
version (1.5.9) is here:

  https://sourceforge.net/projects/makedumpfile/files/makedumpfile/

Or the development repository is here:

  git://git.code.sf.net/p/makedumpfile/code

Regardless of which location, there are the generic get_phys_base() macros 
here in "makedumpfile.h" for those architectures that need it:

  makedumpfile.h 814 #define get_phys_base() get_phys_base_arm64()
  makedumpfile.h 826 #define get_phys_base() get_phys_base_arm()
  makedumpfile.h 837 #define get_phys_base() stub_true()
  makedumpfile.h 850 #define get_phys_base() get_phys_base_x86_64()
  makedumpfile.h 861 #define get_phys_base() stub_true()
  makedumpfile.h 871 #define get_phys_base() stub_true()
  makedumpfile.h 882 #define get_phys_base() stub_true()
  makedumpfile.h 894 #define get_phys_base() get_phys_base_ia64()

And where the arm64-specific function is in "arch/arm64.c":

  int
  get_phys_base_arm64(void)
  {
          unsigned long phys_base = ULONG_MAX;
          unsigned long long phys_start;
          int i;
          /*
           * We resolve phys_base from PT_LOAD segments. LMA contains physical
           * address of the segment, and we use the lowest start as
           * phys_base.
           */
          for (i = 0; get_pt_load(i, &phys_start, NULL, NULL, NULL); i++) {
                  if (phys_start < phys_base)
                          phys_base = phys_start;
          }

          if (phys_base == ULONG_MAX) {
                  ERRMSG("Can't determine phys_base\n");
                  return FALSE;
          }

          info->phys_base = phys_base;

          DEBUG_MSG("phys_base    : %lx\n", phys_base);

          return TRUE;
  }

Note that it currently looks for the lowest physical address that is 
referenced in any of the PT_LOAD segments of /proc/vmcore.  As you have
discussed for Linux 4.6, perhaps it will be necessary that the value 
of PHYS_OFFSET ("memstart_addr") will have to be passed in a VMCOREINFO
item.  It's not clear to me why PT_LOAD p_paddr values will no longer
be usable, but that's up to the makedumpfile/kdump maintainers.  And
certainly a convenient VMCOREINFO item would be ideal/preferable.

Anyway, later on, the info->phys_base value gets stored in the compressed
kdump header in the write_kdump_header() function in "makedumpfile.c":

  int
  write_kdump_header(void)
  {
          int ret = FALSE;
          size_t size;
          off_t offset_note, offset_vmcoreinfo;
          unsigned long size_note, size_vmcoreinfo;
          struct disk_dump_header *dh = info->dump_header;
          struct kdump_sub_header kh;
          char *buf = NULL;

  ... [ cut ] ... 

          /*
           * Write sub header
           */
          size = sizeof(struct kdump_sub_header);
          memset(&kh, 0, size);
          /* 64bit max_mapnr_64 */
          kh.max_mapnr_64 = info->max_mapnr;
          kh.phys_base  = info->phys_base;
          kh.dump_level = info->dump_level;
  ...

And then in the crash utility source code, it simply gets accessed
like so:

  int
  diskdump_phys_base(unsigned long *phys_base)
  {
          if (KDUMP_CMPRS_VALID()) {
                  *phys_base = dd->sub_header_kdump->phys_base;
                  return TRUE;
          }

          return FALSE;
  }

Now, for ELF vmcores, the crash utility calls arm64_kdump_phys_base() 
which calls arm_kdump_phys_base(), which simply mimics what the 
makedumpfile.c's get_phys_base_arm64() does.  And that function may
need to be changed for Linux 4.6 in order to access the new VMCOREINFO
variable.

Clear as mud?

Dave