[Crash-utility] help debug number of CPU detect failure

Fri Mar 6 14:13:03 UTC 2020

----- Original Message -----
> On Thu, Mar 5, 2020 at 1:07 PM Santosh <ysan99 at gmail.com> wrote:
> >
> > On Thu, Mar 5, 2020 at 12:54 PM Dave Anderson <anderson at redhat.com> wrote:
> > >
> > > > > I suspect that it's a problem with either the --kaslr offset and/or
> > > > > the phys_base value that you have used.
> > > >
> > > > Is there a way to find or print the KASLR offset and phys_base on a
> > > > running Linux system?
> > >
> > > They are normally passed in the VMCOREINFO data that is contained in an
> > > ELF PT_NOTE in the dumpfile header.  For example, here's a dump of the
> > > normal VMCOREINFO data, where the phys_base and KASLR offsets are down
> > > near the bottom:
> > >
> > >                       OSRELEASE=4.18.0-185.el8.x86_64
> > >                       PAGESIZE=4096
> > >                       SYMBOL(init_uts_ns)=ffffffffbd812540
> > >                       SYMBOL(node_online_map)=ffffffffbda0f520
> > >                       SYMBOL(swapper_pg_dir)=ffffffffbd80a000
> > >                       SYMBOL(_stext)=ffffffffbc600000
> > >                       SYMBOL(vmap_area_list)=ffffffffbd8d78b0
> > >                       SYMBOL(mem_section)=ffff956a3ffd2000
> > >                       LENGTH(mem_section)=2048
> > >                       SIZE(mem_section)=16
> > >                       OFFSET(mem_section.section_mem_map)=0
> > >                       SIZE(page)=64
> > >                       SIZE(pglist_data)=171968
> > >                       SIZE(zone)=1472
> > >                       SIZE(free_area)=88
> > >                       SIZE(list_head)=16
> > >                       SIZE(nodemask_t)=128
> > >                       OFFSET(page.flags)=0
> > >                       OFFSET(page._refcount)=52
> > >                       OFFSET(page.mapping)=24
> > >                       OFFSET(page.lru)=8
> > >                       OFFSET(page._mapcount)=48
> > >                       OFFSET(page.private)=40
> > >                       OFFSET(page.compound_dtor)=16
> > >                       OFFSET(page.compound_order)=17
> > >                       OFFSET(page.compound_head)=8
> > >                       OFFSET(pglist_data.node_zones)=0
> > >                       OFFSET(pglist_data.nr_zones)=171232
> > >                       OFFSET(pglist_data.node_start_pfn)=171240
> > >                       OFFSET(pglist_data.node_spanned_pages)=171256
> > >                       OFFSET(pglist_data.node_id)=171264
> > >                       OFFSET(zone.free_area)=192
> > >                       OFFSET(zone.vm_stat)=1296
> > >                       OFFSET(zone.spanned_pages)=112
> > >                       OFFSET(free_area.free_list)=0
> > >                       OFFSET(list_head.next)=0
> > >                       OFFSET(list_head.prev)=8
> > >                       OFFSET(vmap_area.va_start)=0
> > >                       OFFSET(vmap_area.list)=48
> > >                       LENGTH(zone.free_area)=11
> > >                       SYMBOL(log_buf)=ffffffffbd85b140
> > >                       SYMBOL(log_buf_len)=ffffffffbd85b13c
> > >                       SYMBOL(log_first_idx)=ffffffffbe319778
> > >                       SYMBOL(clear_idx)=ffffffffbe319744
> > >                       SYMBOL(log_next_idx)=ffffffffbe319768
> > >                       SIZE(printk_log)=16
> > >                       OFFSET(printk_log.ts_nsec)=0
> > >                       OFFSET(printk_log.len)=8
> > >                       OFFSET(printk_log.text_len)=10
> > >                       OFFSET(printk_log.dict_len)=12
> > >                       LENGTH(free_area.free_list)=5
> > >                       NUMBER(NR_FREE_PAGES)=0
> > >                       NUMBER(PG_lru)=5
> > >                       NUMBER(PG_private)=12
> > >                       NUMBER(PG_swapcache)=9
> > >                       NUMBER(PG_swapbacked)=18
> > >                       NUMBER(PG_slab)=8
> > >                       NUMBER(PG_hwpoison)=22
> > >                       NUMBER(PG_head_mask)=32768
> > >                       NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
> > >                       NUMBER(HUGETLB_PAGE_DTOR)=2
> > >                       NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
> > >    ===============>   NUMBER(phys_base)=16437477376
> > >                       SYMBOL(init_top_pgt)=ffffffffbd80a000
> > >                       NUMBER(pgtable_l5_enabled)=0
> > >                       SYMBOL(node_data)=ffffffffbda0ad20
> > >                       LENGTH(node_data)=1024
> > >    ===============>   KERNELOFFSET=3b600000
> > >                       NUMBER(KERNEL_IMAGE_SIZE)=1073741824
> > >                       NUMBER(sme_mask)=0
> > >                       CRASHTIME=1583350919
> > >
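For reference, a minimal Python sketch of pulling the two marked values out
of VMCOREINFO text like the dump above. The function name parse_vmcoreinfo
and the shortened SAMPLE string are illustrative assumptions, not part of
crash. Note that NUMBER(phys_base) is decimal while KERNELOFFSET is hex
without a 0x prefix.

```python
def parse_vmcoreinfo(text):
    """Parse VMCOREINFO 'KEY=value' lines into a dict of strings.

    Hypothetical helper for illustration; crash does its own parsing.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        # Split on the first '=' only, so keys such as
        # 'SYMBOL(_stext)' stay intact.
        key, _, value = line.partition("=")
        info[key] = value
    return info


# A few lines copied from the VMCOREINFO dump quoted above.
SAMPLE = """\
OSRELEASE=4.18.0-185.el8.x86_64
PAGESIZE=4096
SYMBOL(_stext)=ffffffffbc600000
NUMBER(phys_base)=16437477376
KERNELOFFSET=3b600000
"""

info = parse_vmcoreinfo(SAMPLE)
phys_base = int(info["NUMBER(phys_base)"])    # stored as decimal
kaslr_offset = int(info["KERNELOFFSET"], 16)  # stored as bare hex

print(f"phys_base    = {phys_base:#x}")
print(f"KASLR offset = {kaslr_offset:#x} ({kaslr_offset >> 20}MB)")
```

The hex phys_base and the 950MB offset printed here are the same values
that "help -m" and "help -k" report on the live system.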
> > > But in your Azure-generated dumpfile, I note that each cpu's NT_PRSTATUS
> > > note contains junk data, and while it does have a VMCOREINFO note, it
> > > contains this:
> > >
> > > Elf64_Nhdr:
> > >                n_namesz: 11 ("VMCOREINFO")
> > >                n_descsz: 42
> > >                  n_type: 0 (unused)
> > >                          FAKE1=IGNORE1
> > >                          FAKE2=IGNORE2
> > >                          FAKE3=IGNORE3
> > >
> > > So that's why you need to pass in the two arguments.
> > >
> > > Now, the crash utility should be able to be brought up successfully
> > > on a live system without passing the arguments.  And once you've done
> > > that, you could get the values like this:
> > >
> > >   crash> help -m | grep phys_base
> > >                   phys_base: 3d3c00000
> > >   crash> help -k | grep relocate
> > >         relocate: ffffffffc4a00000  (KASLR offset: 3b600000 / 950MB)
> > >   crash>
> > >
> > > But since they change with each reboot, you would have to capture them
> > > while running on the live system, and save them somewhere for a
> > > subsequent crash.  So that goes back to my question -- how did you get
> > > the numbers that you used?
> >
> > I got the numbers by simply grepping through the coredump strings.
> > $ strings vm1_numa_4gb_5cpu.coredump | grep -v strings |
> >     grep 'KERNELOFFSET=\|NUMBER(phys_base)='
> >
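The same search can be sketched in Python, which avoids pushing the whole
dump through "strings". This is only an illustration: find_values and
scan_dump are hypothetical names, and the sample bytes are taken from the
vmcoreinfo_data output quoted further down. A raw memory dump may hold
more than one copy of these strings (stale ones included), so every match
is returned rather than just the first.

```python
import mmap
import re

# Match either key followed by '=' and a run of hex/decimal digits.
PATTERN = re.compile(rb"(KERNELOFFSET|NUMBER\(phys_base\))=([0-9a-fA-F]+)")


def find_values(data):
    """Return all (key, value) pairs found in a bytes-like object."""
    return [(m.group(1).decode(), m.group(2).decode())
            for m in PATTERN.finditer(data)]


def scan_dump(path):
    """Scan a dump file on disk without reading it all into memory."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return find_values(mm)


# In-memory demo using a fragment of the string shown by crash.
sample = (b"NUMBER(HUGETLB_PAGE_DTOR)=2\nNUMBER(phys_base)=4355784704\n"
          b"SYMBOL(init_top_pgt)=ffffffff82a0a000\nKERNELOFFSET=600000\n")
matches = find_values(sample)
print(matches)
```

For a real file, scan_dump("vm1_numa_4gb_5cpu.coredump") would return the
same kind of list.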
> > The machine is still running, and I cross-checked those numbers with
> > crash; they were correct.
> >
> > crash> p vmcoreinfo_data+1600
> > $1 = (unsigned char *) 0xffff917d3cde1640
> > "poison)=22\nNUMBER(PG_head_mask)=32768\nNUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128\nNUMBER(HUGETLB_PAGE_DTOR)=2\nNUMBER(phys_base)=4355784704\nSYMBOL(init_top_pgt)=ffffffff82a0a000\nSYMBOL(node_data)=ffffffff82c5d780\nLENGTH(node_data)=1024\nKERNELOFFSET=600000\nNUMBER"...
> >
> > Now it appears to me that something is wrong in the Azure-generated dump file.
> 
> Something to do with numa:
> 
> santosh at u1804lts:~$ cat /proc/sys/kernel/numa_balancing
> 1
> 
> HyperV VM with 1 numa node (numa_balancing = 0) -- Linux with nokaslr
> -- vm2core -- ELF coredump -- crash tool -- Ok
> HyperV VM with 1 numa node (numa_balancing = 0) -- Linux with kaslr --
> vm2core -- ELF coredump -- crash tool -- Ok
> HyperV VM with 2 numa nodes (numa_balancing = 1) -- Linux with nokaslr
> -- vm2core -- ELF coredump -- crash tool -- Ok
> HyperV VM with 2 numa nodes (numa_balancing = 1) -- Linux with kaslr
> -- vm2core -- ELF coredump -- crash tool -- Not ok
> 
> Do we have to specify the numa topology to the crash tool somehow, or
> should it already be handled in the coredump file?

Definitely not.  The crash utility is only interested in:

  1. kernel virtual address values -- which KASLR modifies from the values 
     compiled into the vmlinux file, 
  2. translating those kernel virtual addresses into physical addresses, and
  3. accessing those physical addresses from the memory source. 
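
A rough Python sketch of those three steps for an x86_64 kernel text/data
address, using the values from the VMCOREINFO example earlier in this
thread. Everything here is illustrative: the function names are made up,
the vmlinux _stext value of ffffffff81000000 is an assumed typical default
rather than something read from a vmlinux file, and the PT_LOAD segment is
invented.  The formula in step 2 applies only to addresses in the kernel
text mapping (at or above __START_KERNEL_map), not to direct-mapped or
vmalloc addresses.

```python
START_KERNEL_MAP = 0xffffffff80000000  # x86_64 __START_KERNEL_map
MASK64 = 0xffffffffffffffff


def runtime_vaddr(vmlinux_vaddr, kaslr_offset):
    """Step 1: apply the KASLR offset to a compiled-in vmlinux address."""
    return (vmlinux_vaddr + kaslr_offset) & MASK64


def kernel_text_paddr(vaddr, phys_base):
    """Step 2: translate a kernel text-mapping address to physical."""
    return vaddr - START_KERNEL_MAP + phys_base


def paddr_to_file_offset(paddr, segments):
    """Step 3: locate a physical address in the memory source.

    'segments' is a list of (paddr_start, size, file_offset) tuples,
    standing in for the PT_LOAD headers of an ELF dumpfile.
    """
    for start, size, file_offset in segments:
        if start <= paddr < start + size:
            return file_offset + (paddr - start)
    return None  # address not present in the dump


kaslr_offset = 0x3b600000          # KERNELOFFSET from the example dump
phys_base = 16437477376            # NUMBER(phys_base) from the example dump
vmlinux_stext = 0xffffffff81000000  # assumed compiled-in _stext

vaddr = runtime_vaddr(vmlinux_stext, kaslr_offset)
paddr = kernel_text_paddr(vaddr, phys_base)
# One invented segment: 512MB of memory at 0x400000000, stored at 0x1000.
offset = paddr_to_file_offset(paddr, [(0x400000000, 0x20000000, 0x1000)])

print(f"vaddr={vaddr:#x} paddr={paddr:#x} file offset={offset:#x}")
```

With a wrong KASLR offset or phys_base, step 1 or step 2 lands on the
wrong physical address, so every later read returns unrelated data.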

As I understand it, numa_balancing is concerned with user-space virtual
address mapping, where the kernel may re-map an underlying physical
address from one NUMA node to another.  User-space memory is never
accessed by the crash utility unless a run-time command explicitly
requests it.

Dave