[Crash-utility] help debug number of CPU detect failure

Dave Anderson anderson at redhat.com
Thu Mar 5 20:53:33 UTC 2020


> > I suspect that it's a problem with either the --kaslr offset and/or
> > the phys_base value that you have used.
> 
> Is there a method to find or print the KASLR offset and phys_base on a running Linux system?

They are normally passed in the VMCOREINFO data that is contained in an ELF PT_NOTE
in the dumpfile header.  For example, here's a dump of the normal VMCOREINFO data,
where the phys_base and KASLR offsets are down near the bottom:

                      OSRELEASE=4.18.0-185.el8.x86_64
                      PAGESIZE=4096
                      SYMBOL(init_uts_ns)=ffffffffbd812540
                      SYMBOL(node_online_map)=ffffffffbda0f520
                      SYMBOL(swapper_pg_dir)=ffffffffbd80a000
                      SYMBOL(_stext)=ffffffffbc600000
                      SYMBOL(vmap_area_list)=ffffffffbd8d78b0
                      SYMBOL(mem_section)=ffff956a3ffd2000
                      LENGTH(mem_section)=2048
                      SIZE(mem_section)=16
                      OFFSET(mem_section.section_mem_map)=0
                      SIZE(page)=64
                      SIZE(pglist_data)=171968
                      SIZE(zone)=1472
                      SIZE(free_area)=88
                      SIZE(list_head)=16
                      SIZE(nodemask_t)=128
                      OFFSET(page.flags)=0
                      OFFSET(page._refcount)=52
                      OFFSET(page.mapping)=24
                      OFFSET(page.lru)=8
                      OFFSET(page._mapcount)=48
                      OFFSET(page.private)=40
                      OFFSET(page.compound_dtor)=16
                      OFFSET(page.compound_order)=17
                      OFFSET(page.compound_head)=8
                      OFFSET(pglist_data.node_zones)=0
                      OFFSET(pglist_data.nr_zones)=171232
                      OFFSET(pglist_data.node_start_pfn)=171240
                      OFFSET(pglist_data.node_spanned_pages)=171256
                      OFFSET(pglist_data.node_id)=171264
                      OFFSET(zone.free_area)=192
                      OFFSET(zone.vm_stat)=1296
                      OFFSET(zone.spanned_pages)=112
                      OFFSET(free_area.free_list)=0
                      OFFSET(list_head.next)=0
                      OFFSET(list_head.prev)=8
                      OFFSET(vmap_area.va_start)=0
                      OFFSET(vmap_area.list)=48
                      LENGTH(zone.free_area)=11
                      SYMBOL(log_buf)=ffffffffbd85b140
                      SYMBOL(log_buf_len)=ffffffffbd85b13c
                      SYMBOL(log_first_idx)=ffffffffbe319778
                      SYMBOL(clear_idx)=ffffffffbe319744
                      SYMBOL(log_next_idx)=ffffffffbe319768
                      SIZE(printk_log)=16
                      OFFSET(printk_log.ts_nsec)=0
                      OFFSET(printk_log.len)=8
                      OFFSET(printk_log.text_len)=10
                      OFFSET(printk_log.dict_len)=12
                      LENGTH(free_area.free_list)=5
                      NUMBER(NR_FREE_PAGES)=0
                      NUMBER(PG_lru)=5
                      NUMBER(PG_private)=12
                      NUMBER(PG_swapcache)=9
                      NUMBER(PG_swapbacked)=18
                      NUMBER(PG_slab)=8
                      NUMBER(PG_hwpoison)=22
                      NUMBER(PG_head_mask)=32768
                      NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
                      NUMBER(HUGETLB_PAGE_DTOR)=2
                      NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
   ===============>   NUMBER(phys_base)=16437477376
                      SYMBOL(init_top_pgt)=ffffffffbd80a000
                      NUMBER(pgtable_l5_enabled)=0
                      SYMBOL(node_data)=ffffffffbda0ad20
                      LENGTH(node_data)=1024
   ===============>   KERNELOFFSET=3b600000
                      NUMBER(KERNEL_IMAGE_SIZE)=1073741824
                      NUMBER(sme_mask)=0
                      CRASHTIME=1583350919
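
To illustrate how the two marked values can be pulled out of such a dump, here is a minimal Python sketch. It assumes the VMCOREINFO text has already been extracted from the PT_NOTE into a string; the function name parse_vmcoreinfo and the shortened SAMPLE text are made up for this example. Note that phys_base is printed in decimal while KERNELOFFSET is hexadecimal without a 0x prefix.

```python
def parse_vmcoreinfo(text):
    """Parse VMCOREINFO 'KEY=VALUE' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Drop indentation and any "=====>" arrow annotation pasted from the mail.
        line = line.strip().lstrip("=> \t")
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip()
    return info


# Shortened copy of the dump above (hypothetical sample input).
SAMPLE = """\
    OSRELEASE=4.18.0-185.el8.x86_64
    SYMBOL(_stext)=ffffffffbc600000
 ===============>   NUMBER(phys_base)=16437477376
 ===============>   KERNELOFFSET=3b600000
"""

info = parse_vmcoreinfo(SAMPLE)
phys_base = int(info["NUMBER(phys_base)"])     # decimal in VMCOREINFO
kaslr_offset = int(info["KERNELOFFSET"], 16)   # hex, no "0x" prefix

print(f"phys_base    = {phys_base:#x}")                                # 0x3d3c00000
print(f"kaslr offset = {kaslr_offset:#x} ({kaslr_offset >> 20}MB)")    # 0x3b600000 (950MB)
```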

But in your Azure-generated dumpfile, I note that each cpu's NT_PRSTATUS note
contains junk data, and while it does have a VMCOREINFO note, it contains this:

Elf64_Nhdr:
               n_namesz: 11 ("VMCOREINFO")
               n_descsz: 42
                 n_type: 0 (unused)
                         FAKE1=IGNORE1
                         FAKE2=IGNORE2
                         FAKE3=IGNORE3

So that's why you need to pass in the two arguments.
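
As a rough illustration of that check, the following Python sketch reports which of the two values are absent from a VMCOREINFO text. The helper name missing_kaslr_keys and the newline-separated reconstruction of the Azure note are assumptions made for this example, not something taken from crash itself.

```python
# The two VMCOREINFO keys discussed in this thread.
REQUIRED_KEYS = ("NUMBER(phys_base)", "KERNELOFFSET")


def missing_kaslr_keys(vmcoreinfo_text):
    """Return the required keys that do not appear in the VMCOREINFO text."""
    present = {
        line.partition("=")[0].strip()
        for line in vmcoreinfo_text.splitlines()
        if "=" in line
    }
    return [key for key in REQUIRED_KEYS if key not in present]


# Assumed byte layout of the placeholder note: three newline-terminated lines.
AZURE_NOTE = "FAKE1=IGNORE1\nFAKE2=IGNORE2\nFAKE3=IGNORE3\n"

print(len(AZURE_NOTE))                  # 42, the same as n_descsz above
print(missing_kaslr_keys(AZURE_NOTE))   # both keys are missing
```

If the returned list is non-empty, the values have to come from somewhere else, such as the command line.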

Now, the crash utility should come up successfully on a live system
without those arguments.  Once it has, you can get the values like
this:

  crash> help -m | grep phys_base
                  phys_base: 3d3c00000
  crash> help -k | grep relocate
        relocate: ffffffffc4a00000  (KASLR offset: 3b600000 / 950MB)
  crash> 
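
These numbers can be cross-checked against the VMCOREINFO values shown earlier. The Python sketch below parses the two output lines with regular expressions written for this example only (they are not a documented output format) and verifies the arithmetic. The observation that the relocate value equals the 64-bit two's-complement negation of the KASLR offset is drawn from the numbers above, not from the crash sources.

```python
import re

MASK64 = (1 << 64) - 1

# Output lines copied from the session above.
HELP_M = "                  phys_base: 3d3c00000"
HELP_K = "        relocate: ffffffffc4a00000  (KASLR offset: 3b600000 / 950MB)"

phys_base = int(re.search(r"phys_base:\s*([0-9a-f]+)", HELP_M).group(1), 16)
match = re.search(r"relocate:\s*([0-9a-f]+)\s*\(KASLR offset:\s*([0-9a-f]+)", HELP_K)
relocate = int(match.group(1), 16)
kaslr_offset = int(match.group(2), 16)

# Hex phys_base here equals the decimal NUMBER(phys_base) in VMCOREINFO.
print(phys_base == 16437477376)                  # True
# relocate is the negated offset, wrapped to 64 bits.
print(relocate == (-kaslr_offset) & MASK64)      # True
# 0x3b600000 bytes is 950MB.
print(kaslr_offset >> 20)                        # 950

# Subtracting the offset from the runtime _stext in VMCOREINFO gives the
# address the symbol would have in the unrelocated vmlinux.
stext_runtime = 0xffffffffbc600000
print(f"{stext_runtime - kaslr_offset:#x}")      # 0xffffffff81000000
```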

But since they change with each reboot, you would have to capture them
while running on the live system, and save them somewhere for a subsequent
crash.  So that goes back to my question -- how did you get the numbers
that you used?
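
One way to keep the values around for a later crash is sketched below in Python. Assumptions: the JSON file layout and the helper names save_values and crash_command are invented for this example; the vmlinux and vmcore paths are placeholders; and the 0x-prefixed form is assumed to be accepted by the --kaslr and -m phys_base= options, which is worth confirming against crash(8) for the version in use.

```python
import json
import os
import shlex
import tempfile


def save_values(path, phys_base, kaslr_offset):
    """Store the per-boot values as hex strings in a small JSON file."""
    with open(path, "w") as f:
        json.dump({"phys_base": hex(phys_base),
                   "kaslr_offset": hex(kaslr_offset)}, f)


def crash_command(path, vmlinux, vmcore):
    """Build a crash command line from previously saved values."""
    with open(path) as f:
        values = json.load(f)
    argv = ["crash",
            "--kaslr", values["kaslr_offset"],
            "-m", f"phys_base={values['phys_base']}",
            vmlinux, vmcore]
    return shlex.join(argv)


# Demo using the values from the live session above and a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    saved = os.path.join(tmp, "kaslr-values.json")
    save_values(saved, 0x3d3c00000, 0x3b600000)
    command = crash_command(saved, "vmlinux", "vmcore")

print(command)
# crash --kaslr 0x3b600000 -m phys_base=0x3d3c00000 vmlinux vmcore
```

The file is only useful for a dump taken during the same boot, since both values change on the next one.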

Dave