[Crash-utility] [PATCH 0/5] Second phase of future support for x86_64 5-level page tables
Dou Liyang
douly.fnst at cn.fujitsu.com
Fri Jan 12 01:49:42 UTC 2018
Hi Dave,
[...]
> Thank you very much for the work you have done so far. I have not spent
> any time looking at the patches in detail, but instead I first ran a quick
> test of the patch on a set of ~250 kernels that I keep around for testing,
> where I just ran the "mod" command to at least verify that kernel virtual
> addresses could be translated.
>
> Now, as always, backwards compatibility must be maintained. I do not have
> any sadump dumpfiles, but obviously you (Fujitsu) can test those. However
Yes, I am waiting for a machine that supports sadump. I will test the
sadump dumpfiles.
> I do have some older Xen and RHEL4-era kernels in my sample set.
>
Thank you so much for that. I will maintain backwards compatibility.
> As it turns out, *all* RHEL4 kernels failed (i.e. any kernel version
> 2.6.9 or earlier); they report "WARNING: cannot access vmalloc'd
> module memory" during initialization when trying to gather the kernel
> module list.
>
> For all of the 2.6.9 and earlier kernels, they show the "WARNING: cannot
> access vmalloc'd module memory" message during session initialization:
>
> $ crash vmlinux-2.6.9-42.0.2.ELsmp.gz vmcore
>
> crash 7.2.1rc26
> Copyright (C) 2002-2017 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> please wait... (gathering module symbol data)
> WARNING: cannot access vmalloc'd module memory
>
> KERNEL: vmlinux-2.6.9-42.0.2.ELsmp.gz
> DUMPFILE: vmcore
> CPUS: 8
> DATE: Tue Nov 21 19:14:17 2006
> UPTIME: 6 days, 01:23:25
> LOAD AVERAGE: 24.34, 7.89, 4.46
> TASKS: 865
> NODENAME: lonrs00268
> RELEASE: 2.6.9-42.0.2.ELsmp
> VERSION: #1 SMP Thu Aug 17 17:57:31 EDT 2006
> MACHINE: x86_64 (2199 Mhz)
> MEMORY: 16 GB
> PANIC: "Kernel BUG at panic:75"
> PID: 20046
> COMMAND: "oracle"
> TASK: 101c6b047f0 [THREAD_INFO: 101a428a000]
> CPU: 7
> STATE: TASK_RUNNING (NMI)
>
> crash>
>
> If I run the session with "crash -d4 vmlinux-2.6.9-42.0.2.ELsmp.gz vmcore",
> you can see that it reads a "pud page", but then fails:
>
> ...
> please wait... (gathering module symbol data)module: ffffffffa0634180
> <readmem: ffffffffa0634180, KVADDR, "module struct", 1408, (ROE|Q), f73780>
> <readmem: 4f8000, PHYSADDR, "pud page", 4096, (FOE), 2080b40>
> <read_diskdump: addr: 4f8000 paddr: 4f8000 cnt: 4096>
>
> crash: invalid kernel virtual address: ffffffffa0634180 type: "module struct"
>
> WARNING: cannot access vmalloc'd module memory
> ...
>
> Without the patch, the module virtual address translation succeeds:
>
> ...
> please wait... (gathering module symbol data)module: ffffffffa0634180
> <readmem: ffffffffa0634180, KVADDR, "module struct", 1408, (ROE|Q), f705e0>
> <readmem: 103000, PHYSADDR, "pgd page", 4096, (FOE), 25d7b50>
> <read_diskdump: addr: 103000 paddr: 103000 cnt: 4096>
> <readmem: 105000, PHYSADDR, "pmd page", 4096, (FOE), 25d8b60>
> <read_diskdump: addr: 105000 paddr: 105000 cnt: 4096>
> <readmem: d9bfb0000, PHYSADDR, "page table", 4096, (FOE), 25d9b70>
> <read_diskdump: addr: d9bfb0000 paddr: d9bfb0000 cnt: 4096>
> <read_diskdump: addr: ffffffffa0634180 paddr: d9bfb3180 cnt: 1408>
> ...
>
> So it appears to be reading from the wrong starting page table location,
> i.e., from "pud page 4f8000" instead of "pgd page 103000".
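
To make the failing translation easier to follow, here is a minimal sketch
of how the module address from the trace splits into page-table indexes.
It assumes 4-level paging with 4 KB pages and 9 index bits per level; the
helper names pt_index() and page_offset() are hypothetical and are not
crash functions. Level 3 is the top-level table (called "pml4" in the
unpatched code quoted in this mail) and level 0 is the page table itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12   /* 4 KB pages (assumption) */
#define BITS_PER_LEVEL  9    /* 512 entries per table (assumption) */

/*
 * Return the table index used at the given level for vaddr.
 * level 0 = page table, 1 = pmd, 2 = next level up, 3 = top level.
 */
static inline unsigned pt_index(uint64_t vaddr, int level)
{
    assert(level >= 0 && level <= 3);
    return (unsigned)((vaddr >> (PAGE_SHIFT + BITS_PER_LEVEL * level))
                      & ((1u << BITS_PER_LEVEL) - 1));
}

/* Return the byte offset of vaddr inside its 4 KB page. */
static inline uint64_t page_offset(uint64_t vaddr)
{
    return vaddr & ((UINT64_C(1) << PAGE_SHIFT) - 1);
}
```

For ffffffffa0634180 this gives indexes 511, 510, 259 and 52 from the top
level down, and a page offset of 0x180, which matches the low bits of the
final paddr d9bfb3180 in the working trace.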
>
> Also, several Xen kernels failed with segmentation violations during
> session initialization. They all fail here in x86_64_xendump_load_page(),
> when "*pgd" gets referenced:
>
> static char *
> x86_64_xendump_load_page(ulong kvaddr, struct xendump_data *xd)
> {
> ulong mfn;
> ulong *pgd, *pud, *pmd, *ptep;
>
> pgd = ((ulong *)machdep->pgd) + pgd_index(kvaddr);
> mfn = ((*pgd) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
> ^^^^
>
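
For reference, the mask-and-shift on the faulting line can be sketched in
isolation as below. This is only an illustration: the 40-bit physical mask
width is an assumption chosen for the example, and PHYS_PAGE_MASK and
entry_to_frame() are hypothetical names standing in for crash's
PHYSICAL_PAGE_MASK and PAGESHIFT().

```c
#include <stdint.h>

#define PAGE_SHIFT      12   /* 4 KB pages (assumption) */
#define PHYS_MASK_BITS  40   /* physical address width (assumption) */

/* Keep only the frame-address bits: drop the low flag bits and the high bits. */
#define PHYS_PAGE_MASK \
    (((UINT64_C(1) << PHYS_MASK_BITS) - 1) & ~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* Extract the frame number from a page-table entry. */
static inline uint64_t entry_to_frame(uint64_t entry)
{
    return (entry & PHYS_PAGE_MASK) >> PAGE_SHIFT;
}
```

The arithmetic itself is harmless; the segmentation fault can only come
from dereferencing the entry pointer before this computation is applied.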
> Here is the relevant part of the gdb trace of a 2.6.18-based xen
> kernel:
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000502748 in x86_64_xendump_load_page (kvaddr=kvaddr@entry=18446744071568498888, xd=0xf521a0 <xendump_data>,
> xd=0xf521a0 <xendump_data>) at x86_64.c:7003
> 7003 mfn = ((*pgd) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 lzo-2.06-8.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 snappy-1.1.0-3.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x0000000000502748 in x86_64_xendump_load_page (kvaddr=kvaddr@entry=18446744071568498888, xd=0xf521a0 <xendump_data>,
> xd=0xf521a0 <xendump_data>) at x86_64.c:7003
> #1 0x0000000000503191 in x86_64_xendump_p2m_create (xd=0xf521a0 <xendump_data>) at x86_64.c:6749
> #2 0x0000000000565d4e in xc_core_create_pfn_tables () at xendump.c:1258
> #3 xc_core_read (addr=<optimized out>, paddr=7080864, cnt=32, bufptr=0xf70f80 <shared_bufs>) at xendump.c:168
> #4 read_xendump (fd=<optimized out>, bufptr=0xf70f80 <shared_bufs>, cnt=32, addr=<optimized out>, paddr=7080864) at xendump.c:836
> #5 0x000000000047b038 in readmem (addr=18446744071569148832, memtype=memtype@entry=1, buffer=buffer@entry=0xf70f80 <shared_bufs>,
> size=size@entry=32, type=type@entry=0x94dcc3 "possible", error_handle=error_handle@entry=2) at memory.c:2233
> #6 0x00000000004ea33e in cpu_maps_init () at kernel.c:903
> #7 kernel_init () at kernel.c:118
> #8 0x0000000000467e5a in main_loop () at main.c:768
> #9 0x000000000069dad3 in captured_command_loop (data=data@entry=0x0) at main.c:258
> #10 0x000000000069c37a in catch_errors (func=func@entry=0x69dac0 <captured_command_loop>, func_args=func_args@entry=0x0,
> errstring=errstring@entry=0x8e713f "", mask=mask@entry=6) at exceptions.c:557
> #11 0x000000000069ea66 in captured_main (data=data@entry=0x7ffd637c92a0) at main.c:1064
> #12 0x000000000069c37a in catch_errors (func=func@entry=0x69dda0 <captured_main>, func_args=func_args@entry=0x7ffd637c92a0,
> errstring=errstring@entry=0x8e713f "", mask=mask@entry=6) at exceptions.c:557
> #13 0x000000000069edc7 in gdb_main (args=0x7ffd637c92a0) at main.c:1079
> #14 gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7ffd637c9408) at main.c:1099
> #15 0x00000000004f0604 in gdb_main_loop (argc=<optimized out>, argc@entry=3, argv=argv@entry=0x7ffd637c9408) at gdb_interface.c:76
> #16 0x00000000004662c5 in main (argc=3, argv=0x7ffd637c9408) at main.c:707
> (gdb) p pgd
> $1 = (ulong *) 0xfffffffc054f4210
> (gdb)
>
> I haven't investigated further, but in all of the xen cases, the
> value of "pgd" above was a kernel virtual address as shown in the
> example above.
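
Since crash runs as a user-space process, any pointer it dereferences
directly has to be a user-space address. A minimal sketch of a sanity check
that tells the two values in this mail apart, assuming 48-bit virtual
addresses on the host; is_user_half() is a hypothetical helper, not
something that exists in crash:

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, user-space addresses lie below 2^47 (assumption). */
#define USER_SPACE_LIMIT (UINT64_C(1) << 47)

/* Return true if addr falls in the lower (user-space) half of the address space. */
static inline bool is_user_half(uint64_t addr)
{
    return addr < USER_SPACE_LIMIT;
}
```

With this check, the patched "pgd" value 0xfffffffc054f4210 is rejected,
while the unpatched "pml4" value 25d6c08 passes.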
>
> However, without the patch, the function looks like this, and with
> my debug printf of "pml4", the address is a user-space address as
> expected:
>
> static char *
> x86_64_xendump_load_page(ulong kvaddr, struct xendump_data *xd)
> {
> ulong mfn;
> ulong *pml4, *pgd, *pmd, *ptep;
>
> pml4 = ((ulong *)machdep->machspec->pml4) + pml4_index(kvaddr);
> mfn = ((*pml4) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
>
> fprintf(fp, "x86_64_xendump_load_page: pml4: %lx\n", pml4);
>
> ...
>
> So for example, with the debug statement, I see this:
>
> # crash vmlinux-2.6.18-1.2714.el5xen.gz xguest-crashdump
>
> crash 7.2.1rc26
> Copyright (C) 2002-2017 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> x86_64_xendump_load_page: pml4: 25d6c08
> x86_64_xendump_load_page: pml4: 25d6c08
> KERNEL: vmlinux-2.6.18-1.2714.el5xen.gz
> DUMPFILE: xguest-crashdump
> ...
>
>
> In a private email, I will send you a pointer to where I have temporarily
> stored the 2 vmlinux/vmcore pairs shown above. I'm thinking that it will
> probably be fairly easy for you to figure out what's happening in both cases.
>
Yes, I saw it! Thank you very much :-)
Thanks,
dou.
> Again, I very much appreciate the work you have undertaken here.
>
> Thanks,
> Dave
>
>
>
>
>