[Crash-utility] gcore: Segmentation fault due to renaming of old_rsp symbol in kernel

Dave Anderson anderson at redhat.com
Mon Oct 31 20:47:09 UTC 2016


Hi Eric,

It's always appreciated when bug reports come with proposed fixes, and
your patch certainly looks reasonable to me.  But the gcore extension
module is maintained by Daisuke Hatayama, and any changes will require 
his ACK and a subsequent package update.  Daisuke is a member of this 
mailing list, but just to make sure he sees this, I've cc'd him directly
as well.

Thanks,
  Dave



----- Original Message -----
> I am trying to use gcore to generate a user application core from a kernel
> dump file. I compiled the latest crash-7.1.6 and crash-gcore-command-1.3.1
> from https://people.redhat.com/anderson/. I installed a debug kernel
> (vmlinux-4.1.34-33-debug.gz from openSUSE Leap 42.1) and did a controlled
> (sysrq-trigger) crash. When I attempt to use gcore on the process in
> question, after reading
> <https://people.redhat.com/anderson/extensions/gcore_help_gcore.html>, I get
> a segmentation fault:
> 
> eje-code:~ # crash /boot/vmlinux-4.1.34-33-debug.gz
> /var/crash/2016-10-31-17\:01//vmcore
> 
> crash 7.1.6
> Copyright (C) 2002-2016 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005, 2011 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
> 
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> 
> KERNEL: /boot/vmlinux-4.1.34-33-debug.gz
> DUMPFILE: /var/crash/2016-10-31-17:01//vmcore
> CPUS: 4
> DATE: Mon Oct 31 13:01:36 2016
> UPTIME: 02:12:08
> LOAD AVERAGE: 0.00, 0.00, 0.00
> TASKS: 204
> NODENAME: eje-code
> RELEASE: 4.1.34-33-debug
> VERSION: #1 SMP Thu Oct 20 08:03:29 UTC 2016 (fe18aba)
> MACHINE: x86_64 (2094 Mhz)
> MEMORY: 4 GB
> PANIC: "sysrq: SysRq : Trigger a crash"
> PID: 3260
> COMMAND: "crashtest"
> TASK: ffff88011a020550 [THREAD_INFO: ffff8800bcd98000]
> CPU: 3
> STATE: TASK_RUNNING (SYSRQ)
> 
> crash> extend /usr/lib64/crash/extensions/gcore.so
> /usr/lib64/crash/extensions/gcore.so: shared object loaded
> crash> gcore -f 0 -v 7 3260
> gcore: Opening file core.3260.crashtest ...
> gcore: done.
> gcore: Writing ELF header ...
> gcore: done.
> gcore: Retrieving and writing note information ...
> Segmentation fault
> 
> Sixty-four bytes of core get written before the segmentation fault (I'm
> guessing that's the ELF header). I can gcore some other processes (although
> I get many "gcore: WARNING: page fault at 7ffca6a5d000" errors). I tried
> this both with an echo from bash from the command line and a custom test
> program that just does a controlled crash in a function nested four deep.
> The segmentation fault sometimes causes a hang (which I can end with
> Ctrl-C).
> 
> It does the same thing if I specify the task address (in this case, "gcore
> ffff88011a020550"). I've tried it without any options, too, and with
> different combinations.
> 
> I obtained a core dump of gcore and this is my debugging session:
> 
> eje-code:~ # gdb /usr/lib64/crash/extensions/gcore.so
> /var/core/core.eje-code-crash-3074
> GNU gdb (GDB; openSUSE Leap 42.1) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://bugs.opensuse.org/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /usr/lib64/crash/extensions/gcore.so...done.
> 
> warning: core file may not match specified executable file. [Not sure why
> ...]
> [New LWP 3074]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `crash /boot/vmlinux-4.1.34-33-debug.gz
> /var/crash/2016-10-31-17:01//vmcore'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000000000000000 in ?? ()
> Missing separate debuginfos, use: zypper install
> glibc-debuginfo-2.19-17.4.x86_64 liblzma5-debuginfo-5.0.5-3.5.x86_64
> libncurses5-debuginfo-5.9-53.4.x86_64 libz1-debuginfo-1.2.8-6.4.x86_64
> (gdb) bt
> #0 0x0000000000000000 in ?? ()
> #1 0x00007f1235eed4e4 in restore_regs_syscall_context (target=0x6939df8,
> regs=0xf6f280, active_regs=0x7ffefa968880)
> at libgcore/gcore_x86.c:1656
> #2 0x00007f1235eedcb6 in genregs_get (target=0x6939df8, regset=0x7f12360f6460
> <x86_64_regsets>, size=216,
> buf=0xf6f280) at libgcore/gcore_x86.c:1795
> #3 0x00007f1235ee6438 in fill_write_thread_core_info (fp=0x59efb10,
> tc=0x6939df8, dump_tc=0x6939df8, info=0xf6ee80,
> view=0x7f12360f5d80 <x86_64_regset_view>, offset=0x7ffefa968ab0,
> total=0xf6ee98) at libgcore/gcore_coredump.c:469
> #4 0x00007f1235ee682c in fill_write_note_info (fp=0x59efb10, info=0xf6ee80,
> phnum=20, offset=0x7ffefa968ab0)
> at libgcore/gcore_coredump.c:566
> #5 0x00007f1235ee4dd1 in gcore_coredump () at libgcore/gcore_coredump.c:112
> #6 0x00007f1235eeeb8b in do_gcore (arg=0x0) at gcore.c:317
> #7 0x00007f1235eee926 in cmd_gcore () at gcore.c:253
> #8 0x0000000000472b8c in ?? ()
> #9 0x0000000000000000 in ?? ()
> (gdb) up
> #1 0x00007f1235eed4e4 in restore_regs_syscall_context (target=0x6939df8,
> regs=0xf6f280, active_regs=0x7ffefa968880)
> at libgcore/gcore_x86.c:1656
> 1656 regs->sp = gxt->get_old_rsp(target->processor);
> (gdb) print gxt
> $1 = (struct gcore_x86_table *) 0x215ea0 <gcore_x86_table>
> (gdb) print *target
> $2 = {task = 18446612137045525840, thread_info = 18446612135482589184, pid =
> 3260, comm = "crashtest\000 at XI\215u H",
> processor = 3, ptask = 18446612137046565648, mm_struct =
> 18446612137048351232, tc_next = 0x0}
> (gdb) print *regs
> $3 = {r15 = 0, r14 = 2, r13 = 2, r12 = 34324496, bp = 2, bx = 4196186, r11 =
> 582, r10 = 140728806957456,
> r9 = 140048302249728, r8 = 34324720, ax = 18446744073709551578, cx =
> 140048297135408, dx = 2, si = 140048302292992,
> di = 3, orig_ax = 1, ip = 140048297135408, cs = 51, flags = 582, sp =
> 140728806957864, ss = 43, fs_base = 0,
> gs_base = 0, ds = 0, es = 0, fs = 0, gs = 0}
> (gdb) print *gxt
> $4 = {get_old_rsp = 0x0, get_thread_struct_fpu = 0x0,
> get_thread_struct_fpu_size = 0x0, is_special_syscall = 0x0,
> is_special_ia32_syscall = 0x0, tsk_used_math = 0x0}
> =============================
> 
> So not only is get_old_rsp zero, all the fields in gxt are zero.
> 
> Looks like a kernel support issue. This field is filled in by
> gcore_x86_table_register_get_old_rsp() which looks up four symbols in
> various forms, none of which exist in my kernel:
> 
> eje-code:~ # fgrep old_rsp /proc/kallsyms
> eje-code:~ # fgrep cpu_pda /proc/kallsyms
> eje-code:~ #
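The failure mode described above (no candidate symbol found, so the function pointer stays NULL and the later call jumps to address 0) can be illustrated with a small standalone sketch. This is not the gcore source: `mock_symbol_exists`, `select_reader`, `get_saved_rsp`, and the two reader functions are hypothetical names invented for the illustration, and the candidate list is deliberately shortened to `old_rsp` and `rsp_scratch`. It only shows the general idea of trying symbol names in order and refusing to call through an unset pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long ulong;
typedef ulong (*get_rsp_fn)(int cpu);

/* Hypothetical stand-in for the kernel symbol table: a NULL-terminated
 * list of the symbol names that "exist" in the kernel being examined. */
static const char *const *present_symbols;

/* Hypothetical replacement for a symbol lookup; returns 1 if found. */
static int mock_symbol_exists(const char *name)
{
    assert(name != NULL);
    if (!present_symbols)
        return 0;
    for (size_t i = 0; present_symbols[i]; i++)
        if (strcmp(present_symbols[i], name) == 0)
            return 1;
    return 0;
}

/* Dummy readers; the returned values are made up so the two are
 * distinguishable in a test. */
static ulong read_old_rsp(int cpu)     { return 0x1000UL + (ulong)cpu; }
static ulong read_rsp_scratch(int cpu) { return 0x2000UL + (ulong)cpu; }

struct candidate {
    const char *symbol;   /* kernel symbol to look for */
    get_rsp_fn  reader;   /* reader to use if the symbol exists */
};

/* Older name first, newer name as a fallback. */
static const struct candidate candidates[] = {
    { "old_rsp",     read_old_rsp },
    { "rsp_scratch", read_rsp_scratch },
};

/* Return the reader for the first symbol that exists, or NULL. */
static get_rsp_fn select_reader(void)
{
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
        if (mock_symbol_exists(candidates[i].symbol))
            return candidates[i].reader;
    return NULL;
}

/* Guarded call: report failure instead of calling a NULL pointer. */
static int get_saved_rsp(get_rsp_fn reader, int cpu, ulong *out)
{
    if (!reader || !out)
        return -1;
    *out = reader(cpu);
    return 0;
}
```

With a guard like this, a kernel that has none of the expected symbols would produce an error return rather than a jump to address 0, which is what frame #0 in the backtrace suggests happened here.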
> 
> old_rsp did exist in openSUSE 12.1 and 13.1 (3.11.10-29 for the latter).
> 
> According to http://lists.openwall.net/linux-kernel/2015/03/17/766 old_rsp
> was renamed rsp_scratch. I don't know if the semantics changed -- it doesn't
> appear so -- but I added code to accept this symbol as an alternative and
> the core dump generates and works (I can see a correct backtrace). I do not
> warrant the work though. :-) Someone may want to review my work, and check
> the other functions and see if they are supposed to be zero. Since they
> haven't been invoked I don't know if they are supposed to be non-zero or
> not.
> 
> Here is the diff:
> 
> --- gcore_x86.c~ 2014-11-06 04:58:47.000000000 -0500
> +++ gcore_x86.c 2016-10-31 16:01:00.989025841 -0400
> @@ -1351,6 +1351,26 @@ static ulong gcore_x86_64_get_old_rsp(in
> }
> 
> /**
> + * gcore_x86_64_get_rsp_scratch() - get rsp at per-cpu area
> + *
> + * @cpu target CPU's CPU id
> + *
> + * Given a CPU id, returns a RSP value saved at per-cpu area for the
> + * CPU whose id is the given CPU id.
> + */
> +static ulong gcore_x86_64_get_rsp_scratch(int cpu)
> +{
> + ulong old_rsp;
> +
> + readmem(symbol_value("rsp_scratch") + kt->__per_cpu_offset[cpu],
> + KVADDR, &old_rsp, sizeof(old_rsp),
> + "gcore_x86_64_get_rsp_scratch: rsp_scratch",
> + gcore_verbose_error_handle());
> +
> + return old_rsp;
> +}
> +
> +/**
> * gcore_x86_64_get_per_cpu__old_rsp() - get rsp at per-cpu area
> *
> * @cpu target CPU's CPU id
> @@ -1834,6 +1854,11 @@ static void gcore_x86_table_register_get
> 
> else if (symbol_exists("_cpu_pda"))
> gxt->get_old_rsp = gcore_x86_64_get_cpu__pda_oldrsp;
> +
> + else if (symbol_exists("rsp_scratch"))
> + gxt->get_old_rsp = gcore_x86_64_get_rsp_scratch;
> +
> + if (!gxt->get_old_rsp) printf ("Warning: NO gxt->get_old_rsp\n");
> }
> #endif
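The new reader in the diff computes the address of a per-CPU variable as the symbol's base value plus that CPU's per-CPU offset, then reads one word from there. A minimal standalone sketch of that address arithmetic follows. Everything in it is a mock: `mock_memory`, `mock_per_cpu_offset`, `mock_readmem`, and `read_per_cpu_ulong` are hypothetical names, and addresses are simply byte offsets into a small array rather than kernel virtual addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long ulong;

#define NR_CPUS_MOCK   4
#define WORDS_PER_CPU  16
#define MOCK_MEM_WORDS (NR_CPUS_MOCK * WORDS_PER_CPU)

/* Hypothetical "dump memory": addresses are byte offsets into this array. */
static ulong mock_memory[MOCK_MEM_WORDS];

/* Hypothetical per-CPU offsets, one area of WORDS_PER_CPU words per CPU. */
static const ulong mock_per_cpu_offset[NR_CPUS_MOCK] = {
    0 * WORDS_PER_CPU * sizeof(ulong),
    1 * WORDS_PER_CPU * sizeof(ulong),
    2 * WORDS_PER_CPU * sizeof(ulong),
    3 * WORDS_PER_CPU * sizeof(ulong),
};

/* Bounds-checked copy out of the mock memory; returns 0 on success. */
static int mock_readmem(ulong addr, void *buf, size_t len)
{
    assert(buf != NULL);
    if (addr > sizeof(mock_memory) || len > sizeof(mock_memory) - addr)
        return -1;
    memcpy(buf, (const unsigned char *)mock_memory + addr, len);
    return 0;
}

/* Read the given CPU's copy of a per-CPU ulong:
 * address = symbol base + per-CPU offset of that CPU. */
static int read_per_cpu_ulong(ulong symbol_base, int cpu, ulong *out)
{
    if (cpu < 0 || cpu >= NR_CPUS_MOCK || !out)
        return -1;
    return mock_readmem(symbol_base + mock_per_cpu_offset[cpu],
                        out, sizeof(*out));
}
```

The sketch also rejects an out-of-range CPU id before indexing the offset table, which is a choice made for the illustration and not something the diff above does.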
> 
> 
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility



