[Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump

Tue Jan 18 03:14:02 UTC 2011

The gcore extension module provides a means to create an ELF core dump
for a user-mode process contained in a kernel crash dump. It is designed
to behave like the kernel's ELF core dumper.

For previous discussion, see:
https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html

Compared with the previous version, this release:
 - supports more kernel versions, and
 - collects register values more accurately (but still not perfect).

Support Range
=============

|----------------+----------------------------------------------|
| ARCH           | X86, X86_64                                  |
|----------------+----------------------------------------------|
| Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
|----------------+----------------------------------------------|

TODO
====

The following tasks remain:
 - Improve register collection for active tasks
 - Improve callee-saved register collection on x86_64
 - Support core dumps for tasks running in x86_32 compatibility mode

Usage
=====

1) Extract the source files under the extensions directory.

Arrange the attached source files as shown below:

./extensions/gcore.c
./extensions/gcore.mk
./extensions/libgcore/gcore_coredump.c
./extensions/libgcore/gcore_coredump_table.c
./extensions/libgcore/gcore_defs.h
./extensions/libgcore/gcore_dumpfilter.c
./extensions/libgcore/gcore_global_data.c
./extensions/libgcore/gcore_regset.c
./extensions/libgcore/gcore_verbose.c
./extensions/libgcore/gcore_x86.c

2) Type ``make extensions''; ``gcore.so'' is then generated under the
extensions directory.

3) Type ``extend gcore.so'' to load the gcore extension module.

See the help message for actual usage; it is reproduced in the Help
Message section below.

4) Type ``extend -u gcore.so'' to unload the gcore extension module.

Help Message
============

NAME
  gcore - retrieve a process image as a core dump

SYNOPSIS
  gcore [-v vlevel] [-f filter] [pid | taskp]*
  This command retrieves a process image as a core dump.

DESCRIPTION

    -v Display verbose information according to vlevel:

           progress  library error  page fault
       ---------------------------------------
         0
         1    x
         2                  x
         4                                x    (default)
         7    x             x             x

    -f Specify the kinds of memory to be written into core dumps as a
       bitwise OR of the filter flags:

           AP  AS  FP  FS  ELF HP  HS
       ------------------------------
         0
         1  x
         2      x
         4          x
         8              x
        16          x       x
        32                      x
        64                          x
       127  x   x   x   x   x   x   x

        AP  Anonymous Private Memory
        AS  Anonymous Shared Memory
        FP  File-Backed Private Memory
        FS  File-Backed Shared Memory
        ELF ELF header pages in file-backed private memory areas
        HP  Hugetlb Private Memory
        HS  Hugetlb Shared Memory

  If no pid or taskp is specified, gcore tries to retrieve the process image
  of the current task context.

  The file name of a generated core dump is core.<pid>, where <pid> is the
  PID of the specified process.

  For a multi-threaded process, gcore generates a core dump containing
  information for all threads, similar to the behaviour of the ELF core
  dumper in the Linux kernel.

  Note that crash and Linux use PID differently: the ps command in the
  crash utility displays the LWP ID, while the ps command in Linux displays
  the thread group ID, that is, the PID of the thread group leader.

  gcore provides a core dump filtering facility that lets users select which
  kinds of memory maps are included in the resulting core dump. There are
  7 kinds of memory maps in total, and you can set them up with the set
  command. For more detailed information, please see the help command
  message.

EXAMPLES
  Specify the process to retrieve as a core dump. Here, the process is
  assumed to have PID 12345.

    crash> gcore 12345
    Saved core.12345
    crash>

  Next, specify the process by task address. Here, the task is assumed to
  be located at address f9d78000 and to have PID 32323.

    crash> gcore f9d78000
    Saved core.32323
    crash>

  If multiple arguments are given, gcore dumps the processes in the order
  in which the arguments are given.

    crash> gcore 5217 ffff880136d72040 23299 24459 ffff880136420040
    Saved core.5217
    Saved core.1130
    Saved core.1130
    Saved core.24459
    Saved core.30102
    crash>

  If no argument is given, gcore tries to retrieve the process of the current
  task context.

    crash> set
         PID: 54321
     COMMAND: "bash"
        TASK: e0000040f80c0000
         CPU: 0
       STATE: TASK_INTERRUPTIBLE
    crash> gcore
    Saved core.54321

  When a multi-threaded process is specified, the generated core file name
  uses the thread group leader's PID; here it is assumed to be 12340.

    crash> gcore 12345
    Saved core.12340

  Specifying the same option more than once is not allowed.

    crash> gcore -v 1 1234 -v 1
    Usage: gcore
      gcore [-v vlevel] [-f filter] [pid | taskp]*
      gcore -d
    Enter "help gcore" for details.

  The -v and -f options may be given in any order and in any position.

    crash> gcore -v 2 5201 -f 21 ffff880126ff9520 5205
    Saved core.5174
    Saved core.5217
    Saved core.5167
    crash> gcore 5201 ffff880126ff9520 -f 21 5205 -v 2
    Saved core.5174
    Saved core.5217
    Saved core.5167

Signed-off-by: HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com>
-------------- next part --------------
/* gcore.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include "defs.h"
#include <gcore_defs.h>
#include <stdint.h>
#include <elf.h>

static void gcore_offset_table_init(void);
static void gcore_size_table_init(void);

static void do_gcore(char *arg);
static void do_setup_gcore(struct task_context *tc);
static void do_clean_gcore(void);

static struct command_table_entry command_table[] = {
	{ "gcore", cmd_gcore, help_gcore, 0 },
#ifdef GCORE_TEST
	{ "gcore_test", cmd_gcore_test, help_gcore_test, 0 },
#endif
	{ (char *)NULL }
};

int 
_init(void) /* Register the command set. */
{
	gcore_offset_table_init();
	gcore_size_table_init();
	gcore_coredump_table_init();
	gcore_arch_table_init();
	gcore_arch_regsets_init();
	register_extension(command_table);
	return 1;
}

int 
_fini(void) 
{ 
	return 1;
}

char *help_gcore[] = {
"gcore",
"retrieve a process image as a core dump",
"[-v vlevel] [-f filter] [pid | taskp]*\n"
"  This command retrieves a process image as a core dump.",
"  ",
"    -v Display verbose information according to vlevel:",
"  ",
"           progress  library error  page fault",
"       ---------------------------------------",
"         0",
"         1    x",
"         2                  x",
"         4                                x    (default)",
"         7    x             x             x",
"  ",
"    -f Specify the kinds of memory to be written into core dumps as a",
"       bitwise OR of the filter flags:",
"  ",
"           AP  AS  FP  FS  ELF HP  HS",
"       ------------------------------",
"         0",
"         1  x",
"         2      x",
"         4          x",
"         8              x",
"        16          x       x",
"        32                      x",
"        64                          x",
"       127  x   x   x   x   x   x   x",
" ",
"        AP  Anonymous Private Memory",
"        AS  Anonymous Shared Memory",
"        FP  File-Backed Private Memory",
"        FS  File-Backed Shared Memory",
"        ELF ELF header pages in file-backed private memory areas",
"        HP  Hugetlb Private Memory",
"        HS  Hugetlb Shared Memory",
"  ",
"  If no pid or taskp is specified, gcore tries to retrieve the process image",
"  of the current task context.",
"  ",
"  The file name of a generated core dump is core.<pid>, where <pid> is the",
"  PID of the specified process.",
"  ",
"  For a multi-threaded process, gcore generates a core dump containing",
"  information for all threads, similar to the behaviour of the ELF core",
"  dumper in the Linux kernel.",
"  ",
"  Note that crash and Linux use PID differently: the ps command in the",
"  crash utility displays the LWP ID, while the ps command in Linux displays",
"  the thread group ID, that is, the PID of the thread group leader.",
"  ",
"  gcore provides a core dump filtering facility that lets users select which",
"  kinds of memory maps are included in the resulting core dump. There are",
"  7 kinds of memory maps in total, and you can set them up with the set",
"  command. For more detailed information, please see the help command",
"  message.",
"  ",
"EXAMPLES",
"  Specify the process to retrieve as a core dump. Here, the process is",
"  assumed to have PID 12345.",
"  ",
"    crash> gcore 12345",
"    Saved core.12345",
"    crash>",
"  ",
"  Next, specify the process by task address. Here, the task is assumed to",
"  be located at address f9d78000 and to have PID 32323.",
"  ",
"    crash> gcore f9d78000",
"    Saved core.32323",
"    crash>",
"  ",
"  If multiple arguments are given, gcore dumps the processes in the order",
"  in which the arguments are given.",
"  ",
"    crash> gcore 5217 ffff880136d72040 23299 24459 ffff880136420040",
"    Saved core.5217",
"    Saved core.1130",
"    Saved core.1130",
"    Saved core.24459",
"    Saved core.30102",
"    crash>",
"  ",
"  If no argument is given, gcore tries to retrieve the process of the current",
"  task context.",
"  ",
"    crash> set",
"         PID: 54321",
"     COMMAND: \"bash\"",
"        TASK: e0000040f80c0000",
"         CPU: 0",
"       STATE: TASK_INTERRUPTIBLE",
"    crash> gcore",
"    Saved core.54321",
"  ",
"  When a multi-threaded process is specified, the generated core file name",
"  uses the thread group leader's PID; here it is assumed to be 12340.",
"  ",
"    crash> gcore 12345",
"    Saved core.12340",
"  ",
"  Specifying the same option more than once is not allowed.",
"  ",
"    crash> gcore -v 1 1234 -v 1",
"    Usage: gcore",
"      gcore [-v vlevel] [-f filter] [pid | taskp]*",
"      gcore -d",
"    Enter \"help gcore\" for details.",
"  ",
"  The -v and -f options may be given in any order and in any position.",
"  ",
"    crash> gcore -v 2 5201 -f 21 ffff880126ff9520 5205",
"    Saved core.5174",
"    Saved core.5217",
"    Saved core.5167",
"    crash> gcore 5201 ffff880126ff9520 -f 21 5205 -v 2",
"    Saved core.5174",
"    Saved core.5217",
"    Saved core.5167",
"  ",
NULL,
};

void
cmd_gcore(void)
{
	int c;	/* getopt() returns int; a char cannot reliably hold EOF */
	char *foptarg, *voptarg;

	if (ACTIVE())
		error(FATAL, "no support on live kernel");

	gcore_dumpfilter_set_default();
	gcore_verbose_set_default();

	foptarg = voptarg = NULL;

	while ((c = getopt(argcnt, args, "df:v:")) != EOF) {
		switch (c) {

		case 'f':
			if (foptarg)
				goto argerr;
			foptarg = optarg;
			break;
		case 'v':
			if (voptarg)
				goto argerr;
			voptarg = optarg;
			break;
		default:
		argerr:
			argerrs++;
			break;
		}
	}

	if (argerrs) {
		cmd_usage(pc->curcmd, SYNOPSIS);
	}

	if (foptarg) {
		ulong value;

		if (!decimal(foptarg, 0))
			error(FATAL, "filter must be a decimal: %s.\n",
			      foptarg);

		value = stol(foptarg, gcore_verbose_error_handle(), NULL);
		if (!gcore_dumpfilter_set(value))
			error(FATAL, "invalid filter value: %s.\n", foptarg);
	}

	if (voptarg) {
		ulong value;

		if (!decimal(voptarg, 0))
			error(FATAL, "vlevel must be a decimal: %s.\n",
			      voptarg);

		value = stol(voptarg, gcore_verbose_error_handle(), NULL);
		if (!gcore_verbose_set(value))
			error(FATAL, "invalid vlevel: %s.\n", voptarg);

	}

	if (!args[optind]) {
		do_gcore(NULL);
		return;
	}

	for (; args[optind]; optind++) {
		do_gcore(args[optind]);
		free_all_bufs();
	}

}

/**
 * do_gcore - do process core dump for a given task
 *
 * @arg string that refers to PID or task context's address
 *
 * Given the string, arg, referring to a PID or a task context's address,
 * do_gcore tries to do a process core dump for the corresponding
 * task. If the string given is NULL, do_gcore does the process dump
 * for the current task context.
 *
 * This is the single exception-handling point of the gcore sub-command:
 * any fatal error raised while the sub-command runs comes back here.
 * Look carefully at how IN_FOREACH is used here.
 *
 * Dynamic allocation in the gcore sub-command depends entirely on the
 * buffer mechanism provided by the crash utility. do_gcore() never frees
 * anything itself, so free_all_bufs() must be called after each call to
 * do_gcore(). See the end of cmd_gcore().
 */
static void do_gcore(char *arg)
{
	if (!setjmp(pc->foreach_loop_env)) {
		struct task_context *tc;
		ulong dummy;

		pc->flags |= IN_FOREACH;

		if (arg) {
			if (!IS_A_NUMBER(arg))
				error(FATAL, "neither pid nor taskp: %s\n",
				      arg);

			if (STR_INVALID == str_to_context(arg, &dummy, &tc))
				error(FATAL, "invalid task or pid: %s\n",
				      arg);
		} else
			tc = CURRENT_CONTEXT();

		if (is_kernel_thread(tc->task))
			error(FATAL, "The specified task is a kernel thread.\n");

		do_setup_gcore(tc);
		gcore_coredump();
	}
	pc->flags &= ~IN_FOREACH;
	do_clean_gcore();
}

/**
 * do_setup_gcore - initialize resources used for process core dump
 *
 * @tc task context object to be dumped from now on
 *
 * The resources used for a process core dump are described by struct
 * gcore_data. Look carefully at its definition.
 */
static void do_setup_gcore(struct task_context *tc)
{
	gcore->flags = 0UL;
	gcore->fd = 0;

	if (tc != CURRENT_CONTEXT()) {
		gcore->orig = CURRENT_CONTEXT();
		(void) set_context(tc->task, tc->pid);
	}

	snprintf(gcore->corename, CORENAME_MAX_SIZE + 1, "core.%lu.%s",
		 task_tgid(CURRENT_TASK()), CURRENT_COMM());
}

/**
 * do_clean_gcore - clean up resources used for process core dump
 */
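static void do_clean_gcore(void)
{
	if (gcore->fd > 0)
		close(gcore->fd);
	if (gcore->flags & GCF_UNDER_COREDUMP) {
		if (gcore->flags & GCF_SUCCESS)
			fprintf(fp, "Saved %s\n", gcore->corename);
		else
			fprintf(fp, "Failed.\n");
	}
	if (gcore->orig)
		(void)set_context(gcore->orig->task, gcore->orig->pid);

	/*
	 * Reset the per-dump state. do_gcore() can fail before it reaches
	 * do_setup_gcore(), and this function still runs in that case, so
	 * it must not see a stale descriptor, stale flags, or a stale saved
	 * context left over from an earlier dump.
	 */
	gcore->fd = 0;
	gcore->flags = 0UL;
	gcore->orig = NULL;
}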

static void gcore_offset_table_init(void)
{
	GCORE_MEMBER_OFFSET_INIT(cpuinfo_x86_x86_capability, "cpuinfo_x86", "x86_capability");
	GCORE_MEMBER_OFFSET_INIT(cred_gid, "cred", "gid");
	GCORE_MEMBER_OFFSET_INIT(cred_uid, "cred", "uid");
	GCORE_MEMBER_OFFSET_INIT(desc_struct_base0, "desc_struct", "base0");
	GCORE_MEMBER_OFFSET_INIT(desc_struct_base1, "desc_struct", "base1");
	GCORE_MEMBER_OFFSET_INIT(desc_struct_base2, "desc_struct", "base2");
	GCORE_MEMBER_OFFSET_INIT(fpu_state, "fpu", "state");
	GCORE_MEMBER_OFFSET_INIT(inode_i_nlink, "inode", "i_nlink");
	GCORE_MEMBER_OFFSET_INIT(nsproxy_pid_ns, "nsproxy", "pid_ns");
	GCORE_MEMBER_OFFSET_INIT(mm_struct_arg_start, "mm_struct", "arg_start");
	GCORE_MEMBER_OFFSET_INIT(mm_struct_arg_end, "mm_struct", "arg_end");
	GCORE_MEMBER_OFFSET_INIT(mm_struct_map_count, "mm_struct", "map_count");
	GCORE_MEMBER_OFFSET_INIT(mm_struct_saved_auxv, "mm_struct", "saved_auxv");
	GCORE_MEMBER_OFFSET_INIT(pid_level, "pid", "level");
	GCORE_MEMBER_OFFSET_INIT(pid_namespace_level, "pid_namespace", "level");
	if (MEMBER_EXISTS("pt_regs", "ax"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ax, "pt_regs", "ax");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ax, "pt_regs", "eax");
	if (MEMBER_EXISTS("pt_regs", "bp"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_bp, "pt_regs", "bp");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_bp, "pt_regs", "ebp");
	if (MEMBER_EXISTS("pt_regs", "bx"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_bx, "pt_regs", "bx");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_bx, "pt_regs", "ebx");
	if (MEMBER_EXISTS("pt_regs", "cs"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_cs, "pt_regs", "cs");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_cs, "pt_regs", "xcs");
	if (MEMBER_EXISTS("pt_regs", "cx"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_cx, "pt_regs", "cx");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_cx, "pt_regs", "ecx");
	if (MEMBER_EXISTS("pt_regs", "di"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_di, "pt_regs", "di");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_di, "pt_regs", "edi");
	if (MEMBER_EXISTS("pt_regs", "ds"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ds, "pt_regs", "ds");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ds, "pt_regs", "xds");
	if (MEMBER_EXISTS("pt_regs", "dx"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_dx, "pt_regs", "dx");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_dx, "pt_regs", "edx");
	if (MEMBER_EXISTS("pt_regs", "es"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_es, "pt_regs", "es");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_es, "pt_regs", "xes");
	if (MEMBER_EXISTS("pt_regs", "flags"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_flags, "pt_regs", "flags");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_flags, "pt_regs", "eflags");
	GCORE_MEMBER_OFFSET_INIT(pt_regs_fs, "pt_regs", "fs");
	GCORE_MEMBER_OFFSET_INIT(pt_regs_gs, "pt_regs", "gs");
	if (MEMBER_EXISTS("pt_regs", "ip"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ip, "pt_regs", "ip");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ip, "pt_regs", "eip");
	if (MEMBER_EXISTS("pt_regs", "orig_eax"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_orig_ax, "pt_regs", "orig_eax");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_orig_ax, "pt_regs", "orig_ax");
	if (MEMBER_EXISTS("pt_regs", "si"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_si, "pt_regs", "si");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_si, "pt_regs", "esi");
	if (MEMBER_EXISTS("pt_regs", "sp"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_sp, "pt_regs", "sp");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_sp, "pt_regs", "esp");
	if (MEMBER_EXISTS("pt_regs", "ss"))
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ss, "pt_regs", "ss");
	else
		GCORE_MEMBER_OFFSET_INIT(pt_regs_ss, "pt_regs", "xss");
	GCORE_MEMBER_OFFSET_INIT(pt_regs_xfs, "pt_regs", "xfs");
	GCORE_MEMBER_OFFSET_INIT(pt_regs_xgs, "pt_regs", "xgs");
	GCORE_MEMBER_OFFSET_INIT(sched_entity_sum_exec_runtime, "sched_entity", "sum_exec_runtime");
	GCORE_MEMBER_OFFSET_INIT(signal_struct_cutime, "signal_struct", "cutime");
	GCORE_MEMBER_OFFSET_INIT(signal_struct_pgrp, "signal_struct", "pgrp");
	GCORE_MEMBER_OFFSET_INIT(signal_struct_session, "signal_struct", "session");
	GCORE_MEMBER_OFFSET_INIT(signal_struct_stime, "signal_struct", "stime");
	GCORE_MEMBER_OFFSET_INIT(signal_struct_sum_sched_runtime, "signal_struct", "sum_sched_runtime");
	GCORE_MEMBER_OFFSET_INIT(signal_struct_utime, "signal_struct", "utime");
	GCORE_MEMBER_OFFSET_INIT(task_struct_cred, "task_struct", "cred");
	GCORE_MEMBER_OFFSET_INIT(task_struct_gid, "task_struct", "gid");
	GCORE_MEMBER_OFFSET_INIT(task_struct_group_leader, "task_struct", "group_leader");
	GCORE_MEMBER_OFFSET_INIT(task_struct_real_cred, "task_struct", "real_cred");
	if (MEMBER_EXISTS("task_struct", "real_parent"))
		GCORE_MEMBER_OFFSET_INIT(task_struct_real_parent, "task_struct", "real_parent");
	else if (MEMBER_EXISTS("task_struct", "parent"))
		GCORE_MEMBER_OFFSET_INIT(task_struct_real_parent, "task_struct", "parent");
	GCORE_MEMBER_OFFSET_INIT(task_struct_se, "task_struct", "se");
	GCORE_MEMBER_OFFSET_INIT(task_struct_static_prio, "task_struct", "static_prio");
	GCORE_MEMBER_OFFSET_INIT(task_struct_uid, "task_struct", "uid");
	GCORE_MEMBER_OFFSET_INIT(task_struct_used_math, "task_struct", "used_math");
	GCORE_MEMBER_OFFSET_INIT(thread_info_status, "thread_info", "status");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_ds, "thread_struct", "ds");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_es, "thread_struct", "es");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_fs, "thread_struct", "fs");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_fsindex, "thread_struct", "fsindex");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_fpu, "thread_struct", "fpu");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_gs, "thread_struct", "gs");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_gsindex, "thread_struct", "gsindex");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_i387, "thread_struct", "i387");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_tls_array, "thread_struct", "tls_array");
	if (MEMBER_EXISTS("thread_struct", "usersp"))
		GCORE_MEMBER_OFFSET_INIT(thread_struct_usersp, "thread_struct", "usersp");
	else if (MEMBER_EXISTS("thread_struct", "userrsp"))
		GCORE_MEMBER_OFFSET_INIT(thread_struct_usersp, "thread_struct", "userrsp");
	if (MEMBER_EXISTS("thread_struct", "xstate"))
		GCORE_MEMBER_OFFSET_INIT(thread_struct_xstate, "thread_struct", "xstate");
	else if (MEMBER_EXISTS("thread_struct", "i387"))
		GCORE_MEMBER_OFFSET_INIT(thread_struct_xstate, "thread_struct", "i387");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_io_bitmap_max, "thread_struct", "io_bitmap_max");
	GCORE_MEMBER_OFFSET_INIT(thread_struct_io_bitmap_ptr, "thread_struct", "io_bitmap_ptr");
	GCORE_MEMBER_OFFSET_INIT(user_regset_n, "user_regset", "n");
	GCORE_MEMBER_OFFSET_INIT(vm_area_struct_anon_vma, "vm_area_struct", "anon_vma");

	if (symbol_exists("_cpu_pda"))
		GCORE_MEMBER_OFFSET_INIT(x8664_pda_oldrsp, "x8664_pda", "oldrsp");
}

static void gcore_size_table_init(void)
{
	GCORE_STRUCT_SIZE_INIT(i387_union, "i387_union");
	GCORE_MEMBER_SIZE_INIT(mm_struct_saved_auxv, "mm_struct", "saved_auxv");
	GCORE_MEMBER_SIZE_INIT(thread_struct_fs, "thread_struct", "fs");
	GCORE_MEMBER_SIZE_INIT(thread_struct_fsindex, "thread_struct", "fsindex");
	GCORE_MEMBER_SIZE_INIT(thread_struct_gs, "thread_struct", "gs");
	GCORE_MEMBER_SIZE_INIT(thread_struct_gsindex, "thread_struct", "gsindex");
	GCORE_MEMBER_SIZE_INIT(thread_struct_tls_array, "thread_struct", "tls_array");
	GCORE_STRUCT_SIZE_INIT(thread_xstate, "thread_xstate");
	GCORE_MEMBER_SIZE_INIT(vm_area_struct_anon_vma, "vm_area_struct", "anon_vma");

}

#ifdef GCORE_TEST

char *help_gcore_test[] = {
"gcore_test",
"gcore_test - test gcore",
"\n"
"  ",
NULL,
};

void cmd_gcore_test(void)
{
	char *message = NULL;

#define TEST_MODULE(test)					\
	message = test();					\
	if (message)						\
		fprintf(fp, #test ": %s\n", message);

	TEST_MODULE(gcore_x86_test);
	TEST_MODULE(gcore_coredump_table_test);
	TEST_MODULE(gcore_dumpfilter_test);

	if (!message)
		fprintf(fp, "All test cases are successfully passed\n");

#undef TEST_MODULE
}

#endif /* GCORE_TEST */
-------------- next part --------------
#
# Copyright (C) 2010 FUJITSU LIMITED
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#

ifeq ($(shell arch), i686)
  TARGET=X86
  TARGET_CFLAGS=-D_FILE_OFFSET_BITS=64
endif

ifeq ($(shell arch), x86_64)
  TARGET=X86_64
  TARGET_CFLAGS=
endif

ifeq ($(shell /bin/ls /usr/include/crash/defs.h 2>/dev/null), /usr/include/crash/defs.h)
  INCDIR=/usr/include/crash
endif
ifeq ($(shell /bin/ls ./defs.h 2> /dev/null), ./defs.h)
  INCDIR=.
endif
ifeq ($(shell /bin/ls ../defs.h 2> /dev/null), ../defs.h)
  INCDIR=..
endif

GCORE_CFILES = \
	libgcore/gcore_coredump.c \
	libgcore/gcore_coredump_table.c \
	libgcore/gcore_dumpfilter.c \
	libgcore/gcore_global_data.c \
	libgcore/gcore_regset.c \
	libgcore/gcore_verbose.c

ifneq (,$(findstring $(TARGET), X86 X86_64))
GCORE_CFILES += libgcore/gcore_x86.c
endif

GCORE_OFILES = $(patsubst %.c,%.o,$(GCORE_CFILES))

COMMON_CFLAGS=-Wall -I$(INCDIR) -I./libgcore -fPIC -D$(TARGET)

all: gcore.so

gcore.so: $(INCDIR)/defs.h gcore.c $(GCORE_OFILES)
	gcc $(TARGET_CFLAGS) $(COMMON_CFLAGS) -nostartfiles -shared -rdynamic $(GCORE_OFILES) -o gcore.so gcore.c

%.o: %.c $(INCDIR)/defs.h
	gcc $(TARGET_CFLAGS) $(COMMON_CFLAGS) -c -o $@ $<

clean:
	rm -f gcore.so $(GCORE_OFILES)

-------------- next part --------------
/* gcore_coredump.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include <defs.h>
#include <gcore_defs.h>
#include <elf.h>

static void fill_prstatus(struct elf_prstatus *prstatus, ulong task,
			  const struct thread_group_list *tglist);
static void fill_psinfo(struct elf_prpsinfo *psinfo, ulong task);
static void fill_auxv_note(struct memelfnote *note, ulong task);
static int fill_thread_group(struct thread_group_list **tglist);
static void fill_headers(Elf_Ehdr *elf, Elf_Shdr *shdr0, int phnum,
			 uint16_t e_machine, uint32_t e_flags,
			 uint8_t ei_osabi);
static void fill_thread_core_info(struct elf_thread_core_info *t,
				  const struct user_regset_view *view,
				  size_t *total,
				  struct thread_group_list *tglist);
static int fill_note_info(struct elf_note_info *info,
			  struct thread_group_list *tglist, Elf_Ehdr *elf,
			  Elf_Shdr *shdr0, int phnum);
static void fill_note(struct memelfnote *note, const char *name, int type,
		      unsigned int sz, void *data);

static int notesize(struct memelfnote *en);
static void alignfile(int fd, off_t *foffset);
static void write_elf_note_phdr(int fd, size_t size, off_t *offset);
static void writenote(struct memelfnote *men, int fd, off_t *foffset);
static void write_note_info(int fd, struct elf_note_info *info, off_t *foffset);
static size_t get_note_info_size(struct elf_note_info *info);
static ulong next_vma(ulong this_vma);

static inline int thread_group_leader(ulong task);

void gcore_coredump(void)
{
	struct thread_group_list *tglist = NULL;
	struct elf_note_info info;
	Elf_Ehdr elf;
	Elf_Shdr shdr0;
	int map_count, phnum;
	ulong vma, index, mmap;
	off_t offset, foffset, dataoff;
	char *mm_cache, *buffer = NULL;

	gcore->flags |= GCF_UNDER_COREDUMP;

	mm_cache = fill_mm_struct(task_mm(CURRENT_TASK(), TRUE));
	if (!mm_cache)
		error(FATAL, "The user memory space does not exist.\n");

	mmap = ULONG(mm_cache + OFFSET(mm_struct_mmap));
	map_count = INT(mm_cache + GCORE_OFFSET(mm_struct_map_count));

	progressf("Restoring the thread group ... \n");
	fill_thread_group(&tglist);
	progressf("done.\n");

	phnum = map_count;
	phnum++; /* for note information */

	progressf("Retrieving note information ... \n");
	fill_note_info(&info, tglist, &elf, &shdr0, phnum);
	progressf("done.\n");

	progressf("Opening file %s ... \n", gcore->corename);
	gcore->fd = open(gcore->corename, O_WRONLY|O_TRUNC|O_CREAT,
			 S_IRUSR|S_IWUSR);
	if (gcore->fd < 0)
		error(FATAL, "%s: open: %s\n", gcore->corename,
		      strerror(errno));
	progressf("done.\n");

	progressf("Writing ELF header ... \n");
	if (write(gcore->fd, &elf, sizeof(elf)) != sizeof(elf))
		error(FATAL, "%s: write: %s\n", gcore->corename,
		      strerror(errno));
	progressf("done.\n");

	if (elf.e_shoff) {
		progressf("Writing section header table ... \n");
		if (write(gcore->fd, &shdr0, sizeof(shdr0)) != sizeof(shdr0))
			error(FATAL, "%s: write: %s\n", gcore->corename,
			      strerror(errno));
		progressf("done.\n");
	}

	offset = elf.e_ehsize +
		(elf.e_phnum == PN_XNUM ? elf.e_shnum * elf.e_shentsize : 0) +
		phnum * elf.e_phentsize;
	foffset = offset;

	progressf("Writing PT_NOTE program header ... \n");
	write_elf_note_phdr(gcore->fd, get_note_info_size(&info), &offset);
	progressf("done.\n");

	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

	progressf("Writing PT_LOAD program headers ... \n");
	FOR_EACH_VMA_OBJECT(vma, index, mmap) {
		char *vma_cache;
		ulong vm_start, vm_end, vm_flags;
		Elf_Phdr phdr;

		vma_cache = fill_vma_cache(vma);
		vm_start = ULONG(vma_cache + OFFSET(vm_area_struct_vm_start));
		vm_end   = ULONG(vma_cache + OFFSET(vm_area_struct_vm_end));
		vm_flags = ULONG(vma_cache + OFFSET(vm_area_struct_vm_flags));

		phdr.p_type = PT_LOAD;
		phdr.p_offset = offset;
		phdr.p_vaddr = vm_start;
		phdr.p_paddr = 0;
		phdr.p_filesz = gcore_dumpfilter_vma_dump_size(vma);
		phdr.p_memsz = vm_end - vm_start;
		phdr.p_flags = vm_flags & VM_READ ? PF_R : 0;
		if (vm_flags & VM_WRITE)
			phdr.p_flags |= PF_W;
		if (vm_flags & VM_EXEC)
			phdr.p_flags |= PF_X;
		phdr.p_align = ELF_EXEC_PAGESIZE;

		offset += phdr.p_filesz;

		if (write(gcore->fd, &phdr, sizeof(phdr)) != sizeof(phdr))
			error(FATAL, "%s: write: %s\n", gcore->corename,
			      strerror(errno));
	}
	progressf("done.\n");

	progressf("Writing PT_NOTE segment ... \n");
	write_note_info(gcore->fd, &info, &foffset);
	progressf("done.\n");

	buffer = GETBUF(PAGE_SIZE);
	BZERO(buffer, PAGE_SIZE);

	{
		size_t len;

		len = dataoff - foffset;
		if ((size_t)write(gcore->fd, buffer, len) != len)
			error(FATAL, "%s: write: %s\n", gcore->corename,
			      strerror(errno));
	}

	progressf("Writing PT_LOAD segment ... \n");
	FOR_EACH_VMA_OBJECT(vma, index, mmap) {
		ulong addr, end, vm_start;

		vm_start = ULONG(fill_vma_cache(vma) +
				 OFFSET(vm_area_struct_vm_start));

		end = vm_start + gcore_dumpfilter_vma_dump_size(vma);

		progressf("PT_LOAD[%lu]: %lx - %lx\n", index, vm_start, end);

		for (addr = vm_start; addr < end; addr += PAGE_SIZE) {
			physaddr_t paddr;

			if (uvtop(CURRENT_CONTEXT(), addr, &paddr, FALSE)) {
				readmem(paddr, PHYSADDR, buffer, PAGE_SIZE,
					"readmem vma list",
					gcore_verbose_error_handle());
			} else {
				pagefaultf("page fault at %lx\n", addr);
				BZERO(buffer, PAGE_SIZE);
			}

			if (write(gcore->fd, buffer, PAGE_SIZE) != PAGE_SIZE)
				error(FATAL, "%s: write: %s\n", gcore->corename,
				      strerror(errno));

		}
	}
	progressf("done.\n");

	gcore->flags |= GCF_SUCCESS;

}

static inline int
thread_group_leader(ulong task)
{
	ulong group_leader;

	readmem(task + GCORE_OFFSET(task_struct_group_leader), KVADDR,
		&group_leader, sizeof(group_leader),
		"thread_group_leader: group_leader",
		gcore_verbose_error_handle());

	return task == group_leader;
}

static int
fill_thread_group(struct thread_group_list **tglist)
{
	ulong i;
	struct task_context *tc;
	struct thread_group_list *l;
	const uint tgid = task_tgid(CURRENT_TASK());
	const ulong lead_pid = CURRENT_PID();

	tc = FIRST_CONTEXT();
	l = NULL;
	for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
		if (task_tgid(tc->task) == tgid) {
			struct thread_group_list *new;

			new = (struct thread_group_list *)
				GETBUF(sizeof(struct thread_group_list));
			new->task = tc->task;
			if (tc->pid == lead_pid || !l) {
				new->next = l;
				l = new;
			} else if (l) {
				new->next = l->next;
				l->next = new;
			}
		}
	}
	*tglist = l;

	return 1;
}

static int
task_nice(ulong task)
{
	int static_prio;

	readmem(task + GCORE_OFFSET(task_struct_static_prio), KVADDR,
		&static_prio, sizeof(static_prio), "task_nice: static_prio",
		gcore_verbose_error_handle());

	return PRIO_TO_NICE(static_prio);
}

static void
fill_psinfo(struct elf_prpsinfo *psinfo, ulong task)
{
	ulong arg_start, arg_end, parent;
	physaddr_t paddr;
	long state, uid, gid;
        unsigned int i, len;
	char *mm_cache;

        /* first copy the parameters from user space */
	BZERO(psinfo, sizeof(struct elf_prpsinfo));

	mm_cache = fill_mm_struct(task_mm(task, FALSE));

	arg_start = ULONG(mm_cache + GCORE_OFFSET(mm_struct_arg_start));
	arg_end = ULONG(mm_cache + GCORE_OFFSET(mm_struct_arg_end));

        len = arg_end - arg_start;
        if (len >= ELF_PRARGSZ)
                len = ELF_PRARGSZ-1;
	if (uvtop(CURRENT_CONTEXT(), arg_start, &paddr, FALSE)) {
		readmem(paddr, PHYSADDR, &psinfo->pr_psargs, len,
			"fill_psinfo: pr_psargs", gcore_verbose_error_handle());
	} else {
		pagefaultf("page fault at %lx\n", arg_start);
	}
        for(i = 0; i < len; i++)
                if (psinfo->pr_psargs[i] == 0)
                        psinfo->pr_psargs[i] = ' ';
        psinfo->pr_psargs[len] = 0;

	readmem(task + GCORE_OFFSET(task_struct_real_parent), KVADDR,
		&parent, sizeof(parent), "fill_psinfo: real_parent",
		gcore_verbose_error_handle());

	psinfo->pr_ppid = ggt->task_pid(parent);
	psinfo->pr_pid = ggt->task_pid(task);
	psinfo->pr_pgrp = ggt->task_pgrp(task);
	psinfo->pr_sid = ggt->task_session(task);

	readmem(task + OFFSET(task_struct_state), KVADDR, &state, sizeof(state),
		"fill_psinfo: state", gcore_verbose_error_handle());

        i = state ? ffz(~state) + 1 : 0;
        psinfo->pr_state = i;
        psinfo->pr_sname = (i > 5) ? '.' : "RSDTZW"[i];
        psinfo->pr_zomb = psinfo->pr_sname == 'Z';

	psinfo->pr_nice = task_nice(task);

	readmem(task + OFFSET(task_struct_flags), KVADDR, &psinfo->pr_flag,
		sizeof(psinfo->pr_flag), "fill_psinfo: flags",
		gcore_verbose_error_handle());

	uid = ggt->task_uid(task);
	gid = ggt->task_gid(task);

	SET_UID(psinfo->pr_uid, (uid_t)uid);
	SET_GID(psinfo->pr_gid, (gid_t)gid);

	readmem(task + OFFSET(task_struct_comm), KVADDR, &psinfo->pr_fname,
		TASK_COMM_LEN, "fill_psinfo: comm",
		gcore_verbose_error_handle());

}

static void
fill_headers(Elf_Ehdr *elf, Elf_Shdr *shdr0, int phnum, uint16_t e_machine,
	     uint32_t e_flags, uint8_t ei_osabi)
{
	BZERO(elf, sizeof(Elf_Ehdr));
	BCOPY(ELFMAG, elf->e_ident, SELFMAG);
	elf->e_ident[EI_CLASS] = ELF_CLASS;
	elf->e_ident[EI_DATA] = ELF_DATA;
	elf->e_ident[EI_VERSION] = EV_CURRENT;
	elf->e_ident[EI_OSABI] = ei_osabi;
	elf->e_ehsize = sizeof(Elf_Ehdr);
	elf->e_phentsize = sizeof(Elf_Phdr);
	elf->e_phnum = phnum >= PN_XNUM ? PN_XNUM : phnum;
	if (elf->e_phnum == PN_XNUM) {
		elf->e_shoff = elf->e_ehsize;
		elf->e_shentsize = sizeof(Elf_Shdr);
		elf->e_shnum = 1;
		elf->e_shstrndx = SHN_UNDEF;
	}
	elf->e_type = ET_CORE;
	elf->e_machine = e_machine;
	elf->e_version = EV_CURRENT;
	elf->e_phoff = sizeof(Elf_Ehdr) + elf->e_shentsize * elf->e_shnum;
	elf->e_flags = e_flags;

	if (elf->e_phnum == PN_XNUM) {
		BZERO(shdr0, sizeof(Elf_Shdr));
		shdr0->sh_type = SHT_NULL;
		shdr0->sh_size = elf->e_shnum;
		shdr0->sh_link = elf->e_shstrndx;
		shdr0->sh_info = phnum;
	}

}

static void
fill_thread_core_info(struct elf_thread_core_info *t,
		      const struct user_regset_view *view, size_t *total,
		      struct thread_group_list *tglist)
{
	unsigned int i;

	/* NT_PRSTATUS is the one special case, because the regset data
	 * goes into the pr_reg field inside the note contents, rather
         * than being the whole note contents.  We fill the rest in here.
         * We assume that regset 0 is NT_PRSTATUS.
         */
	fill_prstatus(&t->prstatus, t->task, tglist);
        view->regsets[0].get(task_to_context(t->task), &view->regsets[0],
			     sizeof(t->prstatus.pr_reg), &t->prstatus.pr_reg);

        fill_note(&t->notes[0], "CORE", NT_PRSTATUS,
                  sizeof(t->prstatus), &t->prstatus);
        *total += notesize(&t->notes[0]);

	if (view->regsets[0].writeback)
		view->regsets[0].writeback(task_to_context(t->task),
					   &view->regsets[0], 1);

	for (i = 1; i < view->n; ++i) {
		const struct user_regset *regset = &view->regsets[i];
		void *data;

		if (regset->writeback)
			regset->writeback(task_to_context(t->task), regset, 1);
		if (!regset->core_note_type)
			continue;
		if (regset->active &&
		    !regset->active(task_to_context(t->task), regset))
			continue;
		data = (void *)GETBUF(regset->size);
		if (!regset->get(task_to_context(t->task), regset, regset->size,
				 data))
			continue;
		if (regset->callback)
			regset->callback(t, regset);

		fill_note(&t->notes[i], regset->name, regset->core_note_type,
			  regset->size, data);
		*total += notesize(&t->notes[i]);
	}

}

static int
fill_note_info(struct elf_note_info *info, struct thread_group_list *tglist,
	       Elf_Ehdr *elf, Elf_Shdr *shdr0, int phnum)
{
	const struct user_regset_view *view = task_user_regset_view();
	struct thread_group_list *l;
	struct elf_thread_core_info *t;
	struct elf_prpsinfo *psinfo = NULL;
	ulong dump_task;
	unsigned int i;

	info->size = 0;
	info->thread = NULL;

	psinfo = (struct elf_prpsinfo *)GETBUF(sizeof(struct elf_prpsinfo));
        fill_note(&info->psinfo, "CORE", NT_PRPSINFO,
		  sizeof(struct elf_prpsinfo), psinfo);

	info->thread_notes = 0;
	for (i = 0; i < view->n; i++)
		if (view->regsets[i].core_note_type != 0)
			++info->thread_notes;

	/* Sanity check.  We rely on regset 0 being in NT_PRSTATUS,
         * since it is our one special case.
         */
	if (info->thread_notes == 0 ||
	    view->regsets[0].core_note_type != NT_PRSTATUS)
		error(FATAL, "regset 0 is _not_ NT_PRSTATUS\n");

	fill_headers(elf, shdr0, phnum, view->e_machine, view->e_flags,
		     view->ei_osabi);

	/* head task is always a dump target */
	dump_task = tglist->task;

	for (l = tglist; l; l = l->next) {
		struct elf_thread_core_info *new;
		size_t entry_size;

		entry_size = offsetof(struct elf_thread_core_info,
				      notes[info->thread_notes]);
		new = (struct elf_thread_core_info *)GETBUF(entry_size);
		BZERO(new, entry_size);
		new->task = l->task;
		if (!info->thread || l->task == dump_task) {
			new->next = info->thread;
			info->thread = new;
		} else {
			/* keep dump_task in the head position */
			new->next = info->thread->next;
			info->thread->next = new;
		}
	}

	for (t = info->thread; t; t = t->next)
		fill_thread_core_info(t, view, &info->size, tglist);

        /*
	 * Fill in the two process-wide notes.
         */
        fill_psinfo(psinfo, dump_task);
        info->size += notesize(&info->psinfo);

	fill_auxv_note(&info->auxv, dump_task);
	info->size += notesize(&info->auxv);

	return 0;
}

static int
notesize(struct memelfnote *en)
{
        int sz;

        sz = sizeof(Elf_Nhdr);
        sz += roundup(strlen(en->name) + 1, 4);
        sz += roundup(en->datasz, 4);

        return sz;
}

static void
fill_note(struct memelfnote *note, const char *name, int type, unsigned int sz,
	  void *data)
{
        note->name = name;
        note->type = type;
	note->datasz = sz;
        note->data = data;
        return;
}

static void
alignfile(int fd, off_t *foffset)
{
        static const char buffer[4] = {};
	const size_t len = roundup(*foffset, 4) - *foffset;

	if ((size_t)write(fd, buffer, len) != len)
		error(FATAL, "%s: write %s\n", gcore->corename,
		      strerror(errno));
	*foffset += (off_t)len;
}

static void
writenote(struct memelfnote *men, int fd, off_t *foffset)
{
        const Elf_Nhdr en = {
		.n_namesz = strlen(men->name) + 1,
		.n_descsz = men->datasz,
		.n_type   = men->type,
	};

	if (write(fd, &en, sizeof(en)) != sizeof(en))
		error(FATAL, "%s: write %s\n", gcore->corename,
		      strerror(errno));
	*foffset += sizeof(en);

	if (write(fd, men->name, en.n_namesz) != en.n_namesz)
		error(FATAL, "%s: write %s\n", gcore->corename,
		      strerror(errno));
	*foffset += en.n_namesz;

        alignfile(fd, foffset);

	if (write(fd, men->data, men->datasz) != men->datasz)
		error(FATAL, "%s: write %s\n", gcore->corename,
		      strerror(errno));
	*foffset += men->datasz;

        alignfile(fd, foffset);

}

static void
write_note_info(int fd, struct elf_note_info *info, off_t *foffset)
{
        int first = 1;
        struct elf_thread_core_info *t = info->thread;

        do {
                int i;

                writenote(&t->notes[0], fd, foffset);

                if (first) {
			writenote(&info->psinfo, fd, foffset);
			writenote(&info->auxv, fd, foffset);
		}

                for (i = 1; i < info->thread_notes; ++i)
                        if (t->notes[i].data)
				writenote(&t->notes[i], fd, foffset);

                first = 0;
                t = t->next;
        } while (t);

}

static size_t
get_note_info_size(struct elf_note_info *info)
{
	return info->size;
}

static ulong next_vma(ulong this_vma)
{
	return ULONG(fill_vma_cache(this_vma) + OFFSET(vm_area_struct_vm_next));
}

static void
write_elf_note_phdr(int fd, size_t size, off_t *offset)
{
	Elf_Phdr phdr;

	BZERO(&phdr, sizeof(phdr));

        phdr.p_type = PT_NOTE;
        phdr.p_offset = *offset;
        phdr.p_filesz = size;

	*offset += size;

	if (write(fd, &phdr, sizeof(phdr)) != sizeof(phdr))
		error(FATAL, "%s: write: %s\n", gcore->corename,
		      strerror(errno));

}

static void
fill_prstatus(struct elf_prstatus *prstatus, ulong task,
	      const struct thread_group_list *tglist)
{
	ulong pending_signal_sig0, blocked_sig0, real_parent, group_leader,
		signal, cutime,	cstime;

        /* The type of (sig[0]) is unsigned long. */
	readmem(task + OFFSET(task_struct_pending) + OFFSET(sigpending_signal),
		KVADDR, &pending_signal_sig0, sizeof(unsigned long),
		"fill_prstatus: sigpending_signal_sig",
		gcore_verbose_error_handle());

	readmem(task + OFFSET(task_struct_blocked), KVADDR, &blocked_sig0,
		sizeof(unsigned long), "fill_prstatus: blocked_sig0",
		gcore_verbose_error_handle());

	readmem(task + GCORE_OFFSET(task_struct_real_parent), KVADDR,
		&real_parent, sizeof(real_parent), "fill_prstatus: real_parent",
		gcore_verbose_error_handle());

	readmem(task + GCORE_OFFSET(task_struct_group_leader), KVADDR,
		&group_leader, sizeof(group_leader),
		"fill_prstatus: group_leader", gcore_verbose_error_handle());

	prstatus->pr_info.si_signo = prstatus->pr_cursig = 0;
        prstatus->pr_sigpend = pending_signal_sig0;
        prstatus->pr_sighold = blocked_sig0;
        prstatus->pr_ppid = ggt->task_pid(real_parent);
        prstatus->pr_pid = ggt->task_pid(task);
        prstatus->pr_pgrp = ggt->task_pgrp(task);
        prstatus->pr_sid = ggt->task_session(task);
        if (thread_group_leader(task)) {
                struct task_cputime cputime;

                /*
                 * This is the record for the group leader.  It shows the
                 * group-wide total, not its individual thread total.
                 */
                ggt->thread_group_cputime(task, tglist, &cputime);
                cputime_to_timeval(cputime.utime, &prstatus->pr_utime);
                cputime_to_timeval(cputime.stime, &prstatus->pr_stime);
        } else {
		cputime_t utime, stime;

		readmem(task + OFFSET(task_struct_utime), KVADDR, &utime,
			sizeof(utime), "task_struct utime", gcore_verbose_error_handle());

		readmem(task + OFFSET(task_struct_stime), KVADDR, &stime,
			sizeof(stime), "task_struct stime", gcore_verbose_error_handle());

                cputime_to_timeval(utime, &prstatus->pr_utime);
                cputime_to_timeval(stime, &prstatus->pr_stime);
        }

	readmem(task + OFFSET(task_struct_signal), KVADDR, &signal,
		sizeof(signal), "task_struct signal", gcore_verbose_error_handle());

	/* cutime and cstime are members of signal_struct, not task_struct. */
	readmem(signal + GCORE_OFFSET(signal_struct_cutime), KVADDR,
		&cutime, sizeof(cutime), "signal_struct cutime",
		gcore_verbose_error_handle());

	readmem(signal + MEMBER_OFFSET("signal_struct", "cstime"), KVADDR,
		&cstime, sizeof(cstime), "signal_struct cstime",
		gcore_verbose_error_handle());

        cputime_to_timeval(cutime, &prstatus->pr_cutime);
        cputime_to_timeval(cstime, &prstatus->pr_cstime);

}

static void
fill_auxv_note(struct memelfnote *note, ulong task)
{
	ulong *auxv;
	int i;

	auxv = (ulong *)GETBUF(GCORE_SIZE(mm_struct_saved_auxv));

	readmem(task_mm(task, FALSE) +
		GCORE_OFFSET(mm_struct_saved_auxv), KVADDR, auxv,
		GCORE_SIZE(mm_struct_saved_auxv), "fill_auxv_note",
		gcore_verbose_error_handle());

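	/*
	 * Count entries up to and including the terminating AT_NULL
	 * pair, as the kernel's ELF core dumper does.
	 */
	i = 0;
	do
		i += 2;
	while (auxv[i - 2] != AT_NULL);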

	fill_note(note, "CORE", NT_AUXV, i * sizeof(ulong), auxv);

}
-------------- next part --------------
/* gcore_coredump_table.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include <defs.h>
#include <gcore_defs.h>

static unsigned int get_inode_i_nlink_v0(ulong file);
static unsigned int get_inode_i_nlink_v19(ulong file);
static pid_t pid_nr_ns(ulong pid, ulong ns);
static int pid_alive(ulong task);
static int __task_pid_nr_ns(ulong task, enum pid_type type);
static inline pid_t task_pid(ulong task);
static inline pid_t process_group(ulong task);
static inline pid_t task_session(ulong task);
static inline pid_t task_pid_vnr(ulong task);
static inline pid_t task_pgrp_vnr(ulong task);
static inline pid_t task_session_vnr(ulong task);
static void
thread_group_cputime_v0(ulong task, const struct thread_group_list *threads,
			struct task_cputime *cputime);
static void
thread_group_cputime_v22(ulong task, const struct thread_group_list *threads,
			 struct task_cputime *cputime);
static inline __kernel_uid_t task_uid_v0(ulong task);
static inline __kernel_uid_t task_uid_v28(ulong task);
static inline __kernel_gid_t task_gid_v0(ulong task);
static inline __kernel_gid_t task_gid_v28(ulong task);

void gcore_coredump_table_init(void)
{
	/*
         * struct path was introduced at v2.6.19, where f_dentry
         * member of struct file was replaced by f_path member.
	 *
	 * See vfs_init() to know why this condition is chosen.
	 *
	 * See commit 0f7fc9e4d03987fe29f6dd4aa67e4c56eb7ecb05.
	 */
	if (VALID_MEMBER(file_f_path))
		ggt->get_inode_i_nlink = get_inode_i_nlink_v19;
	else
		ggt->get_inode_i_nlink = get_inode_i_nlink_v0;

	/*
	 * task_pid_vnr() and relevant helpers were introduced at
	 * v2.6.23, while pid_namespace itself was introduced prior to
	 * that at v2.6.19.
	 *
	 * We key on the former because the pid facility only became
	 * complete enough to use once those helper patches were
	 * committed.
	 *
	 * We've chosen the symbol ``pid_nr_ns'' because it is the only
	 * one of these helpers that is not defined as static inline.
	 *
	 * See commit 7af5729474b5b8ad385adadab78d6e723e7655a3.
	 */
	if (symbol_exists("pid_nr_ns")) {
		ggt->task_pid = task_pid_vnr;
		ggt->task_pgrp = task_pgrp_vnr;
		ggt->task_session = task_session_vnr;
	} else {
		ggt->task_pid = task_pid;
		ggt->task_pgrp = process_group;
		ggt->task_session = task_session;
	}

	/*
	 * The way of tracking cputime changed when CFS was introduced
	 * at v2.6.23, which can be detected by checking whether the
	 * se member of struct task_struct exists.
	 *
	 * See commit 20b8a59f2461e1be911dce2cfafefab9d22e4eee.
	 */
	if (GCORE_VALID_MEMBER(task_struct_se))
		ggt->thread_group_cputime = thread_group_cputime_v22;
	else
		ggt->thread_group_cputime = thread_group_cputime_v0;

        /*
	 * The credentials feature was introduced at v2.6.28, where the
	 * uid and gid members were moved into the newly introduced
	 * cred member of struct task_struct.
	 *
         * See commit b6dff3ec5e116e3af6f537d4caedcad6b9e5082a.
	 */
	if (GCORE_VALID_MEMBER(task_struct_cred)) {
		ggt->task_uid = task_uid_v28;
		ggt->task_gid = task_gid_v28;
	} else {
		ggt->task_uid = task_uid_v0;
		ggt->task_gid = task_gid_v0;
	}

}

static unsigned int get_inode_i_nlink_v0(ulong file)
{
	ulong d_entry, d_inode;
	unsigned int i_nlink;

	readmem(file + OFFSET(file_f_dentry), KVADDR, &d_entry, sizeof(d_entry),
		"get_inode_i_nlink_v0: d_entry", gcore_verbose_error_handle());

	readmem(d_entry + OFFSET(dentry_d_inode), KVADDR, &d_inode,
		sizeof(d_inode), "get_inode_i_nlink_v0: d_inode",
		gcore_verbose_error_handle());

	readmem(d_inode + GCORE_OFFSET(inode_i_nlink), KVADDR, &i_nlink,
		sizeof(i_nlink), "get_inode_i_nlink_v0: i_nlink",
		gcore_verbose_error_handle());

	return i_nlink;
}

static unsigned int get_inode_i_nlink_v19(ulong file)
{
	ulong d_entry, d_inode;
	unsigned int i_nlink;

	readmem(file + OFFSET(file_f_path) + OFFSET(path_dentry), KVADDR,
		&d_entry, sizeof(d_entry), "get_inode_i_nlink_v19: d_entry",
		gcore_verbose_error_handle());

	readmem(d_entry + OFFSET(dentry_d_inode), KVADDR, &d_inode, sizeof(d_inode),
		"get_inode_i_nlink_v19: d_inode", gcore_verbose_error_handle());

	readmem(d_inode + GCORE_OFFSET(inode_i_nlink), KVADDR, &i_nlink,
		sizeof(i_nlink), "get_inode_i_nlink_v19: i_nlink",
		gcore_verbose_error_handle());

	return i_nlink;
}

static inline pid_t
task_pid(ulong task)
{
	return task_to_context(task)->pid;
}

static inline pid_t
process_group(ulong task)
{
	ulong signal;
	pid_t pgrp;

	readmem(task + OFFSET(task_struct_signal), KVADDR, &signal,
		sizeof(signal), "process_group: signal", gcore_verbose_error_handle());

	readmem(signal + GCORE_OFFSET(signal_struct_pgrp), KVADDR, &pgrp,
		sizeof(pgrp), "process_group: pgrp", gcore_verbose_error_handle());

	return pgrp;
}

static inline pid_t
task_session(ulong task)
{
	ulong signal;
	pid_t session;

	readmem(task + OFFSET(task_struct_signal), KVADDR, &signal,
		sizeof(signal), "task_session: signal", gcore_verbose_error_handle());

	readmem(signal + GCORE_OFFSET(signal_struct_session), KVADDR,
		&session, sizeof(session), "task_session: session",
		gcore_verbose_error_handle());

	return session;
}

static pid_t
pid_nr_ns(ulong pid, ulong ns)
{
	ulong upid;
	unsigned int ns_level, pid_level;
	pid_t nr = 0;

	/* Check pid before dereferencing it, as the kernel's pid_nr_ns() does. */
	if (!pid)
		return nr;

	readmem(ns + GCORE_OFFSET(pid_namespace_level), KVADDR, &ns_level,
		sizeof(ns_level), "pid_nr_ns: ns_level", gcore_verbose_error_handle());

	readmem(pid + GCORE_OFFSET(pid_level), KVADDR, &pid_level,
		sizeof(pid_level), "pid_nr_ns: pid_level", gcore_verbose_error_handle());

        if (ns_level <= pid_level) {
		ulong upid_ns;

		upid = pid + OFFSET(pid_numbers) + SIZE(upid) * ns_level;

		readmem(upid + OFFSET(upid_ns), KVADDR, &upid_ns,
			sizeof(upid_ns), "pid_nr_ns: upid_ns",
			gcore_verbose_error_handle());

		if (upid_ns == ns)
			/* nr is a pid_t; reading sizeof(ulong) would overrun it. */
			readmem(upid + OFFSET(upid_nr), KVADDR, &nr,
				sizeof(nr), "pid_nr_ns: upid_nr",
				gcore_verbose_error_handle());
        }

        return nr;
}

static int
__task_pid_nr_ns(ulong task, enum pid_type type)
{
	ulong nsproxy, ns;
	int nr = 0;

	readmem(task + OFFSET(task_struct_nsproxy), KVADDR, &nsproxy,
		sizeof(nsproxy), "__task_pid_nr_ns: nsproxy",
		gcore_verbose_error_handle());

	readmem(nsproxy + GCORE_OFFSET(nsproxy_pid_ns), KVADDR, &ns,
		sizeof(ns), "__task_pid_nr_ns: ns", gcore_verbose_error_handle());

	if (pid_alive(task)) {
		ulong pids_type_pid;

                if (type != PIDTYPE_PID)
			readmem(task + MEMBER_OFFSET("task_struct",
						     "group_leader"),
				KVADDR, &task, sizeof(ulong),
				"__task_pid_nr_ns: group_leader",
				gcore_verbose_error_handle());

		readmem(task + OFFSET(task_struct_pids) + type * SIZE(pid_link)
			+ OFFSET(pid_link_pid), KVADDR, &pids_type_pid,
			sizeof(pids_type_pid),
			"__task_pid_nr_ns: pids_type_pid", gcore_verbose_error_handle());

		nr = pid_nr_ns(pids_type_pid, ns);
        }

        return nr;
}

static inline pid_t
task_pid_vnr(ulong task)
{
	return __task_pid_nr_ns(task, PIDTYPE_PID);
}

static inline pid_t
task_pgrp_vnr(ulong task)
{
        return __task_pid_nr_ns(task, PIDTYPE_PGID);
}

static inline pid_t
task_session_vnr(ulong task)
{
        return __task_pid_nr_ns(task, PIDTYPE_SID);
}

static void
thread_group_cputime_v0(ulong task, const struct thread_group_list *threads,
			struct task_cputime *cputime)
{
	ulong signal;
	ulong utime, signal_utime, stime, signal_stime;

	readmem(task + OFFSET(task_struct_signal), KVADDR, &signal,
		sizeof(signal), "thread_group_cputime_v0: signal",
		gcore_verbose_error_handle());

	readmem(task + OFFSET(task_struct_utime), KVADDR, &utime,
		sizeof(utime), "thread_group_cputime_v0: utime",
		gcore_verbose_error_handle());

	readmem(signal + GCORE_OFFSET(signal_struct_utime), KVADDR,
		&signal_utime, sizeof(signal_utime),
		"thread_group_cputime_v0: signal_utime",
		gcore_verbose_error_handle());

	readmem(task + OFFSET(task_struct_stime), KVADDR, &stime,
		sizeof(stime), "thread_group_cputime_v0: stime",
		gcore_verbose_error_handle());

	readmem(signal + GCORE_OFFSET(signal_struct_stime), KVADDR,
		&signal_stime, sizeof(signal_stime),
		"thread_group_cputime_v0: signal_stime",
		gcore_verbose_error_handle());

	cputime->utime = utime + signal_utime;
	cputime->stime = stime + signal_stime;
	cputime->sum_exec_runtime = 0;

}

static void
thread_group_cputime_v22(ulong task, const struct thread_group_list *threads,
			 struct task_cputime *times)
{
	const struct thread_group_list *t;
	ulong sighand, signal, signal_utime, signal_stime;
	uint64_t sum_sched_runtime;

	*times = INIT_CPUTIME;

	readmem(task + OFFSET(task_struct_sighand), KVADDR, &sighand,
		sizeof(sighand), "thread_group_cputime_v22: sighand",
		gcore_verbose_error_handle());

	if (!sighand)
		goto out;

	readmem(task + OFFSET(task_struct_signal), KVADDR, &signal,
		sizeof(signal), "thread_group_cputime_v22: signal",
		gcore_verbose_error_handle());

	for (t = threads; t; t = t->next) {
		ulong utime, stime;
		uint64_t sum_exec_runtime;

		readmem(t->task + OFFSET(task_struct_utime), KVADDR, &utime,
			sizeof(utime), "thread_group_cputime_v22: utime",
			gcore_verbose_error_handle());

		readmem(t->task + OFFSET(task_struct_stime), KVADDR, &stime,
			sizeof(stime), "thread_group_cputime_v22: stime",
			gcore_verbose_error_handle());

		readmem(t->task + GCORE_OFFSET(task_struct_se) +
			GCORE_OFFSET(sched_entity_sum_exec_runtime), KVADDR,
			&sum_exec_runtime, sizeof(sum_exec_runtime),
			"thread_group_cputime_v22: sum_exec_runtime",
			gcore_verbose_error_handle());

		times->utime = cputime_add(times->utime, utime);
		times->stime = cputime_add(times->stime, stime);
		times->sum_exec_runtime += sum_exec_runtime;
	}

	readmem(signal + GCORE_OFFSET(signal_struct_utime), KVADDR,
		&signal_utime, sizeof(signal_utime),
		"thread_group_cputime_v22: signal_utime", gcore_verbose_error_handle());

	readmem(signal + GCORE_OFFSET(signal_struct_stime), KVADDR,
		&signal_stime, sizeof(signal_stime),
		"thread_group_cputime_v22: signal_stime", gcore_verbose_error_handle());

	readmem(signal + GCORE_OFFSET(signal_struct_sum_sched_runtime),
		KVADDR, &sum_sched_runtime, sizeof(sum_sched_runtime),
		"thread_group_cputime_v22: sum_sched_runtime",
		gcore_verbose_error_handle());

	times->utime = cputime_add(times->utime, signal_utime);
	times->stime = cputime_add(times->stime, signal_stime);
	times->sum_exec_runtime += sum_sched_runtime;

out:
	return;
}

static inline __kernel_uid_t
task_uid_v0(ulong task)
{
	__kernel_uid_t uid;

	readmem(task + GCORE_OFFSET(task_struct_uid), KVADDR, &uid,
		sizeof(uid), "task_uid_v0: uid", gcore_verbose_error_handle());

	return uid;
}

static inline __kernel_uid_t
task_uid_v28(ulong task)
{
	ulong cred;
	__kernel_uid_t uid;

	readmem(task + GCORE_OFFSET(task_struct_real_cred), KVADDR, &cred,
		sizeof(cred), "task_uid_v28: real_cred", gcore_verbose_error_handle());

	readmem(cred + GCORE_OFFSET(cred_uid), KVADDR, &uid, sizeof(uid),
		"task_uid_v28: uid", gcore_verbose_error_handle());

	return uid;
}

static inline __kernel_gid_t
task_gid_v0(ulong task)
{
	__kernel_gid_t gid;

	readmem(task + GCORE_OFFSET(task_struct_gid), KVADDR, &gid,
		sizeof(gid), "task_gid_v0: gid", gcore_verbose_error_handle());

	return gid;
}

static inline __kernel_gid_t
task_gid_v28(ulong task)
{
	ulong cred;
	__kernel_gid_t gid;

	readmem(task + GCORE_OFFSET(task_struct_real_cred), KVADDR, &cred,
		sizeof(cred), "task_gid_v28: real_cred", gcore_verbose_error_handle());

	readmem(cred + GCORE_OFFSET(cred_gid), KVADDR, &gid, sizeof(gid),
		"task_gid_v28: gid", gcore_verbose_error_handle());

	return gid;
}

static int
pid_alive(ulong task)
{
	ulong pid; /* value of a struct pid pointer, not a pid number */

	readmem(task + OFFSET(task_struct_pids) + PIDTYPE_PID * SIZE(pid_link)
		+ OFFSET(pid_link_pid), KVADDR, &pid, sizeof(pid), "pid_alive",
		gcore_verbose_error_handle());

        return !!pid;
}

#ifdef GCORE_TEST

char *gcore_coredump_table_test(void)
{
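	/* Default to failure so that an unrecognized kernel fails the asserts. */
	int test_i_nlink = 0, test_pid = 0, test_pgrp = 0, test_session = 0,
		test_cputime = 0, test_uid = 0, test_gid = 0;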

	if (gcore_is_rhel4()) {
		test_i_nlink = ggt->get_inode_i_nlink == get_inode_i_nlink_v0;
		test_pid = ggt->task_pid == task_pid;
		test_pgrp = ggt->task_pgrp == process_group;
		test_session = ggt->task_session == task_session;
		test_cputime = ggt->thread_group_cputime == thread_group_cputime_v0;
		test_uid = ggt->task_uid == task_uid_v0;
		test_gid = ggt->task_gid == task_gid_v0;
	} else if (gcore_is_rhel5()) {
		test_i_nlink = ggt->get_inode_i_nlink == get_inode_i_nlink_v0;
		test_pid = ggt->task_pid == task_pid;
		test_pgrp = ggt->task_pgrp == process_group;
		test_session = ggt->task_session == task_session;
		test_cputime = ggt->thread_group_cputime == thread_group_cputime_v0;
		test_uid = ggt->task_uid == task_uid_v0;
		test_gid = ggt->task_gid == task_gid_v0;
	} else if (gcore_is_rhel6()) {
		test_i_nlink = ggt->get_inode_i_nlink == get_inode_i_nlink_v19;
		test_pid = ggt->task_pid == task_pid_vnr;
		test_pgrp = ggt->task_pgrp == task_pgrp_vnr;
		test_session = ggt->task_session == task_session_vnr;
		test_cputime = ggt->thread_group_cputime == thread_group_cputime_v22;
		test_uid = ggt->task_uid == task_uid_v28;
		test_gid = ggt->task_gid == task_gid_v28;
	} else if (THIS_KERNEL_VERSION == LINUX(2,6,36)) {
		test_i_nlink = ggt->get_inode_i_nlink == get_inode_i_nlink_v19;
		test_pid = ggt->task_pid == task_pid_vnr;
		test_pgrp = ggt->task_pgrp == task_pgrp_vnr;
		test_session = ggt->task_session == task_session_vnr;
		test_cputime = ggt->thread_group_cputime == thread_group_cputime_v22;
		test_uid = ggt->task_uid == task_uid_v28;
		test_gid = ggt->task_gid == task_gid_v28;
	}

	mu_assert("ggt->get_inode_i_nlink has wrongly been registered", test_i_nlink);
	mu_assert("ggt->task_pid has wrongly been registered", test_pid);
	mu_assert("ggt->task_pgrp has wrongly been registered", test_pgrp);
	mu_assert("ggt->task_session has wrongly been registered", test_session);
	mu_assert("ggt->thread_group_cputime has wrongly been registered", test_cputime);
	mu_assert("ggt->task_uid has wrongly been registered", test_uid);
	mu_assert("ggt->task_gid has wrongly been registered", test_gid);

	return NULL;
}

#endif /* GCORE_TEST */
-------------- next part --------------
/* gcore_defs.h -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */
#ifndef GCORE_DEFS_H_
#define GCORE_DEFS_H_

#define PN_XNUM 0xffff

#define ELF_CORE_EFLAGS 0

#ifdef X86_64
#define ELF_EXEC_PAGESIZE 4096

#define ELF_MACHINE EM_X86_64
#define ELF_OSABI ELFOSABI_NONE

#define ELF_CLASS ELFCLASS64
#define ELF_DATA ELFDATA2LSB
#define ELF_ARCH EM_X86_64

#define Elf_Half Elf64_Half
#define Elf_Word Elf64_Word
#define Elf_Off Elf64_Off

#define Elf_Ehdr Elf64_Ehdr
#define Elf_Phdr Elf64_Phdr
#define Elf_Shdr Elf64_Shdr
#define Elf_Nhdr Elf64_Nhdr
#elif defined(X86)
#define ELF_EXEC_PAGESIZE 4096

#define ELF_MACHINE EM_386
#define ELF_OSABI ELFOSABI_NONE

#define ELF_CLASS ELFCLASS32
#define ELF_DATA ELFDATA2LSB
#define ELF_ARCH EM_386

#define Elf_Half Elf32_Half
#define Elf_Word Elf32_Word
#define Elf_Off Elf32_Off

#define Elf_Ehdr Elf32_Ehdr
#define Elf_Phdr Elf32_Phdr
#define Elf_Shdr Elf32_Shdr
#define Elf_Nhdr Elf32_Nhdr
#endif

/*
 * gcore_regset.c
 *
 * The regset interface is borrowed from the kernel library of the
 * same name, which the kernel uses to collect note information.
 * See include/linux/regset.h for details.
 */
struct user_regset;
struct task_context;
struct elf_thread_core_info;

/**
 * user_regset_active_fn - type of @active function in &struct user_regset
 * @target:	task context being examined
 * @regset:	regset being examined
 *
 * Return TRUE if there is an interesting resource.
 * Return FALSE otherwise.
 */
typedef int user_regset_active_fn(struct task_context *target,
				  const struct user_regset *regset);

/**
 * user_regset_get_fn - type of @get function in &struct user_regset
 * @target:	task context being examined
 * @regset:	regset being examined
 * @size:	amount of data to copy, in bytes
 * @buf:	buffer to copy the register values into
 *
 * Fetch register values. Return TRUE on success and FALSE otherwise.
 */
typedef int user_regset_get_fn(struct task_context *target,
			       const struct user_regset *regset,
			       unsigned int size,
			       void *buf);

/**
 * user_regset_writeback_fn - type of @writeback function in &struct user_regset
 * @target:	task context being examined
 * @regset:	regset being examined
 * @immediate:	zero if writeback at completion of next context switch is OK
 *
 * This call is optional; usually the pointer is %NULL.
 *
 * Return TRUE on success or FALSE otherwise.
 */
typedef int user_regset_writeback_fn(struct task_context *target,
				     const struct user_regset *regset,
				     int immediate);

/**
 * user_regset_callback_fn - type of @callback function in &struct user_regset
 * @t:          thread core information being gathered
 * @regset:	regset being examined
 *
 * Edit another piece of information contained in @t in terms of @regset.
 * This call is optional; the pointer is %NULL if there is no requirement to
 * edit.
 */
typedef void user_regset_callback_fn(struct elf_thread_core_info *t,
				     const struct user_regset *regset);

/**
 * struct user_regset - accessible thread CPU state
 * @size:		Size in bytes of the regset data.
 * @core_note_type:	ELF note @n_type value used in core dumps.
 * @get:		Function to fetch values.
 * @active:		Function to report if regset is active, or %NULL.
 * @writeback:		Function to sync the regset with the actual state, or %NULL.
 *
 * @name:               Note section name.
 * @callback:           Function to edit thread core information, or %NULL.
 *
 * This data structure describes a machine resource to be retrieved for
 * a process core dump. Each member characterizes the resource or an
 * operation needed during the core dump.
 *
 * @get retrieves the corresponding resource; @active checks whether
 * the resource exists; @writeback performs an architecture-specific
 * operation so that the resource reflects the actual current state;
 * @size is the size of the machine resource in bytes; @core_note_type
 * is the note type; @name is the note name, which identifies the
 * owner (originator) that handles this kind of machine resource;
 * @callback is an extra operation that edits other note information
 * of the same thread when the machine resource is collected.
 */
struct user_regset {
	user_regset_get_fn		*get;
	user_regset_active_fn		*active;
	user_regset_writeback_fn	*writeback;
	unsigned int 			size;
	unsigned int 			core_note_type;
	char                            *name;
	user_regset_callback_fn         *callback;
};

/**
 * struct user_regset_view - available regsets
 * @name:	Identifier, e.g. UTS_MACHINE string.
 * @regsets:	Array of @n regsets available in this view.
 * @n:		Number of elements in @regsets.
 * @e_machine:	ELF header @e_machine %EM_* value written in core dumps.
 * @e_flags:	ELF header @e_flags value written in core dumps.
 * @ei_osabi:	ELF header @e_ident[%EI_OSABI] value written in core dumps.
 *
 * A regset view is a collection of regsets (&struct user_regset,
 * above).  It describes all the state of a thread that is collected
 * as note information in a process core dump.
 */
struct user_regset_view {
	const char *name;
	const struct user_regset *regsets;
	unsigned int n;
	uint32_t e_flags;
	uint16_t e_machine;
	uint8_t ei_osabi;
};

/**
 * task_user_regset_view - Return the process's regset view.
 *
 * Return the &struct user_regset_view. By default, it returns
 * &gcore_default_regset_view.
 *
 * This is defined as a weak symbol. If another definition of
 * task_user_regset_view is present at link time, it is used instead,
 * which is useful for supporting a different kernel version or
 * architecture.
 */
extern const struct user_regset_view *task_user_regset_view(void);
extern void gcore_default_regsets_init(void);

#if defined(X86)
#define REGSET_VIEW_NAME "i386"
#define REGSET_VIEW_MACHINE EM_386
#elif defined(X86_64)
#define REGSET_VIEW_NAME "x86_64"
#define REGSET_VIEW_MACHINE EM_X86_64
#elif defined(IA64)
#define REGSET_VIEW_NAME "ia64"
#define REGSET_VIEW_MACHINE EM_IA_64
#endif

/*
 * gcore_dumpfilter.c
 */
extern int gcore_dumpfilter_set(ulong filter);
extern void gcore_dumpfilter_set_default(void);
extern ulong gcore_dumpfilter_vma_dump_size(ulong vma);

/*
 * gcore_verbose.c
 */
#define VERBOSE_PROGRESS  0x1
#define VERBOSE_NONQUIET  0x2
#define VERBOSE_PAGEFAULT 0x4
#define VERBOSE_DEFAULT_LEVEL VERBOSE_PAGEFAULT
#define VERBOSE_MAX_LEVEL (VERBOSE_PROGRESS + VERBOSE_NONQUIET + \
			   VERBOSE_PAGEFAULT)

#define VERBOSE_DEFAULT_ERROR_HANDLE (FAULT_ON_ERROR | QUIET)

/*
 * Verbose flag is set each time gcore is executed. The same verbose
 * flag value is used for all the tasks given together in the command
 * line.
 */
extern void gcore_verbose_set_default(void);

/**
 * gcore_verbose_set() - set verbose level
 *
 * @level verbose level to be assigned; it may be out of range, since a
 *        negative value given by the caller arrives here as a very
 *        large unsigned value.
 *
 * If @level is strictly larger than VERBOSE_MAX_LEVEL, return FALSE.
 * Otherwise, update the global data, gvd, to @level and return TRUE.
 */
extern int gcore_verbose_set(ulong level);

/**
 * gcore_verbose_get() - get verbose level
 *
 * Return the current verbose level contained in the global data.
 */
extern ulong gcore_verbose_get(void);

/**
 * gcore_verbose_error_handle() - get error handle
 *
 * Return the current error_handle contained in the global data.
 */
extern ulong gcore_verbose_error_handle(void);

/*
 * Helper printing functions for respective verbose flags
 */

/**
 * verbosef() - print verbose information if the given flag is currently set.
 *
 * @vflag  verbose flag tested against the current verbose level.
 * @eflag  error type passed to error(), e.g. INFO or WARNING.
 * @...    printf-style format and arguments, printed through error().
 *
 * Always evaluates to FALSE.
 */
#define verbosef(vflag, eflag, ...)					\
	({								\
		if (gcore_verbose_get() & (vflag)) {			\
			(void) error((eflag), __VA_ARGS__);		\
		}							\
		FALSE;							\
	})

/**
 * progressf() - print progress verbose information
 *
 * @format printf style format that is printed into standard output.
 *
 * Print progress verbose information if VERBOSE_PROGRESS is currently set.
 */
#define progressf(...) verbosef(VERBOSE_PROGRESS, INFO, __VA_ARGS__)

/**
 * pagefaultf() - print page fault verbose information
 *
 * @format printf style format that is printed into standard output.
 *
 * Print page fault verbose information if VERBOSE_PAGEFAULT is currently set.
 */
#define pagefaultf(...) verbosef(VERBOSE_PAGEFAULT, WARNING, __VA_ARGS__)
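
/*
 * Usage sketch (illustrative only; the progress message text and the
 * variable `addr' are hypothetical):
 *
 *	progressf("Retrieving note information ...\n");
 *	pagefaultf("page fault at %lx\n", addr);
 *
 * Each call prints only when the corresponding VERBOSE_* bit is set in
 * the value returned by gcore_verbose_get(). Since verbosef() always
 * evaluates to FALSE, a caller could, for instance, also write
 * `return pagefaultf(...);' to report a failure and return in one step.
 */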

/*
 * gcore_x86.c
 */
extern struct gcore_x86_table *gxt;

extern void gcore_x86_table_init(void);

#ifdef X86_64
struct user_regs_struct {
	unsigned long	r15;
	unsigned long	r14;
	unsigned long	r13;
	unsigned long	r12;
	unsigned long	bp;
	unsigned long	bx;
	unsigned long	r11;
	unsigned long	r10;
	unsigned long	r9;
	unsigned long	r8;
	unsigned long	ax;
	unsigned long	cx;
	unsigned long	dx;
	unsigned long	si;
	unsigned long	di;
	unsigned long	orig_ax;
	unsigned long	ip;
	unsigned long	cs;
	unsigned long	flags;
	unsigned long	sp;
	unsigned long	ss;
	unsigned long	fs_base;
	unsigned long	gs_base;
	unsigned long	ds;
	unsigned long	es;
	unsigned long	fs;
	unsigned long	gs;
};
#endif

#ifdef X86
struct user_regs_struct {
	unsigned long	bx;
	unsigned long	cx;
	unsigned long	dx;
	unsigned long	si;
	unsigned long	di;
	unsigned long	bp;
	unsigned long	ax;
	unsigned long	ds;
	unsigned long	es;
	unsigned long	fs;
	unsigned long	gs;
	unsigned long	orig_ax;
	unsigned long	ip;
	unsigned long	cs;
	unsigned long	flags;
	unsigned long	sp;
	unsigned long	ss;
};
#endif

typedef ulong elf_greg_t;
#define ELF_NGREG (sizeof(struct user_regs_struct) / sizeof(elf_greg_t))
typedef elf_greg_t elf_gregset_t[ELF_NGREG];

#ifdef X86
#define PAGE_SIZE 4096
#endif

/*
 * gcore_coredump_table.c
 */
extern void gcore_coredump_table_init(void);

/*
 * gcore_coredump.c
 */
extern void gcore_coredump(void);

/*
 * gcore_global_data.c
 */
extern struct gcore_data *gcore;
extern struct gcore_coredump_table *ggt;
extern struct gcore_offset_table gcore_offset_table;
extern struct gcore_size_table gcore_size_table;

/*
 * Misc
 */
enum pid_type
{
        PIDTYPE_PID,
        PIDTYPE_PGID,
        PIDTYPE_SID,
        PIDTYPE_MAX
};

struct elf_siginfo
{
        int     si_signo;                       /* signal number */
	int     si_code;                        /* extra code */
        int     si_errno;                       /* errno */
};

/* Parameters used to convert the timespec values: */
#define NSEC_PER_USEC   1000L
#define NSEC_PER_SEC    1000000000L

/* The clock frequency of the i8253/i8254 PIT */
#define PIT_TICK_RATE 1193182ul

/* Assume we use the PIT time source for the clock tick */
#define CLOCK_TICK_RATE         PIT_TICK_RATE

/* LATCH is used in the interval timer and ftape setup. */
#define LATCH  ((CLOCK_TICK_RATE + HZ/2) / HZ)  /* For divider */

/* Suppose we want to divide two numbers NOM and DEN: NOM/DEN, then we can
 * improve accuracy by shifting LSH bits, hence calculating:
 *     (NOM << LSH) / DEN
 * This however means trouble for large NOM, because (NOM << LSH) may no
 * longer fit in 32 bits. The following way of calculating this gives us
 * some slack, under the following conditions:
 *   - (NOM / DEN) fits in (32 - LSH) bits.
 *   - (NOM % DEN) fits in (32 - LSH) bits.
 */
#define SH_DIV(NOM,DEN,LSH) (   (((NOM) / (DEN)) << (LSH))              \
				+ ((((NOM) % (DEN)) << (LSH)) + (DEN) / 2) / (DEN))

/* HZ is the requested value. ACTHZ is actual HZ ("<< 8" is for accuracy) */
#define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LATCH, 8))

/* TICK_NSEC is the time between ticks in nsec assuming real ACTHZ */
#define TICK_NSEC (SH_DIV (1000000UL * 1000, ACTHZ, 8))
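
/*
 * Worked example (an illustrative sketch; it assumes HZ == 1000, which
 * the macros above do not fix, and uses C integer division throughout):
 *
 *   LATCH     = (1193182 + 500) / 1000                        = 1193
 *   ACTHZ     = ((1193182 / 1193) << 8)
 *               + (((1193182 % 1193) << 8) + 1193 / 2) / 1193
 *             = (1000 << 8) + (46592 + 596) / 1193
 *             = 256000 + 39                                   = 256039
 *   TICK_NSEC = ((1000000000 / 256039) << 8)
 *               + (((1000000000 % 256039) << 8) + 256039 / 2) / 256039
 *             = (3905 << 8) + (42932480 + 128019) / 256039
 *             = 999680 + 168                                  = 999848
 *
 * Under that assumption one tick is 999848 ns, slightly less than the
 * nominal 1 ms.
 */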

#define cputime_add(__a, __b)           ((__a) +  (__b))
#define cputime_sub(__a, __b)           ((__a) -  (__b))

typedef unsigned long cputime_t;

#define cputime_zero                    (0UL)

struct task_cputime {
        cputime_t utime;
        cputime_t stime;
        unsigned long long sum_exec_runtime;
};

#define INIT_CPUTIME						\
	(struct task_cputime) {					\
		.utime = cputime_zero,				\
		.stime = cputime_zero,				\
		.sum_exec_runtime = 0,				\
	}

static inline uint64_t div_u64_rem(uint64_t dividend, uint32_t divisor,
				   uint32_t *remainder)
{
        *remainder = dividend % divisor;
        return dividend / divisor;
}

static inline void
jiffies_to_timeval(const unsigned long jiffies, struct timeval *value)
{
        /*
         * Convert jiffies to nanoseconds and separate with
         * one divide.
         */
        uint32_t rem;

        value->tv_sec = div_u64_rem((uint64_t)jiffies * TICK_NSEC,
                                    NSEC_PER_SEC, &rem);
        value->tv_usec = rem / NSEC_PER_USEC;
}

#define cputime_to_timeval(__ct,__val)  jiffies_to_timeval(__ct,__val)
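
/*
 * Worked example (illustrative; it assumes TICK_NSEC == 999848, the
 * value the macros above produce when HZ == 1000):
 *
 *   jiffies = 1000: 1000 * 999848 =  999848000 ns
 *                   -> tv_sec = 0, tv_usec = 999848
 *   jiffies = 1001: 1001 * 999848 = 1000847848 ns
 *                   -> tv_sec = 1, tv_usec = 847
 */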

struct elf_prstatus
{
	struct elf_siginfo pr_info;	/* Info associated with signal */
	short	pr_cursig;		/* Current signal */
	unsigned long pr_sigpend;	/* Set of pending signals */
	unsigned long pr_sighold;	/* Set of held signals */
	int	pr_pid;
	int	pr_ppid;
	int	pr_pgrp;
	int	pr_sid;
	struct timeval pr_utime;	/* User time */
	struct timeval pr_stime;	/* System time */
	struct timeval pr_cutime;	/* Cumulative user time */
	struct timeval pr_cstime;	/* Cumulative system time */
	elf_gregset_t pr_reg;	/* GP registers */
	int pr_fpvalid;		/* True if math co-processor being used.  */
};

typedef unsigned short __kernel_old_uid_t;
typedef unsigned short __kernel_old_gid_t;

typedef __kernel_old_uid_t      old_uid_t;
typedef __kernel_old_gid_t      old_gid_t;

#ifdef X86_64
typedef unsigned int __kernel_uid_t;
typedef unsigned int __kernel_gid_t;
#elif defined(X86)
typedef unsigned short __kernel_uid_t;
typedef unsigned short __kernel_gid_t;
#endif

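/*
 * NOTE: symbol_exists() returns TRUE or FALSE, not the value stored in
 * the kernel's overflowuid/overflowgid variables (65534 by default), so
 * these macros expand to 1 or 0 rather than to the overflow ID itself.
 */
#define overflowuid (symbol_exists("overflowuid"))
#define overflowgid (symbol_exists("overflowgid"))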

#define high2lowuid(uid) ((uid) & ~0xFFFF ? (old_uid_t)overflowuid : (old_uid_t)(uid))
#define high2lowgid(gid) ((gid) & ~0xFFFF ? (old_gid_t)overflowgid : (old_gid_t)(gid))

#define __convert_uid(size, uid) \
        (size >= sizeof(uid) ? (uid) : high2lowuid(uid))
#define __convert_gid(size, gid) \
        (size >= sizeof(gid) ? (gid) : high2lowgid(gid))

#define SET_UID(var, uid) do { (var) = __convert_uid(sizeof(var), (uid)); } while (0)
#define SET_GID(var, gid) do { (var) = __convert_gid(sizeof(var), (gid)); } while (0)

#define MAX_USER_RT_PRIO        100
#define MAX_RT_PRIO             MAX_USER_RT_PRIO

#define PRIO_TO_NICE(prio)      ((prio) - MAX_RT_PRIO - 20)
#define TASK_NICE(p)            PRIO_TO_NICE((p)->static_prio)

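/*
 * Return the index of the least significant set bit in @word. Note that
 * this body matches the kernel's generic __ffs(); the kernel's generic
 * ffz(x) is defined as __ffs(~(x)), and no inversion is done here. For
 * example, ffz(0x8) returns 3.
 */
static inline ulong ffz(ulong word)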
{
        int num = 0;

#if defined(X86_64) || defined(IA64)
        if ((word & 0xffffffff) == 0) {
                num += 32;
                word >>= 32;
        }
#endif
        if ((word & 0xffff) == 0) {
                num += 16;
                word >>= 16;
        }
        if ((word & 0xff) == 0) {
                num += 8;
                word >>= 8;
        }
        if ((word & 0xf) == 0) {
                num += 4;
                word >>= 4;
        }
        if ((word & 0x3) == 0) {
                num += 2;
                word >>= 2;
        }
        if ((word & 0x1) == 0)
                num += 1;
        return num;
}

#define ELF_PRARGSZ     (80)    /* Number of chars for args */

struct elf_prpsinfo
{
        char    pr_state;       /* numeric process state */
        char    pr_sname;       /* char for pr_state */
        char    pr_zomb;        /* zombie */
        char    pr_nice;        /* nice val */
        unsigned long pr_flag;  /* flags */
        __kernel_uid_t  pr_uid;
        __kernel_gid_t  pr_gid;
        pid_t   pr_pid, pr_ppid, pr_pgrp, pr_sid;
        /* Lots missing */
        char    pr_fname[16];   /* filename of executable */
        char    pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
};

#define TASK_COMM_LEN 16

#define	CORENAME_MAX_SIZE 128

struct memelfnote
{
	const char *name;
	int type;
	unsigned int datasz;
	void *data;
};

struct thread_group_list {
	struct thread_group_list *next;
	ulong task;
};

struct elf_thread_core_info {
	struct elf_thread_core_info *next;
	ulong task;
	struct elf_prstatus prstatus;
	struct memelfnote notes[0];
};

struct elf_note_info {
	struct elf_thread_core_info *thread;
	struct memelfnote psinfo;
	struct memelfnote auxv;
	size_t size;
	int thread_notes;
};

/*
 * vm_flags in vm_area_struct, see mm_types.h.
 */
#define VM_READ		0x00000001	/* currently active flags */
#define VM_WRITE	0x00000002
#define VM_EXEC		0x00000004
#define VM_SHARED	0x00000008
#define VM_IO           0x00004000      /* Memory mapped I/O or similar */
#define VM_RESERVED     0x00080000      /* Count as reserved_vm like IO */
#define VM_HUGETLB      0x00400000      /* Huge TLB Page VM */
#define VM_ALWAYSDUMP   0x04000000      /* Always include in core dumps */

#define FOR_EACH_VMA_OBJECT(vma, index, mmap)		\
	for (index = 0, vma = mmap; vma; ++index, vma = next_vma(vma))

extern int _init(void);
extern int _fini(void);
extern char *help_gcore[];
extern void cmd_gcore(void);

struct gcore_coredump_table {

	unsigned int (*get_inode_i_nlink)(ulong file);

	pid_t (*task_pid)(ulong task);
	pid_t (*task_pgrp)(ulong task);
	pid_t (*task_session)(ulong task);

	void (*thread_group_cputime)(ulong task,
				     const struct thread_group_list *threads,
				     struct task_cputime *cputime);

	__kernel_uid_t (*task_uid)(ulong task);
	__kernel_gid_t (*task_gid)(ulong task);
};

struct gcore_offset_table
{
	long cpuinfo_x86_hard_math;
	long cpuinfo_x86_x86_capability;
	long cred_gid;
	long cred_uid;
	long desc_struct_base0;
	long desc_struct_base1;
	long desc_struct_base2;
	long fpu_state;
	long inode_i_nlink;
	long nsproxy_pid_ns;
	long mm_struct_arg_start;
	long mm_struct_arg_end;
	long mm_struct_map_count;
	long mm_struct_saved_auxv;
	long pid_level;
	long pid_namespace_level;
	long pt_regs_ax;
	long pt_regs_bp;
	long pt_regs_bx;
	long pt_regs_cs;
	long pt_regs_cx;
	long pt_regs_di;
	long pt_regs_ds;
	long pt_regs_dx;
	long pt_regs_es;
	long pt_regs_flags;
	long pt_regs_fs;
	long pt_regs_gs;
	long pt_regs_ip;
	long pt_regs_orig_ax;
	long pt_regs_si;
	long pt_regs_sp;
	long pt_regs_ss;
	long pt_regs_xfs;
	long pt_regs_xgs;
	long sched_entity_sum_exec_runtime;
	long signal_struct_cutime;
	long signal_struct_pgrp;
	long signal_struct_session;
	long signal_struct_stime;
	long signal_struct_sum_sched_runtime;
	long signal_struct_utime;
	long task_struct_cred;
	long task_struct_gid;
	long task_struct_group_leader;
	long task_struct_real_cred;
	long task_struct_real_parent;
	long task_struct_se;
	long task_struct_static_prio;
	long task_struct_uid;
	long task_struct_used_math;
	long thread_info_status;
	long thread_struct_ds;
	long thread_struct_es;
	long thread_struct_fs;
	long thread_struct_fsindex;
	long thread_struct_fpu;
	long thread_struct_gs;
	long thread_struct_gsindex;
	long thread_struct_i387;
	long thread_struct_tls_array;
	long thread_struct_usersp;
	long thread_struct_xstate;
	long thread_struct_io_bitmap_max;
	long thread_struct_io_bitmap_ptr;
	long user_regset_n;
	long vm_area_struct_anon_vma;
	long x8664_pda_oldrsp;
};

struct gcore_size_table
{
	long mm_struct_saved_auxv;
	long thread_struct_fs;
	long thread_struct_fsindex;
	long thread_struct_gs;
	long thread_struct_gsindex;
	long thread_struct_tls_array;
	long vm_area_struct_anon_vma;
	long thread_xstate;
	long i387_union;
};

#define GCORE_OFFSET(X) (OFFSET_verify(gcore_offset_table.X, (char *)__FUNCTION__, __FILE__, __LINE__, #X))
#define GCORE_SIZE(X) (SIZE_verify(gcore_size_table.X, (char *)__FUNCTION__, __FILE__, __LINE__, #X))
#define GCORE_VALID_MEMBER(X) (gcore_offset_table.X >= 0)
#define GCORE_ASSIGN_OFFSET(X) (gcore_offset_table.X)
#define GCORE_MEMBER_OFFSET_INIT(X, Y, Z) (GCORE_ASSIGN_OFFSET(X) = MEMBER_OFFSET(Y, Z))
#define GCORE_ASSIGN_SIZE(X) (gcore_size_table.X)
#define GCORE_SIZE_INIT(X, Y, Z) (GCORE_ASSIGN_SIZE(X) = MEMBER_SIZE(Y, Z))
#define GCORE_MEMBER_SIZE_INIT(X, Y, Z) (GCORE_ASSIGN_SIZE(X) = MEMBER_SIZE(Y, Z))
#define GCORE_STRUCT_SIZE_INIT(X, Y) (GCORE_ASSIGN_SIZE(X) = STRUCT_SIZE(Y))
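
/*
 * Usage sketch (illustrative; the member chosen here only demonstrates
 * the pattern, and the variables `task' and `val' are hypothetical):
 *
 *	GCORE_MEMBER_OFFSET_INIT(task_struct_static_prio,
 *				 "task_struct", "static_prio");
 *	...
 *	if (GCORE_VALID_MEMBER(task_struct_static_prio))
 *		readmem(task + GCORE_OFFSET(task_struct_static_prio),
 *			KVADDR, &val, sizeof(val),
 *			"task_struct.static_prio",
 *			gcore_verbose_error_handle());
 *
 * The first argument names a field of struct gcore_offset_table; the
 * remaining two are the structure and member names looked up in the
 * kernel's debugging information.
 */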

extern struct gcore_offset_table gcore_offset_table;
extern struct gcore_size_table gcore_size_table;

/*
 * gcore flags
 */
#define GCF_SUCCESS     0x1
#define GCF_UNDER_COREDUMP 0x2

struct gcore_data
{
	ulong flags;
	int fd;
	struct task_context *orig;
	char corename[CORENAME_MAX_SIZE + 1];
};

static inline void gcore_arch_table_init(void)
{
#if defined (X86_64) || defined (X86)
	gcore_x86_table_init();
#endif
}

static inline void gcore_arch_regsets_init(void)
{
#if defined(X86_64)
	extern void gcore_x86_64_regsets_init(void);
	gcore_x86_64_regsets_init();
#elif defined(X86)
	extern void gcore_x86_32_regsets_init(void);
	gcore_x86_32_regsets_init();
#else
	extern void gcore_default_regsets_init(void);
	gcore_default_regsets_init();
#endif
}

#ifdef GCORE_TEST

static inline int gcore_proc_version_contains(const char *s)
{
	return strstr(kt->proc_version, s) ? TRUE : FALSE;
}

static inline int gcore_is_rhel4(void)
{
	return THIS_KERNEL_VERSION == LINUX(2,6,9)
		&& gcore_proc_version_contains(".EL");
}

static inline int gcore_is_rhel5(void)
{
	return THIS_KERNEL_VERSION == LINUX(2,6,18)
		&& gcore_proc_version_contains(".el5");
}

static inline int gcore_is_rhel6(void)
{
	return THIS_KERNEL_VERSION == LINUX(2,6,32)
		&& gcore_proc_version_contains(".el6");
}

extern char *help_gcore_test[];
extern void cmd_gcore_test(void);

#define mu_assert(message, test) do { if (!(test)) return message; } while (0)
#define mu_run_test(test) do { char *message = test(); tests_run++; \
		if (message) return message; } while (0)
extern int tests_run;

extern char *gcore_x86_test(void);
extern char *gcore_coredump_table_test(void);
extern char *gcore_dumpfilter_test(void);

#endif

#endif /* GCORE_DEFS_H_ */
-------------- next part --------------
/* gcore_dumpfilter.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */
#include <defs.h>
#include <gcore_defs.h>
#include <elf.h>

#define GCORE_DUMPFILTER_ANON_PRIVATE    (0x1)
#define GCORE_DUMPFILTER_ANON_SHARED     (0x2)
#define GCORE_DUMPFILTER_MAPPED_PRIVATE  (0x4)
#define GCORE_DUMPFILTER_MAPPED_SHARED   (0x8)
#define GCORE_DUMPFILTER_ELF_HEADERS     (0x10)
#define GCORE_DUMPFILTER_HUGETLB_PRIVATE (0x20)
#define GCORE_DUMPFILTER_HUGETLB_SHARED  (0x40)
#define GCORE_DUMPFILTER_MAX_LEVEL (GCORE_DUMPFILTER_ANON_PRIVATE	\
				    |GCORE_DUMPFILTER_ANON_SHARED	\
				    |GCORE_DUMPFILTER_MAPPED_PRIVATE	\
				    |GCORE_DUMPFILTER_MAPPED_SHARED	\
				    |GCORE_DUMPFILTER_ELF_HEADERS	\
				    |GCORE_DUMPFILTER_HUGETLB_PRIVATE	\
				    |GCORE_DUMPFILTER_HUGETLB_SHARED)

#define GCORE_DUMPFILTER_DEFAULT (GCORE_DUMPFILTER_ANON_PRIVATE		\
				  | GCORE_DUMPFILTER_ANON_SHARED	\
				  | GCORE_DUMPFILTER_HUGETLB_PRIVATE)
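
/*
 * For reference (an illustrative note): the default above evaluates to
 * 0x1 | 0x2 | 0x20 == 0x23, and GCORE_DUMPFILTER_MAX_LEVEL to 0x7f. A
 * caller that also wants file-backed private mappings could pass
 *
 *	gcore_dumpfilter_set(0x27);	(0x23 | GCORE_DUMPFILTER_MAPPED_PRIVATE)
 *
 * which is accepted because 0x27 does not exceed 0x7f.
 */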

static ulong dumpfilter = GCORE_DUMPFILTER_DEFAULT;

int gcore_dumpfilter_set(ulong filter)
{
	if (filter > GCORE_DUMPFILTER_MAX_LEVEL)
		return FALSE;

	dumpfilter = filter;

	return TRUE;
}

void gcore_dumpfilter_set_default(void)
{
	dumpfilter = GCORE_DUMPFILTER_DEFAULT;
}

static inline int is_filtered(int bit)
{
	return !!(dumpfilter & bit);
}

ulong gcore_dumpfilter_vma_dump_size(ulong vma)
{
	char *vma_cache;
	physaddr_t paddr;
	ulong vm_start, vm_end, vm_flags, vm_file, vm_pgoff, anon_vma;

	vma_cache = fill_vma_cache(vma);
	vm_start = ULONG(vma_cache + OFFSET(vm_area_struct_vm_start));
	vm_end = ULONG(vma_cache + OFFSET(vm_area_struct_vm_end));
	vm_flags = ULONG(vma_cache + OFFSET(vm_area_struct_vm_flags));
	vm_file = ULONG(vma_cache + OFFSET(vm_area_struct_vm_file));
	vm_pgoff = ULONG(vma_cache + OFFSET(vm_area_struct_vm_pgoff));
	anon_vma = ULONG(vma_cache + GCORE_OFFSET(vm_area_struct_anon_vma));

        /* The vma can be set up to tell us the answer directly.  */
        if (vm_flags & VM_ALWAYSDUMP)
                goto whole;

        /* Hugetlb memory check */
	if (vm_flags & VM_HUGETLB)
		if ((vm_flags & VM_SHARED)
		    ? is_filtered(GCORE_DUMPFILTER_HUGETLB_SHARED)
		    : is_filtered(GCORE_DUMPFILTER_HUGETLB_PRIVATE))
			goto whole;

        /* Do not dump I/O mapped devices or special mappings */
        if (vm_flags & (VM_IO | VM_RESERVED))
		goto nothing;

        /* By default, dump shared memory if mapped from an anonymous file. */
        if (vm_flags & VM_SHARED) {

		if (ggt->get_inode_i_nlink(vm_file)
		    ? is_filtered(GCORE_DUMPFILTER_MAPPED_SHARED)
		    : is_filtered(GCORE_DUMPFILTER_ANON_SHARED))
			goto whole;

		goto nothing;
        }

        /* Dump segments that have been written to.  */
        if (anon_vma && is_filtered(GCORE_DUMPFILTER_ANON_PRIVATE))
                goto whole;
        if (!vm_file)
		goto nothing;

        if (is_filtered(GCORE_DUMPFILTER_MAPPED_PRIVATE))
                goto whole;

        /*
         * If this looks like the beginning of a DSO or executable mapping,
         * check for an ELF header.  If we find one, dump the first page to
         * aid in determining what was mapped here.
         */
        if (is_filtered(GCORE_DUMPFILTER_ELF_HEADERS) &&
            vm_pgoff == 0 && (vm_flags & VM_READ)) {
		ulong header = vm_start;
		uint32_t word = 0;
                /*
                 * Doing it this way gets the constant folded by GCC.
                 */
                union {
                        uint32_t cmp;
                        char elfmag[SELFMAG];
                } magic;
                magic.elfmag[EI_MAG0] = ELFMAG0;
                magic.elfmag[EI_MAG1] = ELFMAG1;
                magic.elfmag[EI_MAG2] = ELFMAG2;
                magic.elfmag[EI_MAG3] = ELFMAG3;
		if (uvtop(CURRENT_CONTEXT(), header, &paddr, FALSE)) {
			readmem(paddr, PHYSADDR, &word, sizeof(magic.elfmag),
				"read ELF page", gcore_verbose_error_handle());
		} else {
			pagefaultf("page fault at %lx\n", header);
		}
                if (word == magic.cmp)
			goto pagesize;
        }

nothing:
        return 0;

whole:
        return vm_end - vm_start;

pagesize:
	return PAGE_SIZE;
}

#ifdef GCORE_TEST

char *gcore_dumpfilter_test(void)
{
	dumpfilter = 0UL;
	mu_assert("given filter level is too large",
		  !gcore_dumpfilter_set(GCORE_DUMPFILTER_MAX_LEVEL + 1));
	mu_assert("dumpfilter was updated given an invalid argument",
		  dumpfilter == 0UL);

	dumpfilter = 0UL;
	mu_assert("didn't return TRUE even if a valid argument was given",
		  gcore_dumpfilter_set(GCORE_DUMPFILTER_MAX_LEVEL));
	mu_assert("not set given valid argument",
		  dumpfilter == GCORE_DUMPFILTER_MAX_LEVEL);
	dumpfilter = GCORE_DUMPFILTER_DEFAULT;

	return NULL;
}

#endif
-------------- next part --------------
/* gcore_global_data.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include <defs.h>
#include <gcore_defs.h>

static struct gcore_data gcore_data = {0, };
struct gcore_data *gcore = &gcore_data;

static struct gcore_coredump_table gcore_coredump_table = {0, };
struct gcore_coredump_table *ggt = &gcore_coredump_table;

struct gcore_offset_table gcore_offset_table = {0, };
struct gcore_size_table gcore_size_table = {0, };
-------------- next part --------------
/* gcore_regset.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include "defs.h"
#include <gcore_defs.h>
#include <elf.h>

enum gcore_default_regset {
	REGSET_GENERAL,
};

static int genregs_get(struct task_context *target,
		       const struct user_regset *regset,
		       unsigned int size,
		       void *buf)
{
	readmem(machdep->get_stacktop(target->task) - SIZE(pt_regs), KVADDR,
		buf, size, "genregs_get: pt_regs", gcore_verbose_error_handle());

	return TRUE;
}

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static struct user_regset gcore_default_regsets[] = {
	[REGSET_GENERAL] = {
		.core_note_type = NT_PRSTATUS,
		.get = genregs_get
	},
};

static struct user_regset_view gcore_default_regset_view = {
	.name = REGSET_VIEW_NAME,
	.regsets = gcore_default_regsets,
	.n = ARRAY_SIZE(gcore_default_regsets),
	.e_machine = REGSET_VIEW_MACHINE
};

const struct user_regset_view * __attribute__((weak))
task_user_regset_view(void)
{
	return &gcore_default_regset_view;
}
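
/*
 * Override sketch (illustrative; `my_regset_view' is a hypothetical
 * name): because task_user_regset_view() above is a weak symbol, an
 * architecture-specific file can supply an ordinary (strong)
 * definition, which the linker then prefers:
 *
 *	const struct user_regset_view *task_user_regset_view(void)
 *	{
 *		return &my_regset_view;
 *	}
 */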

void gcore_default_regsets_init(void)
{
	gcore_default_regsets[REGSET_GENERAL].size = SIZE(pt_regs);
}
-------------- next part --------------
/* gcore_verbose.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */

#include "defs.h"
#include <gcore_defs.h>

struct gcore_verbose_data
{
	ulong level;
	ulong error_handle;
};

static struct gcore_verbose_data gcore_verbose_data = { 0 };
static struct gcore_verbose_data *gvd = &gcore_verbose_data;

void gcore_verbose_set_default(void)
{
	gvd->level = VERBOSE_DEFAULT_LEVEL;
	gvd->error_handle = VERBOSE_DEFAULT_ERROR_HANDLE;
}

int gcore_verbose_set(ulong level)
{
	if (level > VERBOSE_MAX_LEVEL)
		return FALSE;
	gvd->level = level;
	if (gvd->level & VERBOSE_NONQUIET)
		gvd->error_handle &= ~QUIET;
	else
		gvd->error_handle |= QUIET;
	return TRUE;
}

ulong gcore_verbose_get(void)
{
	return gvd->level;
}

ulong gcore_verbose_error_handle(void)
{
	return gvd->error_handle;
}
-------------- next part --------------
/* gcore_x86.c -- core analysis suite
 *
 * Copyright (C) 2010 FUJITSU LIMITED
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 */
#if defined(X86) || defined(X86_64)

#include "defs.h"
#ifdef X86_64
#include "unwind_x86_64.h"
#endif
#include <gcore_defs.h>
#include <stdint.h>
#include <elf.h>
#include <asm/ldt.h>

struct gcore_x86_table
{
#ifdef X86_64
	ulong (*get_old_rsp)(int cpu);
#endif
	ulong (*get_thread_struct_fpu)(struct task_context *tc);
	ulong (*get_thread_struct_fpu_size)(void);
#ifdef X86_64
	int (*is_special_syscall)(int nr_syscall);
	int (*is_special_ia32_syscall)(int nr_syscall);
#endif
	int (*tsk_used_math)(ulong task);
};

static struct gcore_x86_table gcore_x86_table;
struct gcore_x86_table *gxt = &gcore_x86_table;

#ifdef X86_64
static ulong gcore_x86_64_get_old_rsp(int cpu);
static ulong gcore_x86_64_get_per_cpu__old_rsp(int cpu);
static ulong gcore_x86_64_get_cpu_pda_oldrsp(int cpu);
static ulong gcore_x86_64_get_cpu__pda_oldrsp(int cpu);
#endif

static ulong gcore_x86_get_thread_struct_fpu_thread_xstate(struct task_context *tc);
static ulong gcore_x86_get_thread_struct_fpu_thread_xstate_size(void);
static ulong gcore_x86_get_thread_struct_thread_xstate(struct task_context *tc);
static ulong gcore_x86_get_thread_struct_thread_xstate_size(void);
static ulong gcore_x86_get_thread_struct_i387(struct task_context *tc);
static ulong gcore_x86_get_thread_struct_i387_size(void);

#ifdef X86_64
static void gcore_x86_table_register_get_old_rsp(void);
#endif
static void gcore_x86_table_register_get_thread_struct_fpu(void);
#ifdef X86_64
static void gcore_x86_table_register_is_special_syscall(void);
static void gcore_x86_table_register_is_special_ia32_syscall(void);
#endif
static void gcore_x86_table_register_tsk_used_math(void);

#ifdef X86_64
static int is_special_syscall_v0(int nr_syscall);
static int is_special_syscall_v26(int nr_syscall);
#endif

static int test_bit(unsigned int nr, const ulong addr);

#ifdef X86_64
static int is_ia32_syscall_enabled(void);
static int is_special_ia32_syscall_v0(int nr_syscall);
static int is_special_ia32_syscall_v26(int nr_syscall);
#endif

static int tsk_used_math_v0(ulong task);
static int tsk_used_math_v11(ulong task);

#ifdef X86_64
static void gcore_x86_64_regset_xstate_init(void);
#endif

#ifdef X86
static int genregs_get32(struct task_context *target,
			 const struct user_regset *regset, unsigned int size,
			 void *buf);
static void gcore_x86_32_regset_xstate_init(void);
#endif

static int get_xstate_regsets_number(void);

enum gcore_regset {
	REGSET_GENERAL,
	REGSET_FP,
	REGSET_XFP,
	REGSET_XSTATE,
	REGSET_IOPERM64,
	REGSET_TLS,
	REGSET_IOPERM32,
};

#define NT_386_TLS      0x200           /* i386 TLS slots (struct user_desc) */
#ifndef NT_386_IOPERM
#define NT_386_IOPERM	0x201		/* x86 io permission bitmap (1=deny) */
#endif
#define NT_X86_XSTATE   0x202           /* x86 extended state using xsave */
#define NT_PRXFPREG     0x46e62b7f      /* copied from gdb5.1/include/elf/common.h */

#define USER_XSTATE_FX_SW_WORDS 6

#define MXCSR_DEFAULT           0x1f80

#ifdef X86_64
/* This matches the 64bit FXSAVE format as defined by AMD. It is the same
   as the 32bit format defined by Intel, except that the selector:offset pairs for
   data and eip are replaced with flat 64bit pointers. */ 
struct user_i387_struct {
	unsigned short	cwd;
	unsigned short	swd;
	unsigned short	twd; /* Note this is not the same as the 32bit/x87/FSAVE twd */
	unsigned short	fop;
	uint64_t	rip;
	uint64_t	rdp;
	uint32_t	mxcsr;
	uint32_t	mxcsr_mask;
	uint32_t	st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	uint32_t	xmm_space[64];	/* 16*16 bytes for each XMM-reg = 256 bytes */
	uint32_t	padding[24];
};
#endif

struct user_i387_ia32_struct {
	uint32_t	cwd;
	uint32_t	swd;
	uint32_t	twd;
	uint32_t	fip;
	uint32_t	fcs;
	uint32_t	foo;
	uint32_t	fos;
	uint32_t	st_space[20];   /* 8*10 bytes for each FP-reg = 80 bytes */
};

struct user32_fxsr_struct {
	unsigned short	cwd;
	unsigned short	swd;
	unsigned short	twd;	/* not compatible to 64bit twd */
	unsigned short	fop;
	int	fip;
	int	fcs;
	int	foo;
	int	fos;
	int	mxcsr;
	int	reserved;
	int	st_space[32];	/* 8*16 bytes for each FP-reg = 128 bytes */
	int	xmm_space[32];	/* 8*16 bytes for each XMM-reg = 128 bytes */
	int	padding[56];
};

struct i387_fsave_struct {
        uint32_t                     cwd;    /* FPU Control Word             */
        uint32_t                     swd;    /* FPU Status Word              */
        uint32_t                     twd;    /* FPU Tag Word                 */
        uint32_t                     fip;    /* FPU IP Offset                */
        uint32_t                     fcs;    /* FPU IP Selector              */
        uint32_t                     foo;    /* FPU Operand Pointer Offset   */
        uint32_t                     fos;    /* FPU Operand Pointer Selector */

        /* 8*10 bytes for each FP-reg = 80 bytes:                       */
        uint32_t                     st_space[20];

        /* Software status information [not touched by FSAVE ]:         */
        uint32_t                     status;
};

struct i387_fxsave_struct {
        uint16_t                     cwd; /* Control Word                    */
        uint16_t                     swd; /* Status Word                     */
        uint16_t                     twd; /* Tag Word                        */
        uint16_t                     fop; /* Last Instruction Opcode         */
        union {
                struct {
                        uint64_t     rip; /* Instruction Pointer             */
                        uint64_t     rdp; /* Data Pointer                    */
                };
                struct {
                        uint32_t     fip; /* FPU IP Offset                   */
                        uint32_t     fcs; /* FPU IP Selector                 */
                        uint32_t     foo; /* FPU Operand Offset              */
                        uint32_t     fos; /* FPU Operand Selector            */
                };
        };
        uint32_t                     mxcsr;          /* MXCSR Register State */
        uint32_t                     mxcsr_mask;     /* MXCSR Mask           */

        /* 8*16 bytes for each FP-reg = 128 bytes:                      */
        uint32_t                     st_space[32];

        /* 16*16 bytes for each XMM-reg = 256 bytes:                    */
        uint32_t                     xmm_space[64];

        uint32_t                     padding[12];

        union {
                uint32_t             padding1[12];
                uint32_t             sw_reserved[12];
        };

} __attribute__((aligned(16)));

struct i387_soft_struct {
        uint32_t                     cwd;
        uint32_t                     swd;
        uint32_t                     twd;
        uint32_t                     fip;
        uint32_t                     fcs;
        uint32_t                     foo;
        uint32_t                     fos;
        /* 8*10 bytes for each FP-reg = 80 bytes: */
        uint32_t                     st_space[20];
        uint8_t                      ftop;
        uint8_t                      changed;
        uint8_t                      lookahead;
        uint8_t                      no_update;
        uint8_t                      rm;
        uint8_t                      alimit;
        struct math_emu_info    *info;
        uint32_t                     entry_eip;
};

struct ymmh_struct {
        /* 16 * 16 bytes for each YMMH-reg = 256 bytes */
        uint32_t ymmh_space[64];
};

struct xsave_hdr_struct {
        uint64_t xstate_bv;
        uint64_t reserved1[2];
        uint64_t reserved2[5];
} __attribute__((packed));

struct xsave_struct {
        struct i387_fxsave_struct i387;
        struct xsave_hdr_struct xsave_hdr;
        struct ymmh_struct ymmh;
        /* new processor state extensions will go here */
} __attribute__ ((packed, aligned (64)));

union thread_xstate {
        struct i387_fsave_struct        fsave;
        struct i387_fxsave_struct       fxsave;
        struct i387_soft_struct         soft;
        struct xsave_struct             xsave;
};

#define NCAPINTS	9	/* N 32-bit words worth of info */

#define X86_FEATURE_FXSR	(0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */
#define X86_FEATURE_XSAVE       (4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */
#define X86_FEATURE_XSAVEOPT	(7*32+ 4) /* Optimized Xsave */

/*
 * Per process flags
 */
#define PF_USED_MATH    0x00002000      /* if unset the fpu must be initialized before use */

/*
 * Thread-synchronous status.
 *
 * This is different from the flags in that nobody else
 * ever touches our thread-synchronous status, so we don't
 * have to worry about atomic accesses.
 */
#define TS_USEDFPU		0x0001	/* FPU was used by this task
					   this quantum (SMP) */

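/*
 * A feature number encodes (word * 32 + bit) within x86_capability[];
 * e.g. X86_FEATURE_XSAVE is bit 26 of x86_capability[4].
 */
static int
boot_cpu_has(int feature)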
{
	uint32_t x86_capability[NCAPINTS];

	if (!symbol_exists("boot_cpu_data"))
		error(FATAL, "boot_cpu_data: symbol does not exist\n");

	readmem(symbol_value("boot_cpu_data") +
		GCORE_OFFSET(cpuinfo_x86_x86_capability), KVADDR,
		&x86_capability, sizeof(x86_capability),
		"boot_cpu_has: x86_capability",
		gcore_verbose_error_handle());

	return ((1UL << (feature % 32)) & x86_capability[feature / 32]) != 0;
}

static inline int
cpu_has_xsave(void)
{
	return boot_cpu_has(X86_FEATURE_XSAVE);
}

static inline int
cpu_has_xsaveopt(void)
{
	return boot_cpu_has(X86_FEATURE_XSAVEOPT);
}

static inline int
cpu_has_fxsr(void)
{
	return boot_cpu_has(X86_FEATURE_FXSR);
}

static int
task_used_fpu(ulong task)
{
	uint32_t status;

	readmem(task_to_context(task)->thread_info +
		GCORE_OFFSET(thread_info_status), KVADDR, &status,
		sizeof(uint32_t), "task_used_fpu: status",
		gcore_verbose_error_handle());

	return !!(status & TS_USEDFPU);
}

static void
init_fpu(ulong task)
{
	if (gxt->tsk_used_math(task) && is_task_active(task)
	    && task_used_fpu(task)) {
		/*
		 * The FPU values saved in thread->xstate may differ
		 * from the live register contents at the time of the
		 * crash. A crash dump cannot recover the runtime FPU
		 * state, so only warn about it here.
		 */
		error(WARNING, "FPU may be inaccurate: %d\n",
		      task_to_pid(task));
	}
}

static int
xfpregs_active(struct task_context *target,
	       const struct user_regset *regset)
{
	return gxt->tsk_used_math(target->task);
}

static int xfpregs_get(struct task_context *target,
		       const struct user_regset *regset,
		       unsigned int size,
		       void *buf)
{
	struct i387_fxsave_struct *fxsave = (struct i387_fxsave_struct *)buf;
	union thread_xstate xstate;

	readmem(gxt->get_thread_struct_fpu(target), KVADDR, &xstate,
		gxt->get_thread_struct_fpu_size(),
		"xfpregs_get: xstate", gcore_verbose_error_handle());

	init_fpu(target->task);

	*fxsave = xstate.fxsave;

	return TRUE;
}

static void xfpregs_callback(struct elf_thread_core_info *t,
			    const struct user_regset *regset)
{
	t->prstatus.pr_fpvalid = 1;
}

static inline int
fpregs_active(struct task_context *target,
	      const struct user_regset *regset)
{
	return !!gxt->tsk_used_math(target->task);
}

#ifdef X86
static void sanitize_i387_state(struct task_context *target)
{
	if (cpu_has_xsaveopt()) {
		/*
		 * Not implemented yet, because I have no CPU that
		 * supports the XSAVEOPT instruction.
		 */
	}
}
#endif

#ifdef X86_64
static inline int have_hwfp(void)
{
	return TRUE;
}
#endif

#ifdef X86
/*
 * CONFIG_MATH_EMULATION is set iff there's no math_emulate().
 */
static int is_set_config_math_emulation(void)
{
	return !symbol_exists("math_emulate");
}

static int have_hwfp(void)
{
	char hard_math;

	if (!is_set_config_math_emulation())
		return TRUE;

	readmem(symbol_value("boot_cpu_data") + GCORE_OFFSET(cpuinfo_x86_hard_math),
		KVADDR, &hard_math, sizeof(hard_math), "have_hwfp: hard_math",
		gcore_verbose_error_handle());

	return hard_math ? TRUE : FALSE;
}

static int fpregs_soft_get(struct task_context *target,
			   const struct user_regset *regset,
			   unsigned int size,
			   void *buf)
{
	error(WARNING, "FPU software emulation is not supported\n");
	return TRUE;
}

static inline struct _fpxreg *
fpreg_addr(struct i387_fxsave_struct *fxsave, int n)
{
	return (void *)&fxsave->st_space + n * 16;
}

static inline uint32_t
twd_fxsr_to_i387(struct i387_fxsave_struct *fxsave)
{
	struct _fpxreg *st;
	uint32_t tos = (fxsave->swd >> 11) & 7;
	uint32_t twd = (unsigned long) fxsave->twd;
	enum {
		FP_EXP_TAG_VALID=0,
		FP_EXP_TAG_ZERO,
		FP_EXP_TAG_SPECIAL,
		FP_EXP_TAG_EMPTY,
	} tag;
	uint32_t ret = 0xffff0000u;
	int i;

	for (i = 0; i < 8; i++, twd >>= 1) {
		if (twd & 0x1) {
			st = fpreg_addr(fxsave, (i - tos) & 7);

			switch (st->exponent & 0x7fff) {
			case 0x7fff:
				tag = FP_EXP_TAG_SPECIAL;
				break;
			case 0x0000:
				if (!st->significand[0] &&
				    !st->significand[1] &&
				    !st->significand[2] &&
				    !st->significand[3])
					tag = FP_EXP_TAG_ZERO;
				else
					tag = FP_EXP_TAG_SPECIAL;
				break;
			default:
				if (st->significand[3] & 0x8000)
					tag = FP_EXP_TAG_VALID;
				else
					tag = FP_EXP_TAG_SPECIAL;
				break;
			}
		} else {
			tag = FP_EXP_TAG_EMPTY;
		}
		ret |= (uint32_t)tag << (2 * i);
	}
	return ret;
}

static void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_context *target)
{
	union thread_xstate xstate;
	struct _fpreg *to;
	struct _fpxreg *from;
	int i;

	readmem(gxt->get_thread_struct_fpu(target), KVADDR, &xstate,
		gxt->get_thread_struct_fpu_size(), "convert_from_fxsr: xstate",
		gcore_verbose_error_handle());

	to = (struct _fpreg *) &env->st_space[0];
	from = (struct _fpxreg *) &xstate.fxsave.st_space[0];

	env->cwd = xstate.fxsave.cwd | 0xffff0000u;
	env->swd = xstate.fxsave.swd | 0xffff0000u;
	env->twd = twd_fxsr_to_i387(&xstate.fxsave);

	if (STREQ(pc->machine_type, "X86_64")) {
		env->fip = xstate.fxsave.rip;
		env->foo = xstate.fxsave.rdp;
		if (is_task_active(target->task)) {
			error(WARNING, "cannot restore runtime fos and fcs\n");
		} else {
			struct user_regs_struct regs;
			uint16_t ds;

			readmem(machdep->get_stacktop(target->task) - SIZE(pt_regs),
				KVADDR,	&regs, sizeof(regs),
				"convert_from_fxsr: regs",
				gcore_verbose_error_handle());

			readmem(target->task + OFFSET(task_struct_thread)
				+ GCORE_OFFSET(thread_struct_ds), KVADDR, &ds,
				sizeof(ds), "convert_from_fxsr: ds",
				gcore_verbose_error_handle());

			env->fos = 0xffff0000 | ds;
			env->fcs = regs.cs;
		}
	} else { /* X86 */
		env->fip = xstate.fxsave.fip;
		env->fcs = (uint16_t) xstate.fxsave.fcs | ((uint32_t) xstate.fxsave.fop << 16);
		env->foo = xstate.fxsave.foo;
		env->fos = xstate.fxsave.fos;
	}

	for (i = 0; i < 8; ++i)
		memcpy(&to[i], &from[i], sizeof(to[0]));
}

static int fpregs_get(struct task_context *target,
		      const struct user_regset *regset,
		      unsigned int size,
		      void *buf)
{
	union thread_xstate xstate;

	init_fpu(target->task);

	if (!have_hwfp())
		return fpregs_soft_get(target, regset, size, buf);

	if (!cpu_has_fxsr()) {
		readmem(gxt->get_thread_struct_fpu(target), KVADDR, &xstate,
			gxt->get_thread_struct_fpu_size(),
			"fpregs_get: xstate", gcore_verbose_error_handle());
		memcpy(buf, &xstate.fsave, sizeof(xstate.fsave));
		return TRUE;
	}

	sanitize_i387_state(target);

	convert_from_fxsr(buf, target);

	return TRUE;
}
#endif

static ulong gcore_x86_get_thread_struct_fpu_thread_xstate(struct task_context *tc)
{
	ulong state;

	readmem(tc->task + OFFSET(task_struct_thread)
		+ GCORE_OFFSET(thread_struct_fpu) + GCORE_OFFSET(fpu_state),
		KVADDR, &state, sizeof(state),
		"gcore_x86_get_thread_struct_fpu_thread_xstate: state",
		gcore_verbose_error_handle());

	return state;
}

static ulong gcore_x86_get_thread_struct_fpu_thread_xstate_size(void)
{
	return GCORE_SIZE(thread_xstate);
}

static ulong gcore_x86_get_thread_struct_thread_xstate(struct task_context *tc)
{
	ulong xstate;

	readmem(tc->task + OFFSET(task_struct_thread)
		+ GCORE_OFFSET(thread_struct_xstate), KVADDR, &xstate,
		sizeof(xstate),
		"gcore_x86_get_thread_struct_thread_xstate: xstate",
		gcore_verbose_error_handle());

	return xstate;
}

static ulong gcore_x86_get_thread_struct_thread_xstate_size(void)
{
	return GCORE_SIZE(thread_xstate);
}

static ulong gcore_x86_get_thread_struct_i387(struct task_context *tc)
{
	return tc->task + OFFSET(task_struct_thread)
		+ GCORE_OFFSET(thread_struct_i387);
}

static ulong gcore_x86_get_thread_struct_i387_size(void)
{
	return GCORE_SIZE(i387_union);
}

/*
 * For the REGSET_XSTATE entry, on both x86 and x86_64, member n is
 * initialized dynamically at boot time.
 */
static int get_xstate_regsets_number(void)
{
	struct datatype_member datatype_member, *dm;
	ulong x86_64_regsets_xstate;
	unsigned int n;

	if (!symbol_exists("REGSET_XSTATE"))
		return 0;

	dm = &datatype_member;

	if (!arg_to_datatype("REGSET_XSTATE", dm, RETURN_ON_ERROR))
		return 0;

	x86_64_regsets_xstate = symbol_value("x86_64_regsets") +
		dm->value * STRUCT_SIZE("user_regset");

	readmem(x86_64_regsets_xstate + GCORE_OFFSET(user_regset_n),
		KVADDR, &n, sizeof(n), "get_xstate_regsets_number: n",
		FAULT_ON_ERROR);

	return n;
}

static inline int
xstateregs_active(struct task_context *target,
		  const struct user_regset *regset)
{
	return cpu_has_xsave() && fpregs_active(target, regset)
		&& !!get_xstate_regsets_number();
}

static int
xstateregs_get(struct task_context *target,
	       const struct user_regset *regset,
	       unsigned int size,
	       void *buf)
{
	union thread_xstate *xstate = (union thread_xstate *)buf;
	ulong xstate_fx_sw_bytes;

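	/*
	 * thread.xstate holds a pointer to the save area, not the
	 * area itself, so go through the same accessors as
	 * xfpregs_get() rather than reading the thread_struct member.
	 */
	readmem(gxt->get_thread_struct_fpu(target), KVADDR, xstate,
		gxt->get_thread_struct_fpu_size(), "xstateregs_get: xstate",
		gcore_verbose_error_handle());

	init_fpu(target->task);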

	if (!symbol_exists("xstate_fx_sw_bytes"))
		error(FATAL, "xstate_fx_sw_bytes: symbol does not exist\n");

	xstate_fx_sw_bytes = symbol_value("xstate_fx_sw_bytes");

	/*
	 * Copy the 48 bytes defined by software into the sw_reserved
	 * area of the FXSAVE image first, so that the buffer has the
	 * same layout as the one the kernel hands to user space.
	 */
	readmem(xstate_fx_sw_bytes, KVADDR, &xstate->fxsave.sw_reserved,
		USER_XSTATE_FX_SW_WORDS * sizeof(uint64_t),
		"fill_xstate: sw_reserved", gcore_verbose_error_handle());

	return TRUE;
}

#ifdef X86_64
/*
 * we cannot use the same code segment descriptor for user and kernel
 * -- not even in the long flat mode, because of different DPL /kkeil
 * The segment offset needs to contain a RPL. Grr. -AK
 * GDT layout to get 64bit syscall right (sysret hardcodes gdt offsets)
 */
#define GDT_ENTRY_TLS_MIN 12
#endif

#ifdef X86
#define GDT_ENTRY_TLS_MIN 6
#endif

#define GDT_ENTRY_TLS_ENTRIES 3

/* TLS indexes for 64bit - hardcoded in arch_prctl */
#define FS_TLS 0
#define GS_TLS 1

#define GS_TLS_SEL ((GDT_ENTRY_TLS_MIN+GS_TLS)*8 + 3)
#define FS_TLS_SEL ((GDT_ENTRY_TLS_MIN+FS_TLS)*8 + 3)

/*
 * EFLAGS bits
 */
#define X86_EFLAGS_TF   0x00000100 /* Trap Flag */

#ifdef X86_64
#define __USER_CS       0x23
#define __USER_DS       0x2B
#endif

/*
 * thread information flags
 * - these are process state flags that various assembly files
 *   may need to access
 * - pending work-to-be-done flags are in LSW
 * - other flags in MSW
 * Warning: layout of LSW is hardcoded in entry.S
 */
#define TIF_FORCED_TF           24      /* true if TF in eflags artificially */

#ifdef X86
struct desc_struct {
	uint16_t limit0;
	uint16_t base0;
	unsigned int base1: 8, type: 4, s: 1, dpl: 2, p: 1;
	unsigned int limit: 4, avl: 1, l: 1, d: 1, g: 1, base2: 8;
} __attribute__((packed));

static inline ulong get_desc_base(const struct desc_struct *desc)
{
	return (ulong)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
}

static inline ulong get_desc_limit(const struct desc_struct *desc)
{
	return desc->limit0 | (desc->limit << 16);
}

static inline int desc_empty(const void *ptr)
{
	const uint32_t *desc = ptr;
	return !(desc[0] | desc[1]);
}

static void fill_user_desc(struct user_desc *info, int idx,
			   struct desc_struct *desc)
{
	memset(info, 0, sizeof(*info));
	info->entry_number = idx;
	info->base_addr = get_desc_base(desc);
	info->limit = get_desc_limit(desc);
	info->seg_32bit = desc->d;
	info->contents = desc->type >> 2;
	info->read_exec_only = !(desc->type & 2);
	info->limit_in_pages = desc->g;
	info->seg_not_present = !desc->p;
	info->useable = desc->avl;
}

static int regset_tls_active(struct task_context *target,
			     const struct user_regset *regset)
{
	int i, nr_entries;
	struct desc_struct *tls_array;

	nr_entries = GCORE_SIZE(thread_struct_tls_array) / sizeof(uint64_t);

	tls_array = (struct desc_struct *)GETBUF(GCORE_SIZE(thread_struct_tls_array));

	readmem(target->task + OFFSET(task_struct_thread)
		+ GCORE_OFFSET(thread_struct_tls_array), KVADDR,
		tls_array, GCORE_SIZE(thread_struct_tls_array),
		"regset_tls_active: tls_array",
		gcore_verbose_error_handle());

	for (i = 0; i < nr_entries; ++i) {
		if (!desc_empty(&tls_array[i])) {
			/* release the temporary buffer on every exit path */
			FREEBUF(tls_array);
			return TRUE;
		}
	}

	FREEBUF(tls_array);

	return FALSE;
}

static int regset_tls_get(struct task_context *target,
			  const struct user_regset *regset,
			  unsigned int size,
			  void *buf)
{
	struct user_desc *info = (struct user_desc *)buf;
	int i, nr_entries;
	struct desc_struct *tls_array;

	nr_entries = GCORE_SIZE(thread_struct_tls_array) / sizeof(uint64_t);

	tls_array = (struct desc_struct *)GETBUF(GCORE_SIZE(thread_struct_tls_array));

	readmem(target->task + OFFSET(task_struct_thread)
		+ GCORE_OFFSET(thread_struct_tls_array), KVADDR,
		tls_array, GCORE_SIZE(thread_struct_tls_array),
		"regset_tls_get: tls_array",
		gcore_verbose_error_handle());

	for (i = 0; i < nr_entries; ++i) {
		fill_user_desc(&info[i], GDT_ENTRY_TLS_MIN + i, &tls_array[i]);
	}

	/* release the temporary buffer once the descriptors are converted */
	FREEBUF(tls_array);

	return TRUE;
}
#endif /* X86 */

#define IO_BITMAP_BITS  65536
#define IO_BITMAP_BYTES (IO_BITMAP_BITS/8)
#define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long))
#define IO_BITMAP_OFFSET offsetof(struct tss_struct,io_bitmap)

static int
ioperm_active(struct task_context *target,
	      const struct user_regset *regset)
{
	unsigned int io_bitmap_max;

	readmem(target->task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_io_bitmap_max), KVADDR,
		&io_bitmap_max, sizeof(io_bitmap_max),
		"ioperm_active: io_bitmap_max", gcore_verbose_error_handle());

	return io_bitmap_max / regset->size;
}

static int ioperm_get(struct task_context *target,
		      const struct user_regset *regset,
		      unsigned int size,
		      void *buf)
{
	ulong io_bitmap_ptr;

	readmem(target->task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_io_bitmap_ptr), KVADDR,
		&io_bitmap_ptr, sizeof(io_bitmap_ptr),
		"ioperm_get: io_bitmap_ptr", gcore_verbose_error_handle());

	if (!io_bitmap_ptr)
		return FALSE;

	readmem(io_bitmap_ptr, KVADDR, buf, size, "ioperm_get: copy IO bitmap",
		gcore_verbose_error_handle());

	return TRUE;
}

#ifdef X86_64
#define __NR_rt_sigreturn	 15
#define __NR_clone		 56
#define __NR_fork		 57
#define __NR_vfork		 58
#define __NR_execve		 59
#define __NR_iopl		172
#define __NR_rt_sigsuspend      130
#define __NR_sigaltstack	131

static int is_special_syscall_v26(int nr_syscall)
{
	return nr_syscall == __NR_fork
		|| nr_syscall == __NR_execve
		|| nr_syscall == __NR_iopl
		|| nr_syscall == __NR_clone
		|| nr_syscall == __NR_rt_sigreturn
		|| nr_syscall == __NR_sigaltstack
		|| nr_syscall == __NR_vfork;
}

static int is_special_syscall_v0(int nr_syscall)
{
	return is_special_syscall_v26(nr_syscall)
		|| nr_syscall == __NR_rt_sigsuspend;
}

#define IA32_SYSCALL_VECTOR 0x80

#define __KERNEL_CS 0x10
#endif

//extern struct gate_struct idt_table[]; 
enum { 
	GATE_INTERRUPT = 0xE, 
	GATE_TRAP = 0xF, 
	GATE_CALL = 0xC,
}; 

#ifdef X86_64
/* 16byte gate */
struct gate_struct64 {
        uint16_t offset_low;
	uint16_t segment;
	unsigned int ist : 3, zero0 : 5, type : 5, dpl : 2, p : 1;
	uint16_t offset_middle;
	uint32_t offset_high;
	uint32_t zero1;
} __attribute__((packed));
#endif

#define PTR_LOW(x) ((unsigned long)(x) & 0xFFFF) 
#define PTR_MIDDLE(x) (((unsigned long)(x) >> 16) & 0xFFFF)
#define PTR_HIGH(x) ((unsigned long)(x) >> 32)

#ifdef X86_64
/*
 * Compare the gate descriptor in the crash kernel directly with the
 * expected one in order to check whether the IA32_EMULATION feature
 * was enabled.
 *
 * Checking only whether the slot is filled with 0 would be an
 * alternative way to achieve the same purpose, but it is not used
 * here.
 */
static int is_gate_set_ia32_syscall_vector(void)
{
	struct gate_struct64 gate, gate_idt;
	const ulong ia32_syscall_entry = symbol_value("ia32_syscall");

	gate.offset_low = PTR_LOW(ia32_syscall_entry);
	gate.segment = __KERNEL_CS;
	gate.ist = 0;
	gate.p = 1;
	gate.dpl = 0x3;
	gate.zero0 = 0;
	gate.zero1 = 0;
	gate.type = GATE_INTERRUPT;
	gate.offset_middle = PTR_MIDDLE(ia32_syscall_entry);
	gate.offset_high = PTR_HIGH(ia32_syscall_entry);

	readmem(symbol_value("idt_table") + 16 * IA32_SYSCALL_VECTOR, KVADDR,
		&gate_idt, sizeof(gate_idt), "is_gate_set_ia32_syscall_vector:"
		" idt_table[IA32_SYSCALL_VECTOR]", gcore_verbose_error_handle());

	return !memcmp(&gate, &gate_idt, sizeof(struct gate_struct64));
}

#define IA32_SYSCALL_VECTOR          0x80

#define __NR_ia32_fork               2
#define __NR_ia32_execve             11
#define __NR_ia32_sigsuspend         72
#define __NR_ia32_iopl               110
#define __NR_ia32_sigreturn          119
#define __NR_ia32_clone              120
#define __NR_ia32_sys32_rt_sigreturn 173
#define __NR_ia32_rt_sigsuspend      179
#define __NR_ia32_sigaltstack        186
#define __NR_ia32_vfork              190

/*
 * The is_special_ia32_syscall field is initialized only when
 * IA32_SYSCALL_VECTOR (0x80) is set in used_vectors. This check is
 * made in gcore_x86_table_init().
 */
static inline int is_ia32_syscall_enabled(void)
{
	return !!gxt->is_special_ia32_syscall;
}

static int is_special_ia32_syscall_v0(int nr_syscall)
{
	return is_special_ia32_syscall_v26(nr_syscall)
		|| nr_syscall == __NR_ia32_sigsuspend
		|| nr_syscall == __NR_ia32_rt_sigsuspend;
}

static int is_special_ia32_syscall_v26(int nr_syscall)
{
	return nr_syscall == __NR_ia32_fork
		|| nr_syscall == __NR_ia32_sigreturn
		|| nr_syscall == __NR_ia32_execve
		|| nr_syscall == __NR_ia32_iopl
		|| nr_syscall == __NR_ia32_clone
		|| nr_syscall == __NR_ia32_sys32_rt_sigreturn
		|| nr_syscall == __NR_ia32_sigaltstack
		|| nr_syscall == __NR_ia32_vfork;
}
#endif /* X86_64 */

static int tsk_used_math_v0(ulong task)
{
	unsigned short used_math;

	readmem(task + GCORE_OFFSET(task_struct_used_math), KVADDR,
		&used_math, sizeof(used_math), "tsk_used_math_v0: used_math",
		gcore_verbose_error_handle());

	return !!used_math;
}

static int tsk_used_math_v11(ulong task)
{
	unsigned long flags;

	readmem(task + OFFSET(task_struct_flags), KVADDR, &flags,
		sizeof(flags), "tsk_used_math_v11: flags",
		gcore_verbose_error_handle());

	return !!(flags & PF_USED_MATH);
}

static inline int
user_mode(const struct user_regs_struct *regs)
{
	return !!(regs->cs & 0x3);
}

#ifdef X86_64
static int
get_desc_base(ulong desc)
{
	uint16_t base0;
	uint8_t base1, base2;

	readmem(desc + GCORE_OFFSET(desc_struct_base0), KVADDR, &base0,
		sizeof(base0), "get_desc_base: base0", gcore_verbose_error_handle());

	readmem(desc + GCORE_OFFSET(desc_struct_base1), KVADDR, &base1,
		sizeof(base1), "get_desc_base: base1", gcore_verbose_error_handle());

	readmem(desc + GCORE_OFFSET(desc_struct_base2), KVADDR, &base2,
		sizeof(base2), "get_desc_base: base2", gcore_verbose_error_handle());

	return base0 | (base1 << 16) | (base2 << 24);
}

static int
test_tsk_thread_flag(ulong task, int bit)
{
	ulong thread_info, flags;

	thread_info = task_to_thread_info(task);

	readmem(thread_info + OFFSET(thread_info_flags), KVADDR, &flags,
		sizeof(flags), "test_tsk_thread_flag: flags",
		gcore_verbose_error_handle());

	return !!((1UL << bit) & flags);
}

static void
restore_segment_registers(ulong task, struct user_regs_struct *regs)
{
	readmem(task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_fs), KVADDR, &regs->fs_base,
		GCORE_SIZE(thread_struct_fs),
		"restore_segment_registers: fs", gcore_verbose_error_handle());

	if (!regs->fs_base) {

		readmem(task + OFFSET(task_struct_thread) +
			GCORE_OFFSET(thread_struct_fsindex), KVADDR,
			&regs->fs_base, GCORE_SIZE(thread_struct_fsindex),
			"restore_segment_registers: fsindex",
			gcore_verbose_error_handle());

		regs->fs_base =
			regs->fs_base != FS_TLS_SEL
			? 0
			: get_desc_base(task + OFFSET(task_struct_thread) +
					FS_TLS * SIZE(desc_struct));

	}

	/*
	 * As with fs above: prefer thread.gs, and fall back to the
	 * selector in thread.gsindex only when thread.gs is 0.
	 */
	readmem(task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_gs), KVADDR, &regs->gs_base,
		GCORE_SIZE(thread_struct_gs),
		"restore_segment_registers: gs", gcore_verbose_error_handle());

	if (!regs->gs_base) {

		readmem(task + OFFSET(task_struct_thread) +
			GCORE_OFFSET(thread_struct_gsindex), KVADDR,
			&regs->gs_base, GCORE_SIZE(thread_struct_gsindex),
			"restore_segment_registers: gsindex",
			gcore_verbose_error_handle());

		regs->gs_base =
			regs->gs_base != GS_TLS_SEL
			? 0
			: get_desc_base(task + OFFSET(task_struct_thread) +
					GS_TLS * SIZE(desc_struct));

	}

	if (test_tsk_thread_flag(task, TIF_FORCED_TF))
		regs->flags &= ~X86_EFLAGS_TF;

	readmem(task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_fsindex), KVADDR, &regs->fs,
		sizeof(regs->fs), "restore_segment_registers: fsindex",
		gcore_verbose_error_handle());

	readmem(task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_gsindex), KVADDR, &regs->gs,
		sizeof(regs->gs), "restore_segment_registers: gsindex",
		gcore_verbose_error_handle());

	readmem(task + OFFSET(task_struct_thread) +
		GCORE_OFFSET(thread_struct_es), KVADDR, &regs->es,
		sizeof(regs->es), "restore_segment_registers: es",
		gcore_verbose_error_handle());

}

/**
 * restore_frame_pointer - restore user-mode frame pointer
 *
 * @task task of interest
 *
 * If the kernel is built with CONFIG_FRAME_POINTER=y, we can find the
 * user-mode frame pointer by following the chain of frame pointers
 * starting from the one saved by the scheduler. This is possible
 * because, among other reasons, entry_64.S doesn't touch any
 * callee-saved register, including the frame pointer, rbp.
 *
 * On the other hand, if the kernel is not built with
 * CONFIG_FRAME_POINTER=y, we need to depend on CFA information
 * provided by kernel debugging information.
 */
static ulong restore_frame_pointer(ulong task)
{
	ulong rsp, rbp;

	/*
	 * rsp is saved in task->thread.sp during switch_to().
	 */
	readmem(task + OFFSET(task_struct_thread) +
		OFFSET(thread_struct_rsp), KVADDR, &rsp, sizeof(rsp),
		"restore_frame_pointer: rsp", gcore_verbose_error_handle());

	/*
	 * rbp is saved at the point referred to by rsp
	 */
	readmem(rsp, KVADDR, &rbp, sizeof(rbp), "restore_frame_pointer: rbp",
		gcore_verbose_error_handle());

	/*
	 * Follow the chain of saved frame pointers until rbp holds a
	 * user-mode address.
	 */
	while (IS_KVADDR(rbp))
		readmem(rbp, KVADDR, &rbp, sizeof(rbp),
			"restore_frame_pointer: resume rbp",
			gcore_verbose_error_handle());

	return rbp;
}

/**
 * restore_rest() - restore user-mode callee-saved registers
 *
 * @task task of interest
 * @regs buffer into which register values are placed
 * @note_regs registers in NT_PRSTATUS saved at kernel crash
 *
 * SAVE_ARGS() doesn't save the callee-saved registers rbx, r12, r13,
 * r14 and r15, because they are saved automatically in the kernel
 * stack frame made by the first C function called from entry_64.S.
 *
 * To retrieve these values correctly, it is necessary to use CFA,
 * the Canonical Frame Address specified as part of DWARF, in order
 * to calculate the exact offsets at which the individual registers
 * are saved.
 *
 * note_regs is the starting point of backtracing for active tasks.
 *
 * CFA information can be placed in one of two ELF sections,
 * .eh_frame and .debug_frame, and the two sections have different
 * layouts. Look carefully at is_ehframe.
 */
static inline void restore_rest(ulong task, struct pt_regs *regs,
				const struct user_regs_struct *note_regs)
{
	int first_frame;
	struct unwind_frame_info frame;
	const int is_ehframe = (!st->dwarf_debug_frame_size && st->dwarf_eh_frame_size);

	/*
	 * For active processes, all register values at the crash are
	 * available, so we pass them to the unwinder as the initial
	 * frame.
	 *
	 * For processes that were not running when the panic
	 * occurred, only the ip, sp and bp values are passed to
	 * unwind(); this seems to be enough for backtracing so far.
	 */
	if (is_task_active(task)) {
		memcpy(&frame.regs, note_regs, sizeof(struct pt_regs));
	} else {
		unsigned long rsp, rbp;

		memset(&frame.regs, 0, sizeof(struct pt_regs));

		readmem(task + OFFSET(task_struct_thread) +
			OFFSET(thread_struct_rsp), KVADDR, &rsp, sizeof(rsp),
			"restore_rest: rsp",
			gcore_verbose_error_handle());
		readmem(rsp, KVADDR, &rbp, sizeof(rbp), "restore_rest: rbp",
			gcore_verbose_error_handle());

		frame.regs.rip = machdep->machspec->thread_return;
		frame.regs.rsp = rsp;
		frame.regs.rbp = rbp;
	}

	/*
	 * Unwind to the first stack frame in kernel.
	 */
	first_frame = TRUE;

	while (!unwind(&frame, is_ehframe)) {
		if (first_frame)
			first_frame = FALSE;
	}

	if (!first_frame) {
		regs->r12 = frame.regs.r12;
		regs->r13 = frame.regs.r13;
		regs->r14 = frame.regs.r14;
		regs->r15 = frame.regs.r15;
		regs->rbp = frame.regs.rbp;
		regs->rbx = frame.regs.rbx;
	}

	/*
	 * If kernel was configured with CONFIG_FRAME_POINTER, we
	 * could trace the value of bp until its value became a
	 * user-space address. See comments of restore_frame_pointer.
	 */
	if (machdep->flags & FRAMEPOINTER) {
		regs->rbp = restore_frame_pointer(task);
	}
}

/**
 * gcore_x86_64_get_old_rsp() - get rsp at per-cpu area
 *
 * @cpu target CPU's CPU id
 *
 * Returns the RSP value saved in the per-cpu variable old_rsp of the
 * CPU with the given id.
 */
static ulong gcore_x86_64_get_old_rsp(int cpu)
{
	ulong old_rsp;

	readmem(symbol_value("old_rsp") + kt->__per_cpu_offset[cpu],
		KVADDR,	&old_rsp, sizeof(old_rsp),
		"gcore_x86_64_get_old_rsp: old_rsp",
		gcore_verbose_error_handle());

	return old_rsp;
}

/**
 * gcore_x86_64_get_per_cpu__old_rsp() - get rsp at per-cpu area
 *
 * @cpu target CPU's CPU id
 *
 * Returns the RSP value saved in the per-cpu variable
 * per_cpu__old_rsp of the CPU with the given id.
 */
static ulong gcore_x86_64_get_per_cpu__old_rsp(int cpu)
{
	ulong per_cpu__old_rsp;

	readmem(symbol_value("per_cpu__old_rsp") + kt->__per_cpu_offset[cpu],
		KVADDR,	&per_cpu__old_rsp, sizeof(per_cpu__old_rsp),
		"gcore_x86_64_get_per_cpu__old_rsp: per_cpu__old_rsp",
		gcore_verbose_error_handle());

	return per_cpu__old_rsp;
}

/**
 * gcore_x86_64_get_cpu_pda_oldrsp() - get rsp at per-cpu area
 *
 * @cpu target CPU's CPU id
 *
 * Returns the RSP value saved in the oldrsp member of the cpu_pda
 * entry of the CPU with the given id.
 */
static ulong gcore_x86_64_get_cpu_pda_oldrsp(int cpu)
{
	ulong oldrsp;
	char *cpu_pda_buf;

	cpu_pda_buf = GETBUF(SIZE(x8664_pda));

	/* index cpu_pda by the requested CPU id */
	readmem(symbol_value("cpu_pda") + cpu * SIZE(x8664_pda),
		KVADDR, cpu_pda_buf, SIZE(x8664_pda),
		"gcore_x86_64_get_cpu_pda_oldrsp: cpu_pda_buf",
		gcore_verbose_error_handle());

	oldrsp = ULONG(cpu_pda_buf + GCORE_OFFSET(x8664_pda_oldrsp));

	FREEBUF(cpu_pda_buf);

	return oldrsp;
}

/**
 * gcore_x86_64_get_cpu__pda_oldrsp() - get rsp at per-cpu area
 *
 * @cpu target CPU's CPU id
 *
 * Returns the RSP value saved in the oldrsp member of the x8664_pda
 * that the _cpu_pda entry of the CPU with the given id points to.
 */
static ulong gcore_x86_64_get_cpu__pda_oldrsp(int cpu)
{
	ulong oldrsp, x8664_pda, _cpu_pda;

	_cpu_pda = symbol_value("_cpu_pda");

	readmem(_cpu_pda + sizeof(ulong) * cpu, KVADDR, &x8664_pda,
		sizeof(x8664_pda),
		"gcore_x86_64_get_cpu__pda_oldrsp: _cpu_pda",
		gcore_verbose_error_handle());

	readmem(x8664_pda + GCORE_OFFSET(x8664_pda_oldrsp), KVADDR,
		&oldrsp, sizeof(oldrsp),
		"gcore_x86_64_get_cpu__pda_oldrsp: oldrsp",
		gcore_verbose_error_handle());

	return oldrsp;
}

static int genregs_get(struct task_context *target,
		       const struct user_regset *regset,
		       unsigned int size, void *buf)
{
	struct user_regs_struct *regs = (struct user_regs_struct *)buf;
	struct user_regs_struct note_regs = {0};
	const int active = is_task_active(target->task);

	/*
	 * vmcore generated by kdump contains NT_PRSTATUS including
	 * general register values for active tasks.
	 */
	if (active && KDUMP_DUMPFILE()) {
		struct user_regs_struct *note_regs_p;

		note_regs_p = get_regs_from_elf_notes(target);
		memcpy(&note_regs, note_regs_p, sizeof(struct user_regs_struct));

		/*
		 * If the task was in kernel-mode at the kernel crash, note
		 * information is not what we would like.
		 */
		if (user_mode(&note_regs)) {
			memcpy(regs, &note_regs, sizeof(struct user_regs_struct));
			return 0;
		}
	}

	/*
	 * SAVE_ARGS() and SAVE_ALL() macros save user-mode register
	 * values at kernel stack top when entering kernel-mode at
	 * interrupt.
	 */
	readmem(machdep->get_stacktop(target->task) - SIZE(pt_regs), KVADDR,
		regs, size, "genregs_get: pt_regs", gcore_verbose_error_handle());

	/*
	 * regs->orig_ax contains either a system call number or the
	 * bitwise complement of an interrupt vector: if >=0, it's a
	 * system call number; if <0, it's a complemented vector.
	 */
	if ((int)regs->orig_ax >= 0) {
		const int nr_syscall = (int)regs->orig_ax;

		/*
		 * rsp is saved in per-CPU old_rsp, which is saved in
		 * thread->usersp at each context switch.
		 */
		if (active) {
			regs->sp = gxt->get_old_rsp(target->processor);
		} else {
			readmem(target->task + OFFSET(task_struct_thread) +
				GCORE_OFFSET(thread_struct_usersp), KVADDR, &regs->sp,
				sizeof(regs->sp),
				"genregs_get: usersp", gcore_verbose_error_handle());
		}

		/*
		 * All registers are already saved for special system
		 * calls; for the others, the remaining registers must
		 * be restored.
		 */
		if (!gxt->is_special_syscall(nr_syscall))
			restore_rest(target->task, (struct pt_regs *)regs, &note_regs);

		/*
		 * See FIXUP_TOP_OF_STACK in arch/x86/kernel/entry_64.S.
		 */
		regs->ss = __USER_DS;
		regs->cs = __USER_CS;
		regs->cx = (ulong)-1;
		regs->flags = regs->r11;

		restore_segment_registers(target->task, regs);

	} else {
		const int vector = (int)~regs->orig_ax;

		if (vector < 0 || vector > 255) {
			error(WARNING, "unexpected IRQ number: %d.\n", vector);
		}

		/* Exceptions and NMI */
		else if (vector < 20) {
			restore_rest(target->task, (struct pt_regs *)regs,
				     &note_regs);
			restore_segment_registers(target->task, regs);
		}

		/* Reserved by Intel */
		else if (vector < 32) {
			error(WARNING, "IRQ number %d is reserved by Intel\n",
			      vector);
		}

		/* system call invocation by int 0x80 */
		else if (vector == 0x80 && is_ia32_syscall_enabled()) {
			const int nr_syscall = regs->ax;

			if (!gxt->is_special_ia32_syscall(nr_syscall))
				restore_rest(target->task,
					     (struct pt_regs *)regs,
					     &note_regs);
			restore_segment_registers(target->task, regs);
		}

		/* Maskable interrupts */
		else if (vector < 256) {
			restore_rest(target->task, (struct pt_regs *)regs,
				     &note_regs);
			restore_segment_registers(target->task, regs);
		}

	}

	return 0;
}
#endif /* X86_64 */

#ifndef ARRAY_SIZE
#  define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

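static inline int test_bit(unsigned int nr, const ulong addr)
{
	/* Use the word size of the build rather than a hard-coded 64. */
	const unsigned int bits_per_long = 8 * sizeof(ulong);
	ulong nth_entry;

	readmem(addr + (nr / bits_per_long) * sizeof(ulong), KVADDR, &nth_entry,
		sizeof(nth_entry), "test_bit: nth_entry", gcore_verbose_error_handle());

	return !!((1UL << (nr % bits_per_long)) & nth_entry);
}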

#ifdef X86_64
static void gcore_x86_table_register_get_old_rsp(void)
{
	if (symbol_exists("old_rsp"))
		gxt->get_old_rsp = gcore_x86_64_get_old_rsp;

	else if (symbol_exists("per_cpu__old_rsp"))
		gxt->get_old_rsp = gcore_x86_64_get_per_cpu__old_rsp;

	else if (symbol_exists("cpu_pda"))
		gxt->get_old_rsp = gcore_x86_64_get_cpu_pda_oldrsp;

	else if (symbol_exists("_cpu_pda"))
		gxt->get_old_rsp = gcore_x86_64_get_cpu__pda_oldrsp;
}
#endif

static void gcore_x86_table_register_get_thread_struct_fpu(void)
{
	if (MEMBER_EXISTS("thread_struct", "fpu")) {
		gxt->get_thread_struct_fpu =
			gcore_x86_get_thread_struct_fpu_thread_xstate;
		gxt->get_thread_struct_fpu_size =
			gcore_x86_get_thread_struct_fpu_thread_xstate_size;
	} else if (MEMBER_EXISTS("thread_struct", "xstate")) {
		gxt->get_thread_struct_fpu =
			gcore_x86_get_thread_struct_thread_xstate;
		gxt->get_thread_struct_fpu_size =
			gcore_x86_get_thread_struct_thread_xstate_size;
	} else if (MEMBER_EXISTS("thread_struct", "i387")) {
		gxt->get_thread_struct_fpu =
			gcore_x86_get_thread_struct_i387;
		gxt->get_thread_struct_fpu_size =
			gcore_x86_get_thread_struct_i387_size;
	}
}

#ifdef X86_64
/*
 * Some special system calls became ordinary ones in v2.6.26.
 *
 * commit 5f0120b5786f5dbe097a946a2eb5d745ebc2b7ed
 */
static void gcore_x86_table_register_is_special_syscall(void)
{
	if (symbol_exists("stub_rt_sigsuspend"))
		gxt->is_special_syscall = is_special_syscall_v0;
	else
		gxt->is_special_syscall = is_special_syscall_v26;
}

/*
 * Some special system calls became ordinary ones in v2.6.26. The
 * handler is registered only if the ia32 system call vector is in use.
 *
 * commit 5f0120b5786f5dbe097a946a2eb5d745ebc2b7ed
 */
static void gcore_x86_table_register_is_special_ia32_syscall(void)
{
	if (symbol_exists("ia32_syscall") &&
	    ((symbol_exists("used_vectors") &&
	      test_bit(IA32_SYSCALL_VECTOR, symbol_value("used_vectors"))) ||
	     is_gate_set_ia32_syscall_vector())) {
		if (symbol_exists("stub32_rt_sigsuspend"))
			gxt->is_special_ia32_syscall =
				is_special_ia32_syscall_v0;
		else
			gxt->is_special_ia32_syscall =
				is_special_ia32_syscall_v26;
	}
}
#endif

/*
 * The used_math member of struct task_struct was removed between
 * 2.6.10 and 2.6.11; the PF_USED_MATH flag has been used instead
 * since then.
 */
static void gcore_x86_table_register_tsk_used_math(void)
{
	if (GCORE_VALID_MEMBER(task_struct_used_math))
		gxt->tsk_used_math = tsk_used_math_v0;
	else
		gxt->tsk_used_math = tsk_used_math_v11;
}

#ifdef X86_64
void gcore_x86_table_init(void)
{
	gcore_x86_table_register_get_old_rsp();
	gcore_x86_table_register_get_thread_struct_fpu();
	gcore_x86_table_register_is_special_syscall();
	gcore_x86_table_register_is_special_ia32_syscall();
	gcore_x86_table_register_tsk_used_math();
}

static struct user_regset x86_64_regsets[] = {
	[REGSET_GENERAL] = {
		.core_note_type = NT_PRSTATUS,
		.size = sizeof(struct user_regs_struct),
		.get = genregs_get
	},
	[REGSET_FP] = {
		.core_note_type = NT_FPREGSET,
		.name = "LINUX",
		.size = sizeof(struct user_i387_struct),
		.active = xfpregs_active,
		.get = xfpregs_get,
		.callback = xfpregs_callback
	},
	[REGSET_XSTATE] = {
		.core_note_type = NT_X86_XSTATE,
		.name = "CORE",
		.size = sizeof(uint64_t),
		.active = xstateregs_active,
		.get = xstateregs_get,
	},
	[REGSET_IOPERM64] = {
		.core_note_type = NT_386_IOPERM,
		.name = "CORE",
		.size = IO_BITMAP_LONGS * sizeof(long),
		.active = ioperm_active,
		.get = ioperm_get
	},
};

static const struct user_regset_view x86_64_regset_view = {
	.name = "x86_64",
	.regsets = x86_64_regsets,
	.n = ARRAY_SIZE(x86_64_regsets),
	.e_machine = EM_X86_64,
};

/*
 * The number of registers in the REGSET_XSTATE entry is only known at
 * run time, so its size is filled in here rather than in the static
 * table above.
 */
static void gcore_x86_64_regset_xstate_init(void)
{
	struct user_regset *regset_xstate = &x86_64_regsets[REGSET_XSTATE];

	regset_xstate->size = sizeof(uint64_t) * get_xstate_regsets_number();
}

void gcore_x86_64_regsets_init(void)
{
	gcore_x86_64_regset_xstate_init();
}

#endif /* X86_64 */

#ifdef X86
static int genregs_get32(struct task_context *target,
			 const struct user_regset *regset,
			 unsigned int size, void *buf)
{
	struct user_regs_struct *regs = (struct user_regs_struct *)buf;
	char *pt_regs_buf;
	ulonglong pt_regs_addr;

	pt_regs_buf = GETBUF(SIZE(pt_regs));

	pt_regs_addr = machdep->get_stacktop(target->task) - SIZE(pt_regs);

	/*
	 * The commit 07b047fc2466249aff7cdb23fa0b0955a7a00d48
	 * introduced 8-byte offset to match copy_thread().
	 */
	if (THIS_KERNEL_VERSION >= LINUX(2,6,16))
		pt_regs_addr -= 8;

	readmem(pt_regs_addr, KVADDR, pt_regs_buf, SIZE(pt_regs),
		"genregs_get32: regs", gcore_verbose_error_handle());

	BZERO(regs, sizeof(struct user_regs_struct));

	regs->ax = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_ax));
	regs->bp = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_bp));
	regs->bx = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_bx));
	regs->cs = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_cs));
	regs->cx = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_cx));
	regs->di = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_di));
	regs->ds = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_ds));
	regs->dx = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_dx));
	regs->es = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_es));
	regs->flags = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_flags));
	regs->ip = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_ip));
	regs->orig_ax = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_orig_ax));
	regs->si = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_si));
	regs->sp = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_sp));
	regs->ss = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_ss));

	if (GCORE_VALID_MEMBER(pt_regs_fs))
		regs->fs = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_fs));
	else if (GCORE_VALID_MEMBER(pt_regs_xfs))
		regs->fs = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_xfs));
	if (GCORE_VALID_MEMBER(pt_regs_gs))
		regs->gs = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_gs));
	else if (GCORE_VALID_MEMBER(pt_regs_xgs))
		regs->gs = ULONG(pt_regs_buf + GCORE_OFFSET(pt_regs_xgs));

	regs->ds &= 0xffff;
	regs->es &= 0xffff;
	regs->fs &= 0xffff;
	regs->gs &= 0xffff;
	regs->ss &= 0xffff;

	/*
	 * If LAZY_GS is set, 0 is pushed on gs position at kernel
	 * stack's bottom. Then, gs value we want is at thread->gs,
	 * saved during __switch_to().
	 */
	if (GCORE_VALID_MEMBER(pt_regs_gs) && regs->gs == 0) {
		readmem(target->task + OFFSET(task_struct_thread) +
			GCORE_OFFSET(thread_struct_gs), KVADDR, &regs->gs,
			sizeof(regs->gs), "genregs_get32: regs->gs",
			gcore_verbose_error_handle());

		regs->gs &= 0xffff;

		/*
		 * If gs is handled lazily, it's impossible to restore
		 * the gs value for active tasks that have not been
		 * scheduled out since entering kernel mode.
		 */
		if (is_task_active(target->task))
			error(WARNING, "lazily-handled GS may not be "
			      "restored correctly for active tasks.\n");
	}

	return TRUE;
}

void gcore_x86_table_init(void)
{
	gcore_x86_table_register_get_thread_struct_fpu();
	gcore_x86_table_register_tsk_used_math();
}

static struct user_regset x86_32_regsets[] = {
	[REGSET_GENERAL] = {
		.core_note_type = NT_PRSTATUS,
		.name = "CORE",
		.get = genregs_get32,
		.size = sizeof(struct user_regs_struct),
	},
	[REGSET_FP] = {
		.core_note_type = NT_FPREGSET,
		.name = "LINUX",
		.size = sizeof(struct user_i387_ia32_struct),
		.active = fpregs_active, .get = fpregs_get,
                .callback = xfpregs_callback,
	},
	[REGSET_XSTATE] = {
		.core_note_type = NT_X86_XSTATE,
		.name = "CORE",
		.active = xstateregs_active, .get = xstateregs_get,
	},
	[REGSET_XFP] = {
		.core_note_type = NT_PRXFPREG,
		.name = "CORE",
		.size = sizeof(struct user32_fxsr_struct),
		.active = xfpregs_active, .get = xfpregs_get,
	},
	[REGSET_TLS] = {
		.core_note_type = NT_386_TLS,
		.name = "CORE",
		.size = GDT_ENTRY_TLS_ENTRIES * sizeof(struct user_desc),
		.active = regset_tls_active,
		.get = regset_tls_get,
	},
	[REGSET_IOPERM32] = {
		.core_note_type = NT_386_IOPERM,
		.name = "CORE",
		.size = IO_BITMAP_BYTES,
		.active = ioperm_active, .get = ioperm_get
	},
};

static const struct user_regset_view x86_32_regset_view = {
	.name = "x86_32",
	.regsets = x86_32_regsets,
	.n = ARRAY_SIZE(x86_32_regsets),
	.e_machine = EM_386,
};

/*
 * The number of registers in the REGSET_XSTATE entry is only known at
 * run time, so its size is filled in here rather than in the static
 * table above.
 */
static void gcore_x86_32_regset_xstate_init(void)
{
	struct user_regset *regset_xstate = &x86_32_regsets[REGSET_XSTATE];

	regset_xstate->size = sizeof(uint32_t) * get_xstate_regsets_number();
}

void gcore_x86_32_regsets_init(void)
{
	gcore_x86_32_regset_xstate_init();
}
#endif

const struct user_regset_view *
task_user_regset_view(void)
{
#ifdef X86_64
	return &x86_64_regset_view;
#elif defined(X86)
	return &x86_32_regset_view;
#endif
}

#ifdef GCORE_TEST

#ifdef X86_64
static char *gcore_x86_64_test(void)
{
	/* Default to failure so an unrecognized kernel is not reported as passing. */
	int test_rsp = 0, test_fpu = 0, test_syscall = 0, test_math = 0;

	if (gcore_is_rhel4()) {
		test_rsp = gxt->get_old_rsp == gcore_x86_64_get_cpu_pda_oldrsp;
		test_fpu = gxt->get_thread_struct_fpu == gcore_x86_get_thread_struct_i387;
		test_syscall = gxt->is_special_syscall == is_special_syscall_v0;
		test_math = gxt->tsk_used_math == tsk_used_math_v0;
	} else if (gcore_is_rhel5()) {
		test_rsp = gxt->get_old_rsp == gcore_x86_64_get_cpu__pda_oldrsp;
		test_fpu = gxt->get_thread_struct_fpu == gcore_x86_get_thread_struct_i387;
		test_syscall = gxt->is_special_syscall == is_special_syscall_v0;
		test_math = gxt->tsk_used_math == tsk_used_math_v11;
	} else if (gcore_is_rhel6()) {
		test_rsp = gxt->get_old_rsp == gcore_x86_64_get_per_cpu__old_rsp;
		test_fpu = gxt->get_thread_struct_fpu == gcore_x86_get_thread_struct_thread_xstate;
		test_syscall = gxt->is_special_syscall == is_special_syscall_v26;
		test_math = gxt->tsk_used_math == tsk_used_math_v11;
	} else if (THIS_KERNEL_VERSION == LINUX(2,6,36)) {
		test_rsp = gxt->get_old_rsp == gcore_x86_64_get_old_rsp;
		test_fpu = gxt->get_thread_struct_fpu == gcore_x86_get_thread_struct_fpu_thread_xstate;
		test_syscall = gxt->is_special_syscall == is_special_syscall_v26;
		test_math = gxt->tsk_used_math == tsk_used_math_v11;
	}

	mu_assert("gxt->get_old_rsp has wrongly been registered", test_rsp);
	mu_assert("gxt->get_thread_struct_fpu has wrongly been registered", test_fpu);
	mu_assert("gxt->is_special_syscall has wrongly been registered", test_syscall);
	mu_assert("gxt->tsk_used_math has wrongly been registered", test_math);

	return NULL;
}
#endif

#ifdef X86
static char *gcore_x86_32_test(void)
{
	return NULL;
}
#endif

char *gcore_x86_test(void)
{
#ifdef X86_64
	return gcore_x86_64_test();
#else
	return gcore_x86_32_test();
#endif
}

#endif

#endif /* defined(X86) || defined(X86_64) */
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gcore.tar.bz2
Type: application/octet-stream
Size: 28666 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20110118/bb83f11b/attachment.obj>