[Crash-utility] [RFC] gcore subcommand: a process coredump feature

HATAYAMA Daisuke d.hatayama at jp.fujitsu.com
Tue Aug 3 05:18:20 UTC 2010


Hello Dave,

Thank you for your comment.

From: anderson at prospeed.net
Subject: Re: [Crash-utility] [RFC] gcore subcommand: a process coredump feature
Date: Mon, 2 Aug 2010 19:02:23 -0400 (EDT)

>>
>> Hello,
>>
>> For the past few weeks I've been developing a gcore subcommand for
>> the crash utility. It generates a process core dump from a kernel
>> crash dump, a feature strongly demanded by users who want to
>> investigate user-space applications contained in a kernel crash dump.
>>
>> I've now finished a prototype version of gcore and identified the
>> issues that still need to be addressed. Could you give me any
>> comments and suggestions on this work?
> 
> Hello Daisuke,
> 
> As I mentioned in my previous email re: cpu numbering, I am currently
> on vacation, and cannot spend much time looking at this issue until
> I get back on August 9th.
> 
> However, I think that this could be a useful feature, and I did
> take a quick look at how it could be done several months ago when
> it was brought up on this mailing list.  However, as you discovered,

This is the first I have heard that a similar proposal was made
earlier on this mailing list. I will look for it and compare it with
mine.

> I also noted that the user-space core dump code in the kernel has
> undergone significant changes over time, and so the implementation
> by the crash utility would have to adapt to the kernel data structures
> used by the various kernel versions.  And because of that, I don't
> want to put it into the base crash binary, but rather it should be
> maintained as one or more extension modules, which can be located
> in the "extensions" subdirectory in the crash source package, as well
> as stored in the "extensions" web page link from the crash "people"
> web site.

I basically agree, but I think the main part of gcore can be shared
across kernel versions, since the only kernel data structures it
depends on are the mm member of task_struct, the mmap member of
mm_struct and the vm_* members of vm_area_struct. So I think it is
possible to keep the main part in the crash binary and to distribute
only the kernel-version-specific parts, which gather the various kinds
of note information, as shared libraries.
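
A minimal sketch of this split, under stated assumptions: every name
below (gcore_note_ops, gcore_select_ops, gcore_collect, KVER) is
hypothetical and does not come from gcore.patch, and the backends are
stubs compiled into one table, where the real design would load each
one from a shared library. The base versions used are those of the
RHEL4, RHEL5 and RHEL6 kernels (2.6.9, 2.6.18 and 2.6.32).

```c
#include <assert.h>
#include <stddef.h>

/* Pack a kernel version a.b.c into one comparable integer. */
#define KVER(a, b, c) (((a) << 16) | ((b) << 8) | (c))

/* Hypothetical per-kernel-version backend that gathers note information. */
struct gcore_note_ops {
    unsigned int min_version;          /* first kernel version handled */
    const char *name;                  /* label, for diagnostics only */
    int (*collect_notes)(void *ctx);   /* returns 0 on success */
};

/* Stub standing in for a real, version-specific note collector. */
static int collect_stub(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Must stay sorted by ascending min_version. */
static const struct gcore_note_ops backends[] = {
    { KVER(2, 6, 9),  "2.6.9+",  collect_stub },
    { KVER(2, 6, 18), "2.6.18+", collect_stub },
    { KVER(2, 6, 32), "2.6.32+", collect_stub },
};

/* Return the newest backend whose min_version is <= kver, or NULL. */
const struct gcore_note_ops *gcore_select_ops(unsigned int kver)
{
    const struct gcore_note_ops *best = NULL;
    size_t i;

    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
        if (backends[i].min_version <= kver)
            best = &backends[i];
    return best;
}

/* Version-independent entry point: pick a backend, then delegate to it. */
int gcore_collect(unsigned int kver, void *ctx)
{
    const struct gcore_note_ops *ops = gcore_select_ops(kver);

    if (ops == NULL)
        return -1;                     /* kernel too old: no backend */
    assert(ops->collect_notes != NULL);
    return ops->collect_notes(ctx);
}
```

With this layout, gcore_collect(KVER(2, 6, 34), NULL) would pick the
2.6.32+ backend, and a kernel older than 2.6.9 would be rejected with -1.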

> 
> It is quite simple to re-adapt your patch as an extension module.
> Check the "snap.c" and "snap.mk" files in the extensions subdirectory
> as templates for your "gcore" command.
> 
> As to the other questions below, I will get back to you after
> August 9th.

Thanks. I look forward to your further comments.

> 
> Thanks,
>   Dave
> 
> 
>> Motivation
>> ==========
>>
>> It is a fairly common technique in cluster systems for a running
>> node to trigger the kernel crash dump mechanism when it detects a
>> critical error, so that the failing server stops as soon as
>> possible. Consequently, the resulting kernel crash dump contains the
>> process image of the erroneous user application. In that case,
>> developers are interested in user space rather than kernel space.
>>
>> Another merit of gcore is that it allows us to use userland
>> debugging tools, such as GDB and binutils, to analyze user-space
>> memory.
>>
>>
>> Current Status
>> ==============
>>
>> I have confirmed that the prototype version runs on the following
>> configuration:
>>
>>   Linux Kernel Version: 2.6.34
>>   Supporting Architecture: x86_64
>>   Crash Version: 5.0.5
>>   Dump Format: ELF
>>
>> I'm planning to widen the range of support as follows:
>>
>>   Linux Kernel Version: Any
>>   Supporting Architecture: i386, x86_64 and IA64
>>   Dump Format: Any
>>
>>
>> Issues
>> ======
>>
>> Currently, I have the issues below.
>>
>> 1) Retrieval of appropriate register values
>>
>> The prototype version retrieves register values from the _wrong_
>> location: the top of the kernel stack, where register values are
>> saved at any preemption context switch. The register values that
>> should be included here are instead the ones saved at the
>> user-to-kernel context switch on any interrupt event.
>>
>> I've yet to implement this. Specifically, I need to do the following
>> tasks:
>>
>>   (1) list all entry paths from user space into kernel space.
>>
>>   (2) divide the entry paths according to where and how the register
>>   values of the user-space context are saved.
>>
>>   (3) write a program that retrieves the saved register values from
>>   the appropriate locations identified by means of (1) and (2).
>>
>> Ideally, I think it would be best if the crash library provided a
>> means of retrieving this kind of register values, that is, ones saved
>> on various stack frames. Is there any plan to do so?
>>
>>
>> 2) Getting the signal number for a task that was in the middle of a
>> core dump when the kernel crashed
>>
>> If a target task is partway through the core dump process, it is
>> useful to know the signal number in order to know why the task was
>> about to be core dumped.
>>
>> Unfortunately, I have no choice but to backtrace the kernel stack to
>> retrieve the signal number saved there as an argument of, for
>> example, do_coredump().
>>
>>
>> 3) Kernel version compatibility
>>
>> crash's policy is to support all kernel versions with the latest
>> crash package. On the other hand, the prototype is based on kernel
>> 2.6.34. This means more kernel versions need to be supported.
>>
>> Well, the question is: which versions do I really need to test in
>> addition to the latest upstream kernel? I think it's practically
>> enough to support RHEL4, RHEL5 and RHEL6.
>>
>>
>> Build Instruction
>> =================
>>
>>   $ tar xf crash-5.0.5.tar.gz
>>   $ cd crash-5.0.5/
>>   $ patch -p 1 < gcore.patch
>>   $ make
>>
>>
>> Usage
>> =====
>>
>> Use the help subcommand of the crash utility: ``help gcore''.
>>
>>
>> Attached File
>> =============
>>
>>   * gcore.patch
>>
>>     A patch implementing gcore subcommand for crash-5.0.5.
>>
>>     The diffstat output is as follows.
>>
>> $ diffstat gcore.patch
>>  Makefile      |   10 +-
>>  defs.h        |   15 +
>>  gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  gcore.h       |  639 ++++++++++++++++++++
>>  global_data.c |    3 +
>>  help.c        |   28 +
>>  netdump.c     |   27 +
>>  tools.c       |   37 ++
>>  8 files changed, 2615 insertions(+), 2 deletions(-)
>>
>> --
>> HATAYAMA Daisuke
>> d.hatayama at jp.fujitsu.com
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: gcore.patch
>> Type: text/x-patch
>> Size: 78046 bytes
>> Desc: not available
>> URL: <https://www.redhat.com/archives/crash-utility/attachments/20100802/710541de/attachment.bin>
>>
>> ------------------------------
> 
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 



