[Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump

HATAYAMA Daisuke d.hatayama at jp.fujitsu.com
Tue Jan 25 01:36:39 UTC 2011


Hello Dave,

Thanks for your observations.

From: Dave Anderson <anderson at redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Mon, 24 Jan 2011 14:27:39 -0500 (EST)

> 
> 
> ----- Original Message -----
>> gcore extension module provides a means to create ELF core dump for
>> user-mode process that is contained within crash kernel dump. I design
>> this to behave as kernel's ELF core dumper.
>> 
>> For previous discussion, see:
>> https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html
> 
> A few observations...
> 
> I'll fix unwind_x86_64.h to prevent this build warning:
>   
>   # make extensions
>   ...
>   gcc  -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o libgcore/gcore_x86.o libgcore/gcore_x86.c
>   In file included from libgcore/gcore_x86.c:19:
>   ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
>   In file included from libgcore/gcore_x86.c:17:
>   ../defs.h:60:1: warning: this is the location of the previous definition
>   ...
> 

The warning is caused by IO_BITMAP_OFFSET, which is defined but unused
in gcore_x86.c.  So it seems to me that the part to be fixed is
gcore_x86.c, not unwind_x86_64.h.
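
Wherever the fix ends up, a redefinition warning of this kind is
usually avoided by taking offsetof from <stddef.h> and guarding any
private fallback definition. A minimal sketch of that pattern follows;
struct sketch_tss and sketch_io_bitmap_offset() are made up for
illustration and are not taken from crash or gcore:

```c
#include <stddef.h>   /* standard offsetof */

/* Fallback only; never redefine a macro that already exists. */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
#endif

/* Hypothetical structure, only here to have a member to measure. */
struct sketch_tss {
    unsigned long  sp0;
    unsigned short io_bitmap_base;
};

/* Byte offset of io_bitmap_base within struct sketch_tss. */
size_t sketch_io_bitmap_offset(void)
{
    return offsetof(struct sketch_tss, io_bitmap_base);
}
```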

> But the gcore.mk file should gracefully fail to build on non-supported
> architectures.  It ends up spewing ~200 lines of error messages when
> attempted, for example, on a ppc64 machine:

Yes, I know it behaves like this when built on unsupported
architectures. I had assumed this was implicitly permitted, since sial
produces a similar build error. But if that is in fact wrong, I'll
make it buildable on unsupported architectures.

gcore includes a part that can be shared among different
architectures. This covers almost everything except the part that
collects the various kinds of note information, which is inherently
architecture specific.

I'll fix this so that gcore on unsupported architectures produces an
ELF core with only PT_LOAD segments.
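
As a rough sketch of what I have in mind (not the actual gcore code),
the program header table could be built like this. The names
sketch_vma and sketch_fill_phdrs and the have_notes switch are
assumptions for illustration only; file offsets and sizes are left at
zero here:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one mapped region of the process. */
struct sketch_vma {
    uint64_t start;   /* first virtual address */
    uint64_t end;     /* one past the last virtual address */
    uint32_t flags;   /* PF_R | PF_W | PF_X */
};

/*
 * Fill phdr[] for a core file.  With have_notes == 0 (unsupported
 * architecture) only PT_LOAD entries are produced.  Returns the number
 * of entries written, or 0 if phdr[] is too small to hold them.
 */
size_t sketch_fill_phdrs(Elf64_Phdr *phdr, size_t max,
                         const struct sketch_vma *vma, size_t nr_vma,
                         int have_notes)
{
    size_t n = 0, i;

    if (nr_vma + (have_notes ? 1 : 0) > max)
        return 0;

    if (have_notes) {
        /* Architecture-specific notes go into a single PT_NOTE entry. */
        memset(&phdr[n], 0, sizeof(phdr[n]));
        phdr[n].p_type = PT_NOTE;
        n++;
    }

    for (i = 0; i < nr_vma; i++, n++) {
        /* The shared part: one PT_LOAD entry per memory region. */
        memset(&phdr[n], 0, sizeof(phdr[n]));
        phdr[n].p_type  = PT_LOAD;
        phdr[n].p_vaddr = vma[i].start;
        phdr[n].p_memsz = vma[i].end - vma[i].start;
        phdr[n].p_flags = vma[i].flags;
    }
    return n;
}
```

With have_notes set to 0 the resulting core still carries the memory
image, but no register or process status information.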

> 
> Your documentation implies that the command would only work on 
> certain kernel versions:
> 
>> Compared with the previous version, this release:
>> - supports more kernel versions, and
>> - collects register values more accurately (but still not perfect).
>> 
>> Support Range
>> =============
>> 
>> |----------------+----------------------------------------------|
>> | ARCH           | X86, X86_64                                  |
>> |----------------+----------------------------------------------|
>> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
>> |----------------+----------------------------------------------|
> 
> 
> But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> it seems to work OK on some tasks, but on others it doesn't work so well.
> Here, the "less" command can be dumped OK:
> 
> 
>   crash> sys | grep RELEASE
>        RELEASE: 2.6.34-2.fc14.x86_64
>   crash> ps
>   ... [ cut ] ...
>   >  2080   1490   0  ffff880079ed2480  RU   7.6  289900 159684  crash
>      2084      1   0  ffff880077a7a480  IN   0.1  248592   1936  rsyslogd
>      2090   2080   5  ffff880079ed4900  IN   0.0  105432    828  less
>   crash> gcore -v0 2090
>   Saved core.2090.less
>   crash>
> 
> But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle 
> dumping the crash utility itself, and just hangs:
> 
>   crash> swap
>   FILENAME           TYPE         SIZE      USED   PCT  PRIORITY
>   /dev/dm-1        PARTITION    18579452k       0k   0%     -1
>   crash> ps
>   ... [ cut ] ...
>   >  2080   1490   0  ffff880079ed2480  RU   7.6  289900 159684  crash
>      2084      1   0  ffff880077a7a480  IN   0.1  248592   1936  rsyslogd
>      2090   2080   5  ffff880079ed4900  IN   0.0  105432    828  less
>   crash> gcore -v1 2080
>   gcore: Restoring the thread group ... 
>   gcore: done.
>   gcore: Retrieving note information ... 
>   
>   < hangs forever >
> 
>   ...
> 
> I would have thought that it would either work-for-all or work-for-none
> with respect to a particular kernel version?

Sorry, I'm not sure what you mean by ``work-for-all or
work-for-none''.

``Supported kernel versions'' means ``I tested the gcore extension
module on these kernels''. gcore may well work on different kernel
versions too, as long as there is no incompatibility among the kernel
versions.

> 
> In any case, if it's going to fail, perhaps there should be some mechanism
> in place that would prevent it from hanging, and instead print a message 
> that the kernel version is not supported?  Or if a particular data structure
> is different than the "supported" versions, it should fail immediately?  
> Just a thought...

I agree with the former idea. I believe gcore has a good chance of
working well on unsupported kernels.

The hanging part is likely to be restore_frame_pointer(), which runs
only when the analyzed kernel is built with CONFIG_FRAME_POINTER=y. It
recovers the user-space frame pointer by following the saved base
pointers in order.

If the kernel stack frames are corrupted, the unwinding done by this
function can behave in unexpected ways.

I'll fix this by limiting the number of frames traced to some finite
number. The kernel stack size would be enough as the limit here.
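
To make the idea concrete, here is a simplified sketch of a bounded
walk. It is not the real restore_frame_pointer(); the function name,
its arguments and the flat copy of the stack are assumptions for
illustration. Since every frame occupies at least one word, the number
of words in the stack serves as the upper bound on steps:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Follow saved base pointers through a copy of the kernel stack.
 *   stack  - copy of the stack, as 64-bit words
 *   nwords - number of words in stack[]
 *   base   - virtual address corresponding to stack[0]
 *   bp     - base pointer to start from
 * On success returns 0 and stores in *out the first base pointer that
 * points outside the kernel stack.  Returns -1 if the chain is
 * misaligned or does not leave the stack within nwords steps, which is
 * what a corrupted or looping chain looks like.
 */
int sketch_follow_bp(const uint64_t *stack, size_t nwords,
                     uint64_t base, uint64_t bp, uint64_t *out)
{
    const uint64_t top = base + nwords * sizeof(uint64_t);
    size_t steps;

    for (steps = 0; steps < nwords; steps++) {
        if (bp < base || bp >= top) {
            *out = bp;          /* left the kernel stack: done */
            return 0;
        }
        if (bp % sizeof(uint64_t))
            return -1;          /* misaligned: stack is corrupted */
        bp = stack[(bp - base) / sizeof(uint64_t)];
    }
    return -1;                  /* step limit reached: give up */
}
```

A caller that gets -1 can print an error and skip the frame pointer
instead of hanging.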

> 
> Also I note that "gcore -v7" fails -- shouldn't it be accepted as an argument?
> 
>   crash> gcore -v7 2080
>   gcore: invalid vlevel: 7.
>   crash>

Oh, sorry. This is just a bug that should have been caught by my unit
testing. Thanks.
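
For reference, a check of roughly this shape would accept 7. This is
only a sketch: it assumes vlevel is a bitmask of three flags, and the
flag names and sketch_parse_vlevel() are invented for illustration,
not taken from the gcore source:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical verbosity flags; 7 means all three bits are set. */
enum {
    SKETCH_V_PROGRESS  = 0x1,
    SKETCH_V_NONQUIET  = 0x2,
    SKETCH_V_PAGEFAULT = 0x4,
    SKETCH_V_ALL       = 0x7
};

/*
 * Parse a decimal vlevel argument.  Returns 0 and stores the value in
 * *vlevel if it is a number with no bits set outside SKETCH_V_ALL,
 * otherwise returns -1 and leaves *vlevel untouched.
 */
int sketch_parse_vlevel(const char *arg, unsigned long *vlevel)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(arg, &end, 10);
    if (errno || end == arg || *end != '\0')
        return -1;              /* not a clean decimal number */
    if (v & ~(unsigned long)SKETCH_V_ALL)
        return -1;              /* unknown bits set */
    *vlevel = v;
    return 0;
}
```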

I'll post a fixed version again soon. Please wait for a while.

Thanks.
HATAYAMA Daisuke



