[Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Dave Anderson
anderson at redhat.com
Tue Jan 25 14:25:22 UTC 2011
----- Original Message -----
> Hello Dave,
>
> Thanks for your observations.
> > I'll fix unwind_x86_64.h to prevent this build warning:
> >
> > # make extensions
> > ...
> > gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o
> > libgcore/gcore_x86.o libgcore/gcore_x86.c
> > In file included from libgcore/gcore_x86.c:19:
> > ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
> > In file included from libgcore/gcore_x86.c:17:
> > ../defs.h:60:1: warning: this is the location of the previous
> > definition
> > ...
> >
>
> The warning is caused by IO_BITMAP_OFFSET, which is defined but unused
> in gcore_x86.c. So it seems to me that the part to be fixed is
> gcore_x86.c, not unwind_x86_64.h.
Maybe, but it should also be fixed in unwind_x86_64.h like this:
--- unwind_x86_64.h 30 Nov 2010 19:40:30 -0000 1.4
+++ unwind_x86_64.h 24 Jan 2011 20:54:25 -0000 1.5
@@ -58,7 +58,9 @@
 extern void init_unwind_table(void);
 extern void free_unwind_table(void);
+#ifndef offsetof
 #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+#endif
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
 #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
Your module is the first C source file that #include's defs.h and then
unwind_x86_64.h. The change above to unwind_x86_64.h just does the same
thing as defs.h.
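To illustrate the effect of that guard, here is a minimal sketch. It assumes a source file that already has a definition of offsetof in scope (taken here from <stddef.h>, standing in for defs.h) before the guarded definition is reached. struct demo_regs and demo_sp_offset() are made up for the example and are not part of crash or gcore.

```c
#include <stddef.h>   /* already provides offsetof, as defs.h does in crash */

/* Same guard as in the patch: define offsetof only if nobody has yet,
 * so no "redefined" warning is emitted. */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

/* Hypothetical structure, used only to exercise the macro. */
struct demo_regs {
    unsigned long ip;
    unsigned long sp;
};

/* Returns the byte offset of 'sp' inside struct demo_regs. */
size_t demo_sp_offset(void)
{
    return offsetof(struct demo_regs, sp);
}
```

Without the #ifndef/#endif pair, the second #define would be a redefinition whenever the earlier header is included first, which is the include order your module uses.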
>
> > But the gcore.mk file should gracefully fail to build on non-supported
> > architectures. It ends up spewing ~200 lines of error messages when
> > attempted, for example, on a ppc64 machine:
>
> Yes, I know it behaves like this when built on unsupported
> architectures. I had understood this to be implicitly permitted, judging
> from the similar build errors of sial. But if that is in fact wrong, I'll
> make it buildable on unsupported architectures.
Or you could just catch it in the gcore.mk by doing something like this:
ARCH=UNSUPPORTED
ifeq ($(shell arch), x86_64)
ARCH=SUPPORTED
endif
ifeq ($(shell arch), i686)
ARCH=SUPPORTED
endif
all: gcore.so

gcore.so: gcore.c
	@if [ ${ARCH} = "UNSUPPORTED" ]; then \
	echo "gcore: architecture not supported"; else \
	echo "do build here..."; fi;
>
> gcore includes a part that can be shared among different
> architectures. That covers almost everything except the part that
> collects the various kinds of note information, which are inherently
> architecture specific.
>
> I'll fix this so that gcore on unsupported architectures provides
> an ELF core with PT_LOAD segments only.
>
> >
> > Your documentation implies that the command would only work on
> > certain kernel versions:
> >
> >> Compared with the previous version, this release:
> >> - supports more kernel versions, and
> >> - collects register values more accurately (but still not perfect).
> >>
> >> Support Range
> >> =============
> >>
> >> |----------------+----------------------------------------------|
> >> | ARCH | X86, X86_64 |
> >> |----------------+----------------------------------------------|
> >> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
> >> |----------------+----------------------------------------------|
> >
> >
> > But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> > it seems to work OK on some tasks, but on others it doesn't work so well.
> > Here, the "less" command can be dumped OK:
> >
> >
> > crash> sys | grep RELEASE
> > RELEASE: 2.6.34-2.fc14.x86_64
> > crash> ps
> > ... [ cut ] ...
> > > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> > 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> > 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> > crash> gcore -v0 2090
> > Saved core.2090.less
> > crash>
> >
> > But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
> > dumping the crash utility itself, and just hangs:
> >
> > crash> swap
> > FILENAME TYPE SIZE USED PCT PRIORITY
> > /dev/dm-1 PARTITION 18579452k 0k 0% -1
> > crash> ps
> > ... [ cut ] ...
> > > 2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> > 2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> > 2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> > crash> gcore -v1 2080
> > gcore: Restoring the thread group ...
> > gcore: done.
> > gcore: Retrieving note information ...
> >
> > < hangs forever >
> >
> > ...
> >
> > I would have thought that it would either work-for-all or work-for-none
> > with respect to a particular kernel version?
>
> Sorry, I don't understand what you mean by ``work-for-all or work-for-none''.
> ``Supported kernel versions'' stands for ``I tested the gcore
> extension module on these kernels''. It is possible for gcore to
> work well on different kernel versions too, as long as there is no
> incompatibility among the kernel versions.
But the "less" and "crash" command examples were from the same dumpfile,
so I didn't understand why gcore would work for one command but not for
another command from the same kernel version?
> >
> > In any case, if it's going to fail, perhaps there should be some mechanism
> > in place that would prevent it from hanging, and instead print a message
> > that the kernel version is not supported? Or if a particular data structure
> > is different than the "supported" versions, it should fail immediately?
> > Just a thought...
>
> I agree with the former idea. I believe gcore has a good chance of
> working well on unsupported kernels.
>
> The hanging part is likely to be restore_frame_pointer(), which runs
> only when the analyzed kernel is built with CONFIG_FRAME_POINTER=y and
> the user-space frame pointer can be found by following the base
> pointers in order.
>
> If the kernel stack frames are corrupted, the unwinding done by that
> function can behave in unexpected ways.
>
> I'll fix this by limiting the number of frames traced to a finite
> bound. The kernel stack size would be enough as that bound.
>
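A bound of that kind could look roughly like the sketch below. This is not the code of restore_frame_pointer(); it is a minimal illustration under stated assumptions: the kernel stack has been copied into a local buffer, a saved base pointer of 0 marks the end of the chain, and the stack is 8192 bytes. DEMO_STACK_SIZE and demo_walk_frames() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STACK_SIZE 8192   /* assumed kernel stack size, in bytes */

/*
 * Follow the chain of saved base pointers held in 'stack', a copy of a
 * kernel stack whose lowest address is 'base', starting from 'bp'.
 * Returns the number of frames walked, or -1 if the chain leaves the
 * stack, is misaligned, or does not end within the iteration bound.
 */
long demo_walk_frames(const unsigned char *stack, uintptr_t base, uintptr_t bp)
{
    /* A stack cannot hold more frames than it has pointer-sized slots. */
    const size_t max_frames = DEMO_STACK_SIZE / sizeof(uintptr_t);
    size_t n;

    for (n = 0; n < max_frames; n++) {
        uintptr_t next;

        if (bp == 0)
            return (long)n;                    /* clean end of chain */
        if (bp < base || bp - base > DEMO_STACK_SIZE - sizeof(uintptr_t))
            return -1;                         /* points outside the stack */
        if ((bp - base) % sizeof(uintptr_t) != 0)
            return -1;                         /* misaligned slot */

        /* Read the saved base pointer stored at 'bp'. */
        memcpy(&next, stack + (bp - base), sizeof(next));
        bp = next;
    }
    return -1;   /* bound exceeded: the chain is cyclic or corrupted */
}
```

With a check like this, a corrupted or cyclic chain ends in an error return after at most DEMO_STACK_SIZE / sizeof(uintptr_t) steps instead of hanging.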
> >
> > Also I note that "gcore -v7" fails -- shouldn't it be accepted as an
> > argument?
> >
> > crash> gcore -v7 2080
> > gcore: invalid vlevel: 7.
> > crash>
>
> Oh, sorry. This is just a bug that should have been caught by my unit
> testing. Thanks.
>
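For the -v argument check, something along these lines would accept 7. It is only a sketch and assumes that the verbosity level is a combination of three flag bits (1, 2 and 4), which this thread suggests but does not state. demo_parse_vlevel() and DEMO_VLEVEL_MAX are hypothetical names, not taken from the gcore sources.

```c
#include <errno.h>
#include <stdlib.h>

#define DEMO_VLEVEL_MAX 7UL   /* assumed: three flag bits, 1 | 2 | 4 */

/*
 * Parse the argument of -v as a decimal number.
 * Returns 0 and stores the level in *out, or returns -1 if the argument
 * is not a number in the range 0..DEMO_VLEVEL_MAX.
 */
int demo_parse_vlevel(const char *arg, unsigned long *out)
{
    char *end;
    unsigned long v;

    if (arg == NULL || *arg == '\0' || *arg == '-')
        return -1;

    errno = 0;
    v = strtoul(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || v > DEMO_VLEVEL_MAX)
        return -1;

    *out = v;
    return 0;
}
```

Comparing against the largest valid combination, rather than against a list of single levels, is what lets "-v7" through while still rejecting "-v8".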
> I'll post a fixed version again soon. Please wait for a while.
OK thanks,
Dave