[Crash-utility] [ANNOUNCE][RFC] gcore extension module: user-mode process core dump

Dave Anderson anderson at redhat.com
Tue Jan 25 14:25:22 UTC 2011



----- Original Message -----
> Hello Dave,
> 
> Thanks for your observations.

> > I'll fix unwind_x86_64.h to prevent this build warning:
> >
> >   # make extensions
> >   ...
> >   gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o
> >   libgcore/gcore_x86.o libgcore/gcore_x86.c
> >   In file included from libgcore/gcore_x86.c:19:
> >   ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
> >   In file included from libgcore/gcore_x86.c:17:
> >   ../defs.h:60:1: warning: this is the location of the previous
> >   definition
> >   ...
> >
> 
> The warning is caused by IO_BITMAP_OFFSET, which is defined but unused
> in gcore_x86.c. So it seems to me that the part to be fixed is
> gcore_x86.c, not unwind_x86_64.h.

Maybe, but it should also be fixed in unwind_x86_64.h like this:

  --- unwind_x86_64.h     30 Nov 2010 19:40:30 -0000      1.4
  +++ unwind_x86_64.h     24 Jan 2011 20:54:25 -0000      1.5
  @@ -58,7 +58,9 @@
   extern void init_unwind_table(void);
   extern void free_unwind_table(void);
 
  +#ifndef offsetof
   #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
  +#endif
   #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
   #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
   #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

Your module is the first C source file that #include's defs.h and then 
unwind_x86_64.h.  The change above to unwind_x86_64.h just does the same
thing as defs.h.
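
As a rough illustration of why the guard is enough, here is a minimal sketch
in which two stand-in "headers" both try to define offsetof. The struct and
the helper function are hypothetical and exist only for the example; whichever
definition is seen first wins, and every later guarded one becomes a no-op:

```c
#include <stddef.h>   /* the C library normally supplies offsetof already */

/* Stand-in for the first header: define offsetof only if nobody has. */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

/* Stand-in for the second header: with the guard this is skipped, so
 * the compiler never sees a redefinition and emits no warning. */
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

/* Hypothetical structure, used only to exercise the macro. */
struct sample {
    char tag;
    long value;
};

/* Returns the byte offset of `value` inside struct sample. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}
```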

> 
> > But the gcore.mk file should gracefully fail to build on non-supported
> > architectures. It ends up spewing ~200 lines of error messages when
> > attempted, for example, on a ppc64 machine:
> 
> Yes, I know it behaves like this if we build it on unsupported
> architectures. I had assumed this was implicitly permitted, given the
> similar build errors from sial. But if that is wrong, I'll make
> it buildable on unsupported architectures.

Or you could just catch it in the gcore.mk by doing something like this:

  ARCH=UNSUPPORTED
  ifeq ($(shell arch), x86_64)
    ARCH=SUPPORTED
  endif
  ifeq ($(shell arch), i686)
    ARCH=SUPPORTED
  endif

  all: gcore.so

  gcore.so: gcore.c
          @if [ ${ARCH} = "UNSUPPORTED"  ]; then \
                  echo "gcore: architecture not supported"; else \
          echo "do build here..."; fi;

> 
> gcore includes a part that can be shared among different
> architectures. That covers nearly everything except the part that
> collects the various kinds of note information, which are inherently
> architecture specific.
> 
> I'll fix this so that gcore on unsupported architectures provides
> an ELF core with only PT_LOAD segments.
> 
> >
> > Your documentation implies that the command would only work on
> > certain kernel versions:
> >
> >> Compared with the previous version, this release:
> >> - supports more kernel versions, and
> >> - collects register values more accurately (but still not perfect).
> >>
> >> Support Range
> >> =============
> >>
> >> |----------------+----------------------------------------------|
> >> | ARCH | X86, X86_64 |
> >> |----------------+----------------------------------------------|
> >> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
> >> |----------------+----------------------------------------------|
> >
> >
> > But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> > it seems to work OK on some tasks, but on others it doesn't work so well.
> > Here, the "less" command can be dumped OK:
> >
> >
> >   crash> sys | grep RELEASE
> >        RELEASE: 2.6.34-2.fc14.x86_64
> >   crash> ps
> >   ... [ cut ] ...
> >   >  2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> >      2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> >      2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> >   crash> gcore -v0 2090
> >   Saved core.2090.less
> >   crash>
> >
> > But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
> > dumping the crash utility itself, and just hangs:
> >
> >   crash> swap
> >   FILENAME TYPE SIZE USED PCT PRIORITY
> >   /dev/dm-1 PARTITION 18579452k 0k 0% -1
> >   crash> ps
> >   ... [ cut ] ...
> >   >  2080 1490 0 ffff880079ed2480 RU 7.6 289900 159684 crash
> >      2084 1 0 ffff880077a7a480 IN 0.1 248592 1936 rsyslogd
> >      2090 2080 5 ffff880079ed4900 IN 0.0 105432 828 less
> >   crash> gcore -v1 2080
> >   gcore: Restoring the thread group ...
> >   gcore: done.
> >   gcore: Retrieving note information ...
> >
> >   < hangs forever >
> >
> >   ...
> >
> > I would have thought that it would either work-for-all or work-for-none
> > with respect to a particular kernel version?
> 
> Sorry, I'm not sure what you mean by ``work-for-all or work-for-none''.
> ``Supported kernel versions'' means ``I tested the gcore
> extension module on these kernels''. gcore may well work on
> different kernel versions if there is no incompatibility
> among them.

But the "less" and "crash" command examples were from the same dumpfile,
so I didn't understand why gcore would work for one command, but not for
another command -- from the same kernel version?
 
> >
> > In any case, if it's going to fail, perhaps there should be some mechanism
> > in place that would prevent it from hanging, and instead print a  message
> > that the kernel version is not supported? Or if a particular data structure
> > is different than the "supported" versions, it should fail immediately?
> > Just a thought...
> 
> I agree with the former idea. I believe gcore has a good chance of
> working well on unsupported kernels.
> 
> The hanging part is likely to be restore_frame_pointer(), which runs
> only when the analyzed kernel is built with CONFIG_FRAME_POINTER=y, so
> that the user-space frame pointer can be found by following the base
> pointers in order.
> 
> If the kernel stack frames are corrupted, the unwinding done by the
> function can behave in unexpected ways.
> 
> I'll fix this by limiting the number of frames traced to some finite
> bound. The kernel stack size would be enough here.
> 
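
For what it's worth, the bounded walk described above could look roughly like
the sketch below. This is not the module's restore_frame_pointer(); it is a
simplified model in which the stack is a plain array of words, a frame pointer
is an index into that array, and each slot holds the saved frame pointer of
the caller. The function name and that layout are assumptions made only for
illustration:

```c
#include <stddef.h>

/*
 * Follow a chain of saved frame pointers through a copy of a stack.
 *
 * stack  - hypothetical copy of the kernel stack, one word per slot
 * nwords - number of slots, used both as range check and as step bound
 * fp     - starting frame pointer, expressed as an index into stack
 *
 * Returns the number of links followed until the chain leaves the
 * stack, or -1 if the bound is hit, which indicates a corrupt or
 * cyclic chain.
 */
long walk_frames_bounded(const unsigned long *stack, size_t nwords,
                         unsigned long fp)
{
    size_t steps = 0;

    while (fp < nwords) {
        /* A sane chain cannot have more links than the stack has
         * slots, so give up instead of looping forever. */
        if (steps == nwords)
            return -1;
        fp = stack[fp];
        steps++;
    }
    return (long)steps;
}
```

With a bound like this, a chain that loops back on itself is reported as an
error after at most nwords steps rather than hanging the command.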
> >
> > Also I note that "gcore -v7" fails -- shouldn't it be accepted as an
> > argument?
> >
> >   crash> gcore -v7 2080
> >   gcore: invalid vlevel: 7.
> >   crash>
> 
> Oh, sorry. This is just a bug that should have been removed by my unit
> testing. Thanks.
> 
> I'll post a fixed version again soon. Please wait for a while.
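
On the "-v7" rejection: if the vlevel is meant to be a combination of flag
bits, which is only my assumption here, then the argument check could be a
mask test rather than a comparison against individual values. The flag names
and values below are hypothetical and are not taken from the gcore source:

```c
/* Hypothetical verbosity flag bits; the real values may differ. */
#define VLEVEL_BIT0 0x1UL
#define VLEVEL_BIT1 0x2UL
#define VLEVEL_BIT2 0x4UL
#define VLEVEL_ALL  (VLEVEL_BIT0 | VLEVEL_BIT1 | VLEVEL_BIT2)

/*
 * Accept any combination of the known bits, including 0 (none) and
 * 7 (all three); reject a value that sets a bit outside the mask.
 * Returns 1 if valid, 0 otherwise.
 */
int vlevel_is_valid(unsigned long vlevel)
{
    return (vlevel & ~VLEVEL_ALL) == 0;
}
```

Under that assumption 7 passes and 8 is rejected.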

OK thanks,
  Dave



