[Crash-utility] User Stack back trace of the process

Dave Anderson anderson at redhat.com
Wed Sep 5 13:28:18 UTC 2007


Rajesh wrote:
> Dave,
> 
> Thanks for your explanation.
> 
> Well, the reason behind my questions is this: we have an application
> running at a customer site that consumes around 60 GB of system memory.
> When this process receives a segmentation fault or an abort signal, the
> kernel starts to take the process core dump. Here is the problem:
> the kernel takes at least an hour (60 minutes) to finish writing the
> dump. During this time the system is unresponsive (hung), and I suspect
> the system is thrashing because of the process's huge memory usage.
> This long downtime is not acceptable to the customer.
> 
> So I started looking for a better way of tackling the problem.
> 
> 1>The first thing we considered was changing the system page size from
> 4KB to 8KB. This could not be done, since the x86_64 architecture does
> not support a multi-page-size option.
> 
> 2>We wrote a program using the libbfd APIs and used it within our
> application. Whenever SIGSEGV or SIGABRT is received by the process,
> it logs the stack trace of all the threads within that process. This
> approach is not as effective or flexible as a full process core dump.
> (A sketch of this kind of handler appears after the quoted message.)
> 
> 3>Last, we thought of using kcore/vmcore to analyze the cause of the
> SIGSEGV or SIGABRT.
> 
> 4>I have one more thought: making the elf_core_dump() function
> SMP-parallel. This function is responsible for dumping the core, and
> it is present in "/usr/src/linux/fs/binfmt_elf.c".
> 
> 
> Any comments/ideas are welcome.
> 
> --Regards,
> rajesh
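
For reference, a handler like the one described in point 2 is often
built on glibc's backtrace()/backtrace_symbols_fd() rather than libbfd.
A minimal sketch follows (illustrative only, not Rajesh's actual code;
it covers just the faulting thread, whereas his version logged all
threads, and the usual async-signal-safety caveats apply):

  /* Compile with: gcc -g -rdynamic crashtrace.c
   * so backtrace_symbols_fd() can resolve function names. */
  #include <execinfo.h>
  #include <signal.h>
  #include <string.h>
  #include <unistd.h>

  static void crash_handler(int sig)
  {
      void *frames[64];
      int n = backtrace(frames, 64);

      /* write() and backtrace_symbols_fd() avoid malloc(), which is
       * unsafe inside a signal handler. */
      static const char msg[] = "caught fatal signal, backtrace:\n";
      write(STDERR_FILENO, msg, sizeof(msg) - 1);
      backtrace_symbols_fd(frames, n, STDERR_FILENO);

      /* Restore the default disposition and re-raise, so the kernel
       * still produces the normal core dump afterwards. */
      signal(sig, SIG_DFL);
      raise(sig);
  }

  int main(void)
  {
      struct sigaction sa;
      memset(&sa, 0, sizeof(sa));
      sa.sa_handler = crash_handler;
      sigaction(SIGSEGV, &sa, NULL);
      sigaction(SIGABRT, &sa, NULL);

      raise(SIGSEGV);   /* demonstrate the handler */
      return 0;
  }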

Maybe tinker with maydump()?

If you know that the core dump contains VMAs that are not
necessary to dump, such as large shared memory segments,
and you can identify them from the VMA, you can prevent
them from being copied to the core dump.  There's this
patch floating around, which may have been updated:

  http://lkml.org/lkml/2007/2/16/149
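
For context, maydump() in fs/binfmt_elf.c (in 2.6-era kernels) is the
per-VMA predicate that decides whether a mapping gets written to the
dump.  The sketch below shows the general shape of such a filter; the
VM_SHARED test and the 1 GB cutoff are illustrative assumptions, not
the logic of the patch above:

  /* Fragment of fs/binfmt_elf.c (2.6-era); not a drop-in patch. */
  static int maydump(struct vm_area_struct *vma)
  {
          /* Never dump I/O-mapped or reserved regions, as the stock
           * implementation already does. */
          if (vma->vm_flags & (VM_IO | VM_RESERVED))
                  return 0;

          /* Illustrative addition: skip very large shared mappings,
           * e.g. the 60 GB shm segments described above.  The 1 GB
           * threshold is arbitrary. */
          if ((vma->vm_flags & VM_SHARED) &&
              (vma->vm_end - vma->vm_start) > (1UL << 30))
                  return 0;

          return 1;
  }

Anything skipped this way is simply missing from the resulting core
file, so a filter like this should only exclude segments that are
genuinely not needed for debugging.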

Dave