
RE: capturing a core file from a non-privileged daemon



In regard to: RE: capturing a core file from a non-privileged daemon,...:

I wonder if it is trying to write the core file to the
root directory and failing?

That's why I tried both setting

	 kernel.core_pattern = /tmp/core.%p.%e.%s.%t

and actually chown/chmod on / so that it was writable by the lp user.
Neither change made any difference.

Tim

-----Original Message-----
From: Tim Mooney [mailto:Tim Mooney ndsu edu]
Sent: Thursday, July 15, 2010 2:42 PM
To: redhat-sysadmin-list redhat com
Subject: capturing a core file from a non-privileged daemon


All-

I have a system where a daemon (started as root) periodically forks
non-privileged workers, and those workers sometimes try to dump core.
I would like to capture those core files for debugging, but so far I
have been unable to find a way for the daemon to actually generate the
core file.  I'm hoping someone can point out what I'm missing.

The system is RHEL 4.8, x86_64, currently running 2.6.9-89.0.23.ELsmp.
Running "dmesg", I see dozens of these per day:

#dmesg
lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[21006]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[19501]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
lpd[11300]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
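
For reading the "error" field in those messages: on x86 it is the
page-fault error code, where bit 0 separates a protection violation from a
not-present page, bit 1 separates a write from a read, and bit 2 separates
user mode from kernel mode.  The following is only a minimal sketch of that
decoding; the function name decode_fault_error is made up for illustration
and is not part of any tool mentioned in this thread.

```shell
# Decode the low three bits of an x86 page-fault error code, i.e. the
# number printed after "error" in the kernel's segfault message.
# decode_fault_error is a hypothetical helper name.
decode_fault_error() {
    code=$1
    if [ $(( $code & 1 )) -ne 0 ]; then
        cause="protection violation"
    else
        cause="page not present"
    fi
    if [ $(( $code & 2 )) -ne 0 ]; then
        access="write"
    else
        access="read"
    fi
    if [ $(( $code & 4 )) -ne 0 ]; then
        mode="user"
    else
        mode="kernel"
    fi
    echo "$mode-mode $access, $cause"
}

# The value seen in every dmesg line above.
decode_fault_error 4
```

Under that reading, "error 4" is a user-mode read of an unmapped page, which
for the 0x0c fault address would be consistent with dereferencing a NULL
pointer at a small offset.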


The daemon is a slightly older version of the LPRng lpd.  Its model is to
start as root but switch to a non-privileged user (lp) and then fork
workers as needed for queue processing.  The daemon is locally-compiled
and is not stripped.

It's being started with the following line in /etc/init.d/lpd:

 	daemon /usr/local/sbin/lpd

Because the daemon shell function defaults to setting "ulimit -c 0", I've
added the following two lines to the startup script, to override that
default behavior:

DAEMON_COREFILE_LIMIT=unlimited
export DAEMON_COREFILE_LIMIT
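
To illustrate why those two lines matter, here is a simplified stand-in for
the core-limit step of the daemon shell function.  It is an assumption about
the general shape (a soft limit that falls back to 0 when
DAEMON_COREFILE_LIMIT is unset), not a copy of the real
/etc/init.d/functions, so check the copy on your own system.  The name
corelimit_command is hypothetical.

```shell
# Hypothetical stand-in for the core-limit step inside daemon():
# use DAEMON_COREFILE_LIMIT if it is set and non-empty, else fall back to 0.
corelimit_command() {
    echo "ulimit -S -c ${DAEMON_COREFILE_LIMIT:-0}"
}

# Default case: no override in the init script.
unset DAEMON_COREFILE_LIMIT
corelimit_command

# With the override added to the startup script.
DAEMON_COREFILE_LIMIT=unlimited
corelimit_command
```

The sketch only prints the command it would run, so it can be tried without
changing the limits of the current shell.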

If I check the /proc/<pid>/limits file for either the master lpd process
or any of the worker processes, I can see that the core file limit is
"unlimited":

$ ps -ef | grep -i lpd
lp       16689     1  1 Jul11 ?        01:17:39 lpd Waiting
lp       16690 16689  0 Jul11 ?        00:04:58 lpd LOG2

#cat /proc/16689/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16383                16383                processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1024                 1024                 signals
Max msgqueue size         819200               819200               bytes

#cat /proc/16690/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16383                16383                processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1024                 1024                 signals
Max msgqueue size         819200               819200               bytes


So, it doesn't appear that it's a problem with "ulimit"...
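
Because the workers are short-lived, it can be easier to pull out just the
core-size line than to read the whole table for each PID.  A minimal sketch,
assuming the limits file has the layout shown above; core_limits and
report_lpd_core_limits are names invented for this example.

```shell
# Print "soft hard" core-size limits from a limits file,
# for example /proc/16689/limits.  Fields 5 and 6 hold the
# soft and hard values on the "Max core file size" line.
core_limits() {
    awk '/^Max core file size/ { print $5, $6 }' "$1"
}

# Report the limits for every process whose name is exactly "lpd".
# Assumes pgrep is available; prints nothing if no lpd is running.
report_lpd_core_limits() {
    for pid in $(pgrep -x lpd); do
        echo "$pid: $(core_limits "/proc/$pid/limits")"
    done
}
```

Calling report_lpd_core_limits as root prints one line per lpd process, so
a worker that somehow ended up with a limit of 0 would stand out.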

Because the worker processes are non-privileged and the main daemon
process has / as its CWD, it's potentially a permissions problem.  To
get around that, I set kernel.core_pattern so that core files would go
into /tmp:

#sysctl -a | egrep -i 'kernel.core'
kernel.core_pattern = /tmp/core.%p.%e.%s.%t
kernel.core_uses_pid = 1
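
For reference, this is the file name that pattern should produce.  Per
core(5), %p is the PID, %e the executable name, %s the signal number and %t
the dump time in seconds since the epoch.  Since the pattern already
contains %p, core_uses_pid should not append a second PID.  The sketch
below only mimics the substitution with sed; expand_core_pattern is a
hypothetical helper, it ignores %% and other specifiers, and it would
mishandle an executable name containing "/" or "&".

```shell
# Expand %p, %e, %s and %t in a core_pattern string.
# Arguments: pattern, pid, executable name, signal number, epoch seconds.
expand_core_pattern() {
    printf '%s\n' "$1" | sed \
        -e "s/%p/$2/g" \
        -e "s/%e/$3/g" \
        -e "s/%s/$4/g" \
        -e "s/%t/$5/g"
}

# PID taken from the dmesg output; signal 11 is SIGSEGV;
# the timestamp is an arbitrary example value.
expand_core_pattern '/tmp/core.%p.%e.%s.%t' 12642 lpd 11 1279222920
```

So a successful dump from one of the workers should show up in /tmp as
something like core.12642.lpd.11.<timestamp>.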


After doing that, still no joy.  After some web searching, I was
even desperate enough to try setting the "kernel.suid_dumpable" parameter
mentioned here:

 	http://wiki.zimbra.com/index.php?title=Enabling_Core_Files

even though the "lpd" process is not setuid; it just starts as root.
That too made no difference.

On the off chance that the kernel.core_pattern wasn't being honored, I
even went so far as to briefly try changing ownership (to "lp") and
permissions (775) on /, to give the daemon permission to dump core in /.
That also made no difference, so it's been undone.

I've also tried using "systemtap" to install a segfault probe that just
watches for segfaults from processes named "lpd".  That works, but
unfortunately systemtap on RHEL4 cannot do user-level tracing, which is
what I need.

Anyone have any ideas on what I've missed?  To be able to debug what's
going on with the worker daemons, I really need to get my hands on some
of the core files.  I'm comfortable with both gdb and strace/ltrace, but
if at all possible I want to avoid attaching to the main daemon and just
using one of those tools to gather a huge volume of data just waiting
for one of the forked children to segfault.  Capturing a core file would
be a much better way to start the debugging process.

Thanks,

Tim
--
Tim Mooney
Tim Mooney ndsu edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164

--
redhat-sysadmin-list mailing list
redhat-sysadmin-list redhat com
https://www.redhat.com/mailman/listinfo/redhat-sysadmin-list

