capturing a core file from a non-privileged daemon
Howard, Chris
HowardC at prpa.org
Thu Jul 15 21:07:40 UTC 2010
I wonder if it is trying to write the core file to the
root directory and failing?
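
One way to check that guess might be to look at where a worker's
working directory actually is and whether the worker's user can write
there. This is only a sketch: cwd_writable is a name I made up, it
assumes a Linux /proc, and the writability test reflects whoever runs
it, so it would need to be run as lp to mean anything for the workers.

```shell
# Hypothetical helper: show a process's working directory and whether the
# user running this function can write to it.
cwd_writable() {
    pid=$1
    # /proc/<pid>/cwd is a symlink to the process's current directory.
    dir=$(readlink "/proc/$pid/cwd") || return 2
    if [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}
```

For example, "cwd_writable 16690" run as lp. If the readlink itself is
refused, the directory may have to be looked up as root and then
checked separately as lp.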
> -----Original Message-----
> From: Tim Mooney [mailto:Tim.Mooney at ndsu.edu]
> Sent: Thursday, July 15, 2010 2:42 PM
> To: redhat-sysadmin-list at redhat.com
> Subject: capturing a core file from a non-privileged daemon
>
>
> All-
>
> I have a system where a daemon (started as root) periodically forks
> non-privileged workers, and those workers sometimes try to dump core.
> I would like to capture those core files for debugging, but so far I
> have been unable to find a way for the daemon to actually generate the
> core file. I'm hoping someone can point out what I'm missing.
>
> The system is RHEL 4.8, x86_64, currently running 2.6.9-89.0.23.ELsmp.
> Running "dmesg", I see dozens of these per day:
>
> # dmesg
> lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[21006]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[19501]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
> lpd[11300]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>
>
> The daemon is a slightly older version of the LPRng lpd. Its model is to
> start as root but switch to a non-privileged user (lp) and then fork
> workers as needed for queue processing. The daemon is locally-compiled
> and is not stripped.
>
> It's being started with the following line in /etc/init.d/lpd:
>
> daemon /usr/local/sbin/lpd
>
> Because the daemon shell function defaults to setting "ulimit -c 0",
> I've added the following two lines to the startup script, to override
> that default behavior:
>
> DAEMON_COREFILE_LIMIT=unlimited
> export DAEMON_COREFILE_LIMIT
>
> If I check the /proc/<pid>/limits file for both the master lpd process
> and any of the worker processes, I can see that the core file limit is
> "unlimited":
>
> $ ps -ef | grep -i lpd
> lp 16689 1 1 Jul11 ? 01:17:39 lpd Waiting
> lp 16690 16689 0 Jul11 ? 00:04:58 lpd LOG2
>
> # cat /proc/16689/limits
> Limit                  Soft Limit   Hard Limit   Units
> Max cpu time           unlimited    unlimited    seconds
> Max file size          unlimited    unlimited    bytes
> Max data size          unlimited    unlimited    bytes
> Max stack size         10485760     unlimited    bytes
> Max core file size     unlimited    unlimited    bytes
> Max resident set       unlimited    unlimited    bytes
> Max processes          16383        16383        processes
> Max open files         1024         1024         files
> Max locked memory      32768        32768        bytes
> Max address space      unlimited    unlimited    bytes
> Max file locks         unlimited    unlimited    locks
> Max pending signals    1024         1024         signals
> Max msgqueue size      819200       819200       bytes
>
> # cat /proc/16690/limits
> Limit                  Soft Limit   Hard Limit   Units
> Max cpu time           unlimited    unlimited    seconds
> Max file size          unlimited    unlimited    bytes
> Max data size          unlimited    unlimited    bytes
> Max stack size         10485760     unlimited    bytes
> Max core file size     unlimited    unlimited    bytes
> Max resident set       unlimited    unlimited    bytes
> Max processes          16383        16383        processes
> Max open files         1024         1024         files
> Max locked memory      32768        32768        bytes
> Max address space      unlimited    unlimited    bytes
> Max file locks         unlimited    unlimited    locks
> Max pending signals    1024         1024         signals
> Max msgqueue size      819200       819200       bytes
>
>
> So, it doesn't appear that it's a problem with "ulimit"...
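
If it helps to check the limit across all of the lpd processes at once
rather than reading each file by eye, something like the following
might do. A rough sketch: core_soft_limit is a name I made up, and it
assumes the limits file has the column layout you quoted above.

```shell
# Hypothetical helper: print the soft "Max core file size" value from a
# file laid out like /proc/<pid>/limits.
core_soft_limit() {
    # Fields are: Max core file size <soft> <hard> <units>,
    # so the soft limit is the fifth whitespace-separated field.
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Example use (assumes pgrep is available), left commented out:
# for pid in $(pgrep lpd); do
#     echo "$pid: $(core_soft_limit "/proc/$pid/limits")"
# done
```

A worker that reported 0 here instead of "unlimited" would point back
at ulimit after all.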
>
> Because the worker processes are non-privileged and the main daemon
> process has / as its CWD, it's potentially a permissions problem. To
> get around that, I set kernel.core_pattern so that core files would go
> into /tmp:
>
> # sysctl -a | egrep -i 'kernel.core'
> kernel.core_pattern = /tmp/core.%p.%e.%s.%t
> kernel.core_uses_pid = 1
>
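
For what it's worth, here is roughly what that pattern should expand
to, so you know which file name to look for in /tmp. A sketch only:
expand_core_pattern is a made-up helper, the timestamp in the example
is invented, and it handles just the four specifiers in your pattern
(%p PID, %e executable name, %s signal number, %t time of the dump in
seconds since the epoch). It does not handle a literal "%%".

```shell
# Hypothetical helper: expand the %p, %e, %s and %t specifiers of a
# kernel.core_pattern value.
# usage: expand_core_pattern PATTERN PID EXE SIGNAL TIME
expand_core_pattern() {
    printf '%s\n' "$1" | sed \
        -e "s|%p|$2|g" \
        -e "s|%e|$3|g" \
        -e "s|%s|$4|g" \
        -e "s|%t|$5|g"
}

# PID and name taken from the first dmesg line; 11 is SIGSEGV; the
# timestamp is an invented value for illustration.
expand_core_pattern '/tmp/core.%p.%e.%s.%t' 12642 lpd 11 1279226860
# prints /tmp/core.12642.lpd.11.1279226860
```

Since the pattern already contains %p, I would not expect
kernel.core_uses_pid to add a second PID to the name.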
>
> After doing that, still no joy. After some web searching, I was
> even desperate enough to try setting the "kernel.suid_dumpable"
> parameter mentioned here:
>
> http://wiki.zimbra.com/index.php?title=Enabling_Core_Files
>
> even though the "lpd" process is not setuid, it just starts as root.
> That too made no difference.
>
> On the off chance that the kernel.core_pattern wasn't being honored, I
> even went so far as to briefly try changing ownership (to "lp") and
> permissions (775) on /, to give the daemon permission to dump core in
> /. That also made no difference, so it's been undone.
>
> I've also tried using "systemtap" to install a segfault probe that
> just watches for segfaults from processes named "lpd". That works,
> but unfortunately systemtap on RHEL4 cannot do user-level tracing,
> which is what I need.
>
> Anyone have any ideas on what I've missed? To be able to debug what's
> going on with the worker daemons, I really need to get my hands on
> some of the core files. I'm comfortable with both gdb and
> strace/ltrace, but if at all possible I want to avoid attaching to the
> main daemon and just using one of those tools to gather a huge volume
> of data just waiting for one of the forked children to segfault.
> Capturing a core file would be a much better way to start the
> debugging process.
>
> Thanks,
>
> Tim
> --
> Tim Mooney
> Tim.Mooney at ndsu.edu
> Enterprise Computing & Infrastructure 701-231-1076 (Voice)
> Room 242-J6, IACC Building 701-231-8541 (Fax)
> North Dakota State University, Fargo, ND 58105-5164
>
> --
> redhat-sysadmin-list mailing list
> redhat-sysadmin-list at redhat.com
> https://www.redhat.com/mailman/listinfo/redhat-sysadmin-list