capturing a core file from a non-privileged daemon

Tim Mooney Tim.Mooney at ndsu.edu
Thu Jul 15 21:21:07 UTC 2010


In regard to: RE: capturing a core file from a non-privileged daemon,...:

> I wonder if it is trying to write the core file to the
> root directory and failing?

That's why I tried both setting

 	 kernel.core_pattern = /tmp/core.%p.%e.%s.%t

and actually running chown/chmod on / so that it was writable by the lp user.
Neither change made any difference.
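
For reference, here is how that pattern would name a core file, as a
minimal Python sketch.  It assumes the usual core(5) meanings of the
specifiers (%p = PID, %e = executable name, %s = signal number, %t = time
of the dump in seconds since the epoch).  The function name
expand_core_pattern and the sample timestamp are made up for illustration,
and only these four specifiers plus %% are handled.

```python
import re

def expand_core_pattern(pattern, pid, exe, signal, timestamp):
    """Expand the %p, %e, %s, %t and %% specifiers of a core_pattern string.

    Hypothetical helper for illustration; the kernel supports more
    specifiers than the handful handled here.
    """
    values = {
        "p": str(pid),        # PID of the dumping process
        "e": exe,             # executable name
        "s": str(signal),     # number of the signal that caused the dump
        "t": str(timestamp),  # time of dump, seconds since the epoch
        "%": "%",             # literal percent sign
    }
    # Replace each known %X specifier; leave unknown ones untouched.
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)

# PID 12642 is from the dmesg output quoted below; signal 11 is SIGSEGV;
# the timestamp is an arbitrary example value.
print(expand_core_pattern("/tmp/core.%p.%e.%s.%t", 12642, "lpd", 11, 1279228867))
```

So with this setting, a worker that died from SIGSEGV should leave a file
with a name of that shape in /tmp, and nothing like it has shown up.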

Tim

>> -----Original Message-----
>> From: Tim Mooney [mailto:Tim.Mooney at ndsu.edu]
>> Sent: Thursday, July 15, 2010 2:42 PM
>> To: redhat-sysadmin-list at redhat.com
>> Subject: capturing a core file from a non-privileged daemon
>>
>>
>> All-
>>
>> I have a system where a daemon (started as root) periodically forks
>> non-privileged workers, and those workers sometimes try to dump core.
>> I would like to capture those core files for debugging, but so far I
>> have been unable to find a way for the daemon to actually generate the
>> core file.  I'm hoping someone can point out what I'm missing.
>>
>> The system is RHEL 4.8, x86_64, currently running 2.6.9-89.0.23.ELsmp.
>> Running "dmesg", I see dozens of these per day:
>>
>> #dmesg
>> lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[21006]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[19501]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[11300]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>>
>>
>> The daemon is a slightly older version of the LPRng lpd.  Its model is to
>> start as root but switch to a non-privileged user (lp) and then fork workers
>> as needed for queue processing.  The daemon is locally-compiled and is
>> not stripped.
>>
>> It's being started with the following line in /etc/init.d/lpd:
>>
>>  	daemon /usr/local/sbin/lpd
>>
>> Because the daemon shell function defaults to setting "ulimit -c 0", I've
>> added the following two lines to the startup script, to override that
>> default behavior:
>>
>> DAEMON_COREFILE_LIMIT=unlimited
>> export DAEMON_COREFILE_LIMIT
>>
>> If I check the /proc/<pid>/limits file for either the master lpd process
>> or any of the worker processes, I can see that the core file limit is
>> "unlimited":
>>
>> $ ps -ef | grep -i lpd
>> lp       16689     1  1 Jul11 ?        01:17:39 lpd Waiting
>> lp       16690 16689  0 Jul11 ?        00:04:58 lpd LOG2
>>
>> #cat /proc/16689/limits
>> Limit                     Soft Limit           Hard Limit           Units
>> Max cpu time              unlimited            unlimited            seconds
>> Max file size             unlimited            unlimited            bytes
>> Max data size             unlimited            unlimited            bytes
>> Max stack size            10485760             unlimited            bytes
>> Max core file size        unlimited            unlimited            bytes
>> Max resident set          unlimited            unlimited            bytes
>> Max processes             16383                16383                processes
>> Max open files            1024                 1024                 files
>> Max locked memory         32768                32768                bytes
>> Max address space         unlimited            unlimited            bytes
>> Max file locks            unlimited            unlimited            locks
>> Max pending signals       1024                 1024                 signals
>> Max msgqueue size         819200               819200               bytes
>>
>> #cat /proc/16690/limits
>> Limit                     Soft Limit           Hard Limit           Units
>> Max cpu time              unlimited            unlimited            seconds
>> Max file size             unlimited            unlimited            bytes
>> Max data size             unlimited            unlimited            bytes
>> Max stack size            10485760             unlimited            bytes
>> Max core file size        unlimited            unlimited            bytes
>> Max resident set          unlimited            unlimited            bytes
>> Max processes             16383                16383                processes
>> Max open files            1024                 1024                 files
>> Max locked memory         32768                32768                bytes
>> Max address space         unlimited            unlimited            bytes
>> Max file locks            unlimited            unlimited            locks
>> Max pending signals       1024                 1024                 signals
>> Max msgqueue size         819200               819200               bytes
>>
>>
>> So, it doesn't appear that it's a problem with "ulimit"...
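
Checking this by hand for every short-lived worker is tedious, so the same
check can be scripted.  A minimal Python sketch, assuming the
/proc/<pid>/limits layout shown above; the function names core_limits and
read_core_limits are made up for illustration.

```python
def core_limits(limits_text):
    """Return (soft, hard) for the 'Max core file size' row of a limits file."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            # Columns after the row name: soft limit, hard limit, units.
            fields = line[len("Max core file size"):].split()
            return fields[0], fields[1]
    return None

def read_core_limits(pid):
    """Read /proc/<pid>/limits for the given PID and extract the core limits."""
    with open(f"/proc/{pid}/limits") as fh:
        return core_limits(fh.read())

# A few rows copied from the output above, used as sample input.
sample = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
"""
print(core_limits(sample))
```

On the live system, read_core_limits(16690) would be the call for the
worker shown above, run as a user allowed to read that /proc entry.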
>>
>> Because the worker processes are non-privileged and the main daemon
>> process has / as its CWD, it's potentially a permissions problem.  To
>> get around that, I set kernel.core_pattern so that core files would go
>> into /tmp:
>>
>> #sysctl -a | egrep -i 'kernel.core'
>> kernel.core_pattern = /tmp/core.%p.%e.%s.%t
>> kernel.core_uses_pid = 1
>>
>>
>> After doing that, still no joy.  After some web searching, I was
>> even desperate enough to try setting the "kernel.suid_dumpable" parameter
>> mentioned here:
>>
>>  	http://wiki.zimbra.com/index.php?title=Enabling_Core_Files
>>
>> even though the "lpd" process is not setuid, it just starts as root.
>> That too made no difference.
>>
>> On the off chance that the kernel.core_pattern wasn't being honored, I
>> even went so far as to briefly try changing ownership (to "lp") and
>> permissions (775) on /, to give the daemon permission to dump core in /.
>> That also made no difference, so it's been undone.
>>
>> I've also tried using "systemtap" to install a segfault probe
>> that just watches for segfaults from processes named "lpd", and that works,
>> but unfortunately systemtap on RHEL4 cannot do user-level tracing, which
>> is what I need.
>>
>> Anyone have any ideas on what I've missed?  To be able to debug what's
>> going on with the worker daemons, I really need to get my hands on some of
>> the core files.  I'm comfortable with both gdb and strace/ltrace, but if
>> at all possible I want to avoid attaching to the main daemon and just
>> using one of those tools to gather a huge volume of data just waiting for
>> one of the forked children to segfault.  Capturing a core file would be
>> a much better way to start the debugging process.
>>
>> Thanks,
>>
>> Tim
>> --
>> Tim Mooney
>> Tim.Mooney at ndsu.edu
>> Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
>> Room 242-J6, IACC Building                             701-231-8541 (Fax)
>> North Dakota State University, Fargo, ND 58105-5164
>>
>> --
>> redhat-sysadmin-list mailing list
>> redhat-sysadmin-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/redhat-sysadmin-list

-- 
Tim Mooney                                             Tim.Mooney at ndsu.edu
Enterprise Computing & Infrastructure                  701-231-1076 (Voice)
Room 242-J6, IACC Building                             701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164



