capturing a core file from a non-privileged daemon
Tim Mooney
Tim.Mooney at ndsu.edu
Thu Jul 15 21:21:07 UTC 2010
In regard to: RE: capturing a core file from a non-privileged daemon,...:
> I wonder if it is trying to write the core file to the
> root directory and failing?
That's why I tried both setting
kernel.core_pattern = /tmp/core.%p.%e.%s.%t
and actually running chown/chmod on / so that it was writable by the lp user.
Neither change made any difference.
Tim
>> -----Original Message-----
>> From: Tim Mooney [mailto:Tim.Mooney at ndsu.edu]
>> Sent: Thursday, July 15, 2010 2:42 PM
>> To: redhat-sysadmin-list at redhat.com
>> Subject: capturing a core file from a non-privileged daemon
>>
>>
>> All-
>>
>> I have a system where a daemon (started as root) periodically forks
>> non-privileged workers, and those workers sometimes try to dump core.
>> I would like to capture those core files for debugging, but so far I
>> have been unable to find a way for the daemon to actually generate the
>> core file. I'm hoping someone can point out what I'm missing.
>>
>> The system is RHEL 4.8, x86_64, currently running 2.6.9-89.0.23.ELsmp.
>> Running "dmesg", I see dozens of these per day:
>>
>> #dmesg
>> lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[21006]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[19501]: segfault at 0000000036383675 rip 0000000000a73cce rsp 00000000ffffcea0 error 4
>> lpd[11300]: segfault at 000000000000000c rip 0000000000a73cce rsp 00000000ffffcea0 error 4
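
Until a core file is available, the dmesg lines themselves carry some information. The following is a minimal Python sketch, assuming the message format quoted above; the helper names parse_segfaults and decode_error are hypothetical. It groups the faults by instruction pointer and decodes the low three bits of the x86 page-fault error code, which the kernel prints in hexadecimal.

```python
import re
from collections import Counter

# Matches the segfault message format quoted above (RHEL 4 era x86_64 kernel).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def parse_segfaults(text):
    """Return one dict per segfault message found in dmesg-style text."""
    # Messages may be wrapped across lines; normalise whitespace first.
    flat = " ".join(text.split())
    return [
        {
            "comm": m["comm"],
            "pid": int(m["pid"]),
            "addr": int(m["addr"], 16),
            "rip": int(m["rip"], 16),
            "error": int(m["err"], 16),
        }
        for m in SEGFAULT_RE.finditer(flat)
    ]


def decode_error(code):
    """Decode the low three bits of the x86 page-fault error code."""
    return {
        "page_present": bool(code & 1),  # False: no page mapped at the address
        "write": bool(code & 2),         # False: the access was a read
        "user_mode": bool(code & 4),     # True: the fault came from user space
    }


sample = """
lpd[12642]: segfault at 000000000000000c rip 0000000000a73cce rsp
00000000ffffcea0 error 4
lpd[16944]: segfault at 0000000036383675 rip 0000000000a73cce rsp
00000000ffffcea0 error 4
"""
faults = parse_segfaults(sample)
by_rip = Counter(hex(f["rip"]) for f in faults)
print(by_rip)           # every sampled fault shares one instruction pointer
print(decode_error(4))  # user-mode read of an unmapped address
```

In the sample, both faults occur at the same instruction, which suggests a single crash site reached with different bad pointers.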
>>
>>
>> The daemon is a slightly older version of the LPRng lpd. Its model is to
>> start as root but switch to a non-privileged user (lp) and then fork
>> workers as needed for queue processing. The daemon is locally compiled
>> and is not stripped.
>>
>> It's being started with the following line in /etc/init.d/lpd:
>>
>> daemon /usr/local/sbin/lpd
>>
>> Because the daemon shell function defaults to setting "ulimit -c 0", I've
>> added the following two lines to the startup script to override that
>> default behavior:
>>
>> DAEMON_COREFILE_LIMIT=unlimited
>> export DAEMON_COREFILE_LIMIT
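
For comparison, the same limit can be raised from inside a process. This is a minimal Python sketch of the rlimit mechanism only, not of the RHEL daemon function; raise_core_limit is a hypothetical helper name. Resource limits are inherited across fork and preserved across exec, so a wrapper that does this before starting the daemon would have the same effect as "ulimit -c".

```python
import resource


def raise_core_limit():
    """Raise the soft core-file limit to the hard limit; return the new pair."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = raise_core_limit()
print("core limit (soft, hard):", soft, hard)
```

A value of resource.RLIM_INFINITY corresponds to "unlimited" in the ulimit output.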
>>
>> If I check the /proc/<pid>/limits file for either the master lpd process
>> or any of the worker processes, I can see that the core file limit is
>> "unlimited":
>>
>> $ ps -ef | grep -i lpd
>> lp 16689 1 1 Jul11 ? 01:17:39 lpd Waiting
>> lp 16690 16689 0 Jul11 ? 00:04:58 lpd LOG2
>>
>> #cat /proc/16689/limits
>> Limit                 Soft Limit   Hard Limit   Units
>> Max cpu time          unlimited    unlimited    seconds
>> Max file size         unlimited    unlimited    bytes
>> Max data size         unlimited    unlimited    bytes
>> Max stack size        10485760     unlimited    bytes
>> Max core file size    unlimited    unlimited    bytes
>> Max resident set      unlimited    unlimited    bytes
>> Max processes         16383        16383        processes
>> Max open files        1024         1024         files
>> Max locked memory     32768        32768        bytes
>> Max address space     unlimited    unlimited    bytes
>> Max file locks        unlimited    unlimited    locks
>> Max pending signals   1024         1024         signals
>> Max msgqueue size     819200       819200       bytes
>>
>> #cat /proc/16690/limits
>> Limit                 Soft Limit   Hard Limit   Units
>> Max cpu time          unlimited    unlimited    seconds
>> Max file size         unlimited    unlimited    bytes
>> Max data size         unlimited    unlimited    bytes
>> Max stack size        10485760     unlimited    bytes
>> Max core file size    unlimited    unlimited    bytes
>> Max resident set      unlimited    unlimited    bytes
>> Max processes         16383        16383        processes
>> Max open files        1024         1024         files
>> Max locked memory     32768        32768        bytes
>> Max address space     unlimited    unlimited    bytes
>> Max file locks        unlimited    unlimited    locks
>> Max pending signals   1024         1024         signals
>> Max msgqueue size     819200       819200       bytes
>>
>>
>> So, it doesn't appear that it's a problem with "ulimit"...
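
Since the workers are short-lived, checking their limits by hand is awkward. The following is a minimal Python sketch of the same check, assuming the limits file pads its columns with at least two spaces (names such as "Max core file size" contain single spaces); parse_limits and core_allowed_by_rlimit are hypothetical helper names.

```python
import re


def parse_limits(text):
    """Map limit name -> (soft, hard, units) from /proc/<pid>/limits text."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue  # blank or unexpected line
        name, soft, hard = fields[:3]
        units = fields[3] if len(fields) > 3 else ""
        limits[name] = (soft, hard, units)
    return limits


def core_allowed_by_rlimit(pid="self"):
    """True if the soft core-file limit of the process is non-zero."""
    with open(f"/proc/{pid}/limits") as fh:
        soft = parse_limits(fh.read())["Max core file size"][0]
    return soft == "unlimited" or int(soft) > 0


sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max core file size        unlimited            unlimited            bytes\n"
    "Max open files            1024                 1024                 files\n"
)
parsed = parse_limits(sample)
print(parsed["Max core file size"])
```

A result of True from core_allowed_by_rlimit only rules out the rlimit; it says nothing about the other conditions the kernel checks before writing a core file.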
>>
>> Because the worker processes are non-privileged and the main daemon
>> process has / as its CWD, it's potentially a permissions problem. To
>> get around that, I set kernel.core_pattern so that core files would go
>> into /tmp:
>>
>> #sysctl -a | egrep -i 'kernel.core'
>> kernel.core_pattern = /tmp/core.%p.%e.%s.%t
>> kernel.core_uses_pid = 1
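
To know exactly which file name to look for in /tmp, the pattern can be expanded by hand. This is a minimal Python sketch of the substitution rules described in core(5), assuming only the specifiers listed in the code; expand_core_pattern is a hypothetical helper and the pid, signal and timestamp values are made up for illustration.

```python
def expand_core_pattern(pattern, pid, exe, sig, when, uid=0, gid=0, host="localhost"):
    """Expand a subset of the core_pattern specifiers documented in core(5)."""
    values = {
        "p": pid,    # PID of the dumped process
        "e": exe,    # executable name
        "s": sig,    # number of the signal that caused the dump
        "t": when,   # time of the dump, in seconds since the epoch
        "u": uid,    # real UID of the dumped process
        "g": gid,    # real GID of the dumped process
        "h": host,   # hostname
        "%": "%",    # literal percent sign
    }
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] != "%":
            out.append(pattern[i])
            i += 1
            continue
        # A trailing '%' and any unknown specifier are dropped.
        if i + 1 < len(pattern):
            out.append(str(values.get(pattern[i + 1], "")))
        i += 2
    return "".join(out)


name = expand_core_pattern(
    "/tmp/core.%p.%e.%s.%t", pid=16690, exe="lpd", sig=11, when=1279228867
)
print(name)  # /tmp/core.16690.lpd.11.1279228867
```

Because the pattern already contains %p, core_uses_pid should not append a second PID suffix.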
>>
>>
>> After doing that, still no joy. After some web searching, I was
>> even desperate enough to try setting the "kernel.suid_dumpable"
>> parameter mentioned here:
>>
>> http://wiki.zimbra.com/index.php?title=Enabling_Core_Files
>>
>> even though the "lpd" process is not setuid; it just starts as root.
>> That too made no difference.
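
One more thing that can be inspected is the per-process "dumpable" flag. According to prctl(2) on current kernels, the flag is reset when a process changes its effective UID or GID, not only when it executes a setuid binary, and a process can set it again with PR_SET_DUMPABLE. Whether that is what happens to lpd on this 2.6.9 kernel is an assumption, not something established in this thread. A minimal Python sketch of reading and setting the flag through ctypes follows; get_dumpable and set_dumpable are hypothetical helper names, and in a C daemon the equivalent would be a prctl() call made after the switch to the lp user.

```python
import ctypes
import os
import sys

PR_GET_DUMPABLE = 3  # constants from <linux/prctl.h>
PR_SET_DUMPABLE = 4


def _prctl(option, arg=0):
    """Call prctl(2) through libc; Linux only."""
    if not sys.platform.startswith("linux"):
        raise OSError("prctl is Linux-specific")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    result = libc.prctl(option, ctypes.c_ulong(arg), zero, zero, zero)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def get_dumpable():
    """Return the dumpable flag of the calling process (0, 1 or 2)."""
    return _prctl(PR_GET_DUMPABLE)


def set_dumpable(enabled=True):
    """Set the dumpable flag of the calling process to 1 or 0."""
    _prctl(PR_SET_DUMPABLE, 1 if enabled else 0)


if sys.platform.startswith("linux"):
    print("dumpable flag:", get_dumpable())
```

A value of 0 means the kernel will not write a core file for the process, whatever the rlimit and core_pattern say.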
>>
>> On the off chance that the kernel.core_pattern wasn't being honored, I
>> even went so far as to briefly try changing ownership (to "lp") and
>> permissions (775) on /, to give the daemon permission to dump core in /.
>> That also made no difference, so it's been undone.
>>
>> I've also tried using "systemtap" to install a segfault probe that just
>> watches for segfaults from processes named "lpd". That works, but
>> unfortunately systemtap on RHEL4 cannot do user-level tracing, which
>> is what I need.
>>
>> Anyone have any ideas on what I've missed? To be able to debug what's
>> going on with the worker daemons, I really need to get my hands on some
>> of the core files. I'm comfortable with both gdb and strace/ltrace, but
>> if at all possible I want to avoid attaching to the main daemon and just
>> using one of those tools to gather a huge volume of data just waiting
>> for one of the forked children to segfault. Capturing a core file would
>> be a much better way to start the debugging process.
>>
>> Thanks,
>>
>> Tim
>> --
>> Tim Mooney
>> Tim.Mooney at ndsu.edu
>> Enterprise Computing & Infrastructure 701-231-1076 (Voice)
>> Room 242-J6, IACC Building 701-231-8541 (Fax)
>> North Dakota State University, Fargo, ND 58105-5164
>>
>> --
>> redhat-sysadmin-list mailing list
>> redhat-sysadmin-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/redhat-sysadmin-list
--
Tim Mooney Tim.Mooney at ndsu.edu
Enterprise Computing & Infrastructure 701-231-1076 (Voice)
Room 242-J6, IACC Building 701-231-8541 (Fax)
North Dakota State University, Fargo, ND 58105-5164