From LARSSKOV at dk.ibm.com Fri Nov 1 09:35:46 2013
From: LARSSKOV at dk.ibm.com (Lars Skovlund)
Date: Fri, 1 Nov 2013 10:35:46 +0100
Subject: [Mod_nss-list] Crashing apache processes
Message-ID:

Hello list,

As part of a customer case I'm working on, I've been trying to set up
the combination of Apache, mod_nss and our own PKCS#11 provider.
I've gotten the Apache server to start, but the tasks spawned by Apache
are dying left and right:

[Thu Oct 31 15:48:15 2013] [notice] child pid 6649 exit signal Segmentation fault (11)
[Thu Oct 31 15:48:15 2013] [notice] child pid 6650 exit signal Segmentation fault (11)
[Thu Oct 31 15:48:16 2013] [notice] child pid 6645 exit signal Segmentation fault (11), possible coredump in /tmp
[Thu Oct 31 15:48:16 2013] [notice] child pid 6646 exit signal Segmentation fault (11), possible coredump in /tmp
[Thu Oct 31 15:48:16 2013] [notice] child pid 6647 exit signal Segmentation fault (11), possible coredump in /tmp
[Thu Oct 31 15:48:16 2013] [notice] child pid 6648 exit signal Segmentation fault (11), possible coredump in /tmp
[Thu Oct 31 15:48:16 2013] [notice] child pid 6651 exit signal Segmentation fault (11)
[Thu Oct 31 15:48:16 2013] [notice] child pid 6653 exit signal Segmentation fault (11)
[Thu Oct 31 15:48:18 2013] [notice] child pid 6668 exit signal Segmentation fault (11), possible coredump in /tmp

and so on. My investigation points toward NSS being shut down
prematurely (by the main process?) while the incipient worker processes
continue to call it (heavily edited for brevity):

[root at cccclab4 ~]# gdb --args httpd -X -D FOREGROUND
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
[...]
(gdb) break NSS_Initialize
Breakpoint 1 (NSS_Initialize) pending.
(gdb) run
Starting program: /usr/sbin/httpd -X -D FOREGROUND
Breakpoint 1, NSS_Initialize (configdir=0x7ffff82bf0a0 "/root/y4nss/",
    certPrefix=0x0, keyPrefix=0x0, secmodName=0x7fffed87c9ae "secmod.db",
    flags=1) at nssinit.c:817
(gdb) watch g_default_trust_domain
Hardware watchpoint 2: g_default_trust_domain
(gdb) cont
Continuing.
Hardware watchpoint 2: g_default_trust_domain

Old value = (NSSTrustDomain *) 0x0
New value = (NSSTrustDomain *) 0x7ffff8426be0
STAN_LoadDefaultNSS3TrustDomain () at pki3hack.c:153
153         return PR_SUCCESS;
(gdb) cont
Continuing.
Please enter password for "TEST" token:
[Thread 0x7fffe3c37700 (LWP 10913) exited]
(gdb) cont
Hardware watchpoint 2: g_default_trust_domain

Old value = (NSSTrustDomain *) 0x7ffff8426be0
New value = (NSSTrustDomain *) 0x0
0x00007ffff34a1828 in STAN_Shutdown () at pki3hack.c:212
212         g_default_trust_domain = NULL;
(gdb) cont
Continuing.
[New Thread 0x7fffe3c37700 (LWP 10916)]

Program received signal SIGSEGV, Segmentation fault.
nssTrustDomain_GetCertsFromCache (td=0x0, certListOpt=0x7ffff836a7e0)
    at tdcache.c:1127
1127        PZ_Lock(td->cache->lock);
(gdb) print td
$1 = (NSSTrustDomain *) 0x0
(gdb) bt
#0  nssTrustDomain_GetCertsFromCache (td=0x0, certListOpt=0x7ffff836a7e0)
    at tdcache.c:1127
#1  0x00007ffff349b727 in NSSTrustDomain_TraverseCertificates (td=0x0,
    callback=0x7ffff34623e0 , arg=0x7fffffffdf20)
    at trustdomain.c:1015
#2  0x00007ffff34622fc in PK11_ListCerts (type=PK11CertListUser, pwarg=0x0)
    at pk11cert.c:2509
#3  0x00007fffed872772 in nss_init_Child (p=0x7ffff83d4c08,
    base_server=0x7ffff8212880) at nss_engine_init.c:1370
#4  0x00007ffff7fd6b0c in ap_run_child_init (pchild=0x7ffff83d4c08,
    s=0x7ffff8212880) at /usr/src/debug/httpd-2.2.15/server/config.c:155
#5  0x00007ffff7fea725 in child_main (child_num_arg=)
    at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:518
#6  0x00007ffff7feac46 in make_child (s=0x7ffff8212880, slot=0)
    at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:707
#7  0x00007ffff7feb293 in ap_mpm_run (_pconf=,
    plog=, s=)
    at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:983
#8  0x00007ffff7fc2900 in main (argc=4, argv=0x7fffffffe408)
    at /usr/src/debug/httpd-2.2.15/server/main.c:760

The NULL pointer that is dereferenced in the final steps comes from the
global variable I watched. Is this a known bug, or are there
configuration problems that are known to cause this?
The versions of Apache and nss I am using are these (they are the
versions I got from our local RHN node):

[root at cccclab4 ~]# rpm -qa nss httpd
httpd-2.2.15-29.el6_4.x86_64
nss-3.14.0.0-12.el6.x86_64
[root at cccclab4 ~]#

Any help you can give is greatly appreciated.

Best regards,

Lars

Medmindre andet er angivet ovenfor: / Unless Otherwise Stated Above:
IBM Danmark ApS
Nymøllevej 91
2800 Kongens Lyngby, Danmark
CVR nr.: 65305216

From rcritten at redhat.com Fri Nov 1 12:59:36 2013
From: rcritten at redhat.com (Rob Crittenden)
Date: Fri, 01 Nov 2013 08:59:36 -0400
Subject: [Mod_nss-list] Crashing apache processes
In-Reply-To:
References:
Message-ID: <5273A5B8.3000106@redhat.com>

Lars Skovlund wrote:
> Hello list,
>
> As part of a customer case I'm working on, I've been trying to set up
> the combination of Apache, mod_nss and our own PKCS#11 provider.
> [...]

Ok, so Apache makes things difficult for us. It loads and reloads the
modules a couple of times during startup.

During the initial start stdout/stdin are still open and things are
launched as root. This, from an Apache perspective, is just a sanity
startup to get the list of configuration options available in the
module. We take this opportunity to prompt for any token passwords that
are needed.

Then Apache unloads the module. We have to shut down NSS when this happens.

Then it restarts things, perhaps in multiple forked children. In each
one we initialize NSS and apply the configuration.

Is this the opencryptoki module?

rob

From LARSSKOV at dk.ibm.com Fri Nov 1 15:35:24 2013
From: LARSSKOV at dk.ibm.com (Lars Skovlund)
Date: Fri, 1 Nov 2013 16:35:24 +0100
Subject: [Mod_nss-list] Crashing apache processes
In-Reply-To: <5273A5B8.3000106@redhat.com>
References: <5273A5B8.3000106@redhat.com>
Message-ID:

Hi Rob

I'm a little confused as to why you're asking me about Opencryptoki. Are
there known issues with it?
No, we are talking to a hardware PKCS#11 implementation based on the IBM
4765 (which Opencryptoki can also do, but based on the standard (CCA)
firmware and not the specific PKCS#11 code that I use).

With your description below, I was able to find the problem in this
snippet of mod_nss code:

    if (chdir(mc->pCertificateDatabase) != 0) {
        ap_log_error(APLOG_MARK, APLOG_ERR, 0, base_server,
            "Unable to change directory to %s", mc->pCertificateDatabase);
        if (mc->nInitCount == 1)
            nss_die();
        else
            return;
    }
    rv = NSS_Initialize(mc->pCertificateDatabase, mc->pDBPrefix,
                        mc->pDBPrefix, "secmod.db", NSS_INIT_READONLY);

On my test machine, the apache user had no access to the NSS database
directory (it was in /root/y4nss), and I had missed this in the logs.
According to the above, if that chdir fails, then nss_init_SSLLibrary
returns with no indication of error, and without calling NSS_Initialize.
The very next line of nss_init_Child then goes on to make an NSS call,
and fails with a segfault. So this could be argued to be an NSS bug, but
also, mod_nss should arguably handle the situation more gracefully on
its end.

Now I at least know what caused it. Thanks, Rob!

Best regards

Lars

Rob Crittenden wrote on 11/01/2013 01:59:36 PM:

> Rob Crittenden
> 11/01/2013 01:59 PM
>
> To
>
> Lars Skovlund/Denmark/IBM at IBMDK, mod_nss-list at redhat.com,
>
> cc
>
> Subject
>
> Re: [Mod_nss-list] Crashing apache processes
>
> Lars Skovlund wrote:
> > [...]
>
> Ok, so Apache makes things difficult for us. It loads and reloads the
> modules a couple of times during startup.
>
> During the initial start stdout/stdin are still open and things are
> launched as root. This, from an Apache perspective, is just a sanity
> startup to get the list of configuration options available in the
> module. We take this opportunity to prompt for any token passwords that
> are needed.
>
> Then Apache unloads the module. We have to shut down NSS when this happens.
>
> Then it restarts things, perhaps in multiple forked children. In each
> one we initialize NSS and apply the configuration.
>
> Is this the opencryptoki module?
>
> rob
>

Medmindre andet er angivet ovenfor: / Unless Otherwise Stated Above:
IBM Danmark ApS
Nymøllevej 91
2800 Kongens Lyngby, Danmark
CVR nr.: 65305216

From rcritten at redhat.com Fri Nov 1 15:38:57 2013
From: rcritten at redhat.com (Rob Crittenden)
Date: Fri, 01 Nov 2013 11:38:57 -0400
Subject: [Mod_nss-list] Crashing apache processes
In-Reply-To:
References: <5273A5B8.3000106@redhat.com>
Message-ID: <5273CB11.5000004@redhat.com>

Lars Skovlund wrote:
> Hi Rob
>
> I'm a little confused as to why you're asking me about Opencryptoki. Are
> there known issues with it?
> No, we are talking to a hardware PKCS#11 implementation based on the IBM
> 4765 (which Opencryptoki
> can also do, but based on the standard (CCA) firmware and not the
> specific PKCS#11 code that I use).

Was just trying to figure out what I was dealing with. Given your domain
name and the mention of PKCS#11 it was a natural assumption :-)

>
> With your description below, I was able to find the problem in this
> snippet of mod_nss code:
>
>     if (chdir(mc->pCertificateDatabase) != 0) {
>         ap_log_error(APLOG_MARK, APLOG_ERR, 0, base_server,
>             "Unable to change directory to %s", mc->pCertificateDatabase);
>         if (mc->nInitCount == 1)
>             nss_die();
>         else
>             return;
>     }
>     rv = NSS_Initialize(mc->pCertificateDatabase, mc->pDBPrefix,
>                         mc->pDBPrefix, "secmod.db", NSS_INIT_READONLY);
>
> On my test machine, the apache user had no access to the NSS database
> directory (it was in /root/y4nss),
> and I had missed this in the logs. According to the above, if that chdir
> fails, then nss_init_SSLLibrary returns
> with no indication of error, and without calling NSS_Initialize. The
> very next line of nss_init_Child then
> goes on to make an NSS call, and fails with a segfault. So this could be
> argued to be an NSS bug, but also,
> mod_nss should arguably handle the situation more gracefully on its end.

Ah, ok. I'll open an upstream bug on this.

> Now I at least know what caused it. Thanks, Rob!
>
> Best regards
>
> Lars

cheers

rob
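
The failure mode diagnosed in this thread (an init routine that logs a
chdir failure and returns without signalling it, after which the caller
uses a library that was never initialized) can be illustrated with a
small, self-contained C sketch. This is not mod_nss code and not a
proposed patch: init_library, child_init and g_library_state are
hypothetical names, and the assignment to g_library_state only stands in
for a successful NSS_Initialize call. It shows one possible way to let
the caller learn about the failure and skip later library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the library's global state. In the thread,
 * the real counterpart is NSS's g_default_trust_domain, which stays NULL
 * when initialization never happens. */
const char *g_library_state = NULL;

/* Hypothetical init routine. Unlike the snippet quoted in the thread, it
 * returns a status so the caller can tell that initialization was skipped. */
int init_library(const char *db_dir)
{
    if (chdir(db_dir) != 0) {
        /* For example EACCES when the worker user may not enter db_dir. */
        perror("Unable to change directory");
        return -1;
    }
    /* Assumption: this assignment stands in for a successful library init. */
    g_library_state = db_dir;
    return 0;
}

/* Hypothetical child-init hook: only touches the library when the init
 * routine reported success. */
int child_init(const char *db_dir)
{
    if (init_library(db_dir) != 0) {
        fprintf(stderr, "library not initialized, skipping further calls\n");
        return -1;
    }
    /* Holds because init_library() just reported success. */
    assert(g_library_state != NULL);
    printf("initialized from %s\n", g_library_state);
    return 0;
}
```

Calling child_init() with a directory the process cannot enter takes the
failure branch and leaves g_library_state untouched; a directory it can
enter takes the success branch. Whether a real module should abort the
child or merely disable the affected feature at that point is a separate
design decision that this sketch does not address.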