From jorgito1412 at gmail.com  Fri Feb 21 15:17:25 2020
From: jorgito1412 at gmail.com (George)
Date: Fri, 21 Feb 2020 15:17:25 +0000
Subject: [Mod_nss-list] Inconsistent SSL issues with HSM
Message-ID:

We have been troubleshooting SSL issues using Apache with mod_nss and
Safenet HSMs for quite a while, so hopefully you can provide some insight.

Red Hat Enterprise Linux Server release 7.6 (Maipo)
httpd.x86_64      2.4.6-90.el7        @rhel-7-server-rpms
mod_nss.x86_64    1.0.14-12.el7       @rhel-7-server-rpms
nss.x86_64        3.36.0-7.1.el7_6    @rhel-7-server-rpms

Apache is configured with mod_nss and Safenet libcryptoki, using TLSv1.2.
We see that, in a seemingly random fashion, an Apache worker suddenly
can't talk to the HSM anymore and can't recover from that. All subsequent
requests handled by this worker fail with the same error message. The only
way to recover is to kill the worker (or restart Apache entirely):

[Tue Sep 24 20:21:19.375686 2019] [:error] [pid 2646] SSL Library Error: -8152 The key does not support the requested operation

Packet captures show that the incoming TLS Client Hello that triggers the
error is identical to a successful one. We have noticed, nevertheless,
that there *might* be some correspondence with TLS session reuse. There
have been several instances in which a worker fails with this error soon
after handling a resumed TLS session (so the worker receives a Client
Hello with a session-id, successfully handles the request without
renegotiation but fails soon after on a subsequent request). Anyway, I
couldn't find any way to effectively disable TLS session reuse in mod_nss
(can that be done??)

We have also sniffed the PKCS11 conversation between NSS and the HSM and
we can see where the problem occurs. It looks like the Apache worker is
trying to perform operations on an invalid object handle (pay attention
to hObject=0x00001A60). PID 2646 is the Apache worker that failed in the
earlier example:

pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.313) > C_GetAttributeValue hSession=0x00020001 hObject=0x00001A60 pTemplate=0x0x7ffe7173b7c0 count=1
pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.348) << C_GetAttributeValue rv=0x00000082{object handle invalid} pTemplate=0x0x7ffe7173b7c0

pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.357) > C_SignInit hSession=0x0002005B pMechanism=0x0x7ffe7173b760{type=0x1{RSA_PKCS} pParam=0x(nil) paramLen=0} hObject=0x00001A60
pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.374) << C_SignInit rv=0x00000060{key handle invalid}

That object handle 0x00001A60 seems to have been explicitly destroyed by
the same process more than 2 hours before in this example:

pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.521) > C_DestroyObject hSession=0x00020001 hObject=0x00001A60
pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.523) < C_DestroyObject rv=0x00000000{success}

Any further information or ideas are welcome.

Best regards!
George

From rcritten at redhat.com  Fri Feb 21 18:41:19 2020
From: rcritten at redhat.com (Rob Crittenden)
Date: Fri, 21 Feb 2020 13:41:19 -0500
Subject: [Mod_nss-list] Inconsistent SSL issues with HSM
In-Reply-To:
References:
Message-ID: <5e94a374-b90d-d5d4-8258-0390bf9e14f6@redhat.com>

George wrote:
> We have been troubleshooting SSL issues using Apache with mod_nss and
> Safenet HSMs for quite a while, so hopefully you can provide some insight.
>
> Red Hat Enterprise Linux Server release 7.6 (Maipo)
> httpd.x86_64                      2.4.6-90.el7
> @rhel-7-server-rpms
> mod_nss.x86_64                    1.0.14-12.el7
> @rhel-7-server-rpms
> nss.x86_64                        3.36.0-7.1.el7_6
> @rhel-7-server-rpms
>
> Apache is configured with mod_nss and Safenet libcryptoki, using
> TLSv1.2. We see that, in a seemingly random fashion, an Apache worker
> suddenly can't talk to the HSM anymore and can't recover from that. All
> subsequent requests handled by this worker fail with the same error
> message. The only way to recover is to kill the worker (or restart
> Apache entirely):
>
> [Tue Sep 24 20:21:19.375686 2019] [:error] [pid 2646] SSL Library Error:
> -8152 The key does not support the requested operation
>
> Packet captures show that the incoming TLS Client Hello that triggers
> the error is identical to a successful one. We have noticed,
> nevertheless, that there *might* be some correspondence with TLS session
> reuse. There have been several instances in which a worker fails with
> this error soon after handling a resumed TLS session (so the worker
> receives a Client Hello with a session-id, successfully handles the
> request without renegotiation but fails soon after on a subsequent
> request). Anyway, I couldn't find any way to effectively disable TLS
> session reuse in mod_nss (can that be done??)
>
> We have also sniffed the PKCS11 conversation between NSS and the HSM and
> we can see where the problem occurs. It looks like the Apache worker is
> trying to perform operations on an invalid object handle (pay attention
> to hObject=0x00001A60). PID 2646 is the Apache worker that failed in the
> earlier example:
>
> pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.313)
> > C_GetAttributeValue hSession=0x00020001 hObject=0x00001A60
> pTemplate=0x0x7ffe7173b7c0 count=1
> pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.348) <<
> C_GetAttributeValue rv=0x00000082{object handle invalid}
> pTemplate=0x0x7ffe7173b7c0
>
> pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.357)
> > C_SignInit hSession=0x0002005B
> pMechanism=0x0x7ffe7173b760{type=0x1{RSA_PKCS} pParam=0x(nil)
> paramLen=0} hObject=0x00001A60
> pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.374) <<
> C_SignInit rv=0x00000060{key handle invalid}
>
>
> That object handle 0x00001A60 seems to have been explicitly destroyed
> by the same process more than 2 hours before in this example:
>
> pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.521)
> > C_DestroyObject hSession=0x00020001 hObject=0x00001A60
> pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.523) <
> C_DestroyObject rv=0x00000000{success}
>
>
> Any further information or ideas are welcome.

I can't really speak to the PKCS#11 errors since NSS hides all that. I'm
not sure what in NSS would trigger the C_DestroyObject call in your
PKCS#11 driver.

There is currently no way to disable session caching in mod_nss. It
would only take a couple of lines of code if you wanted to experiment
with it. I think this patch would do it. A cache would still be set up,
but with SSL_NO_CACHE set it would be ignored. I don't know what sort of
performance hit this will add.

diff --git a/nss_engine_init.c b/nss_engine_init.c
index 61e2f499..85756e63 100644
--- a/nss_engine_init.c
+++ b/nss_engine_init.c
@@ -729,14 +729,14 @@ static void nss_init_ctx_socket(server_rec *s,
         nss_log_nss_error(APLOG_MARK, APLOG_ERR, s);
         nss_die();
     }
-    if (!mctx->as_server) {
+//    if (!mctx->as_server) {
         if ((SSL_OptionSet(mctx->model, SSL_NO_CACHE, PR_TRUE)) != SECSuccess) {
             ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, s,
                     "Unable to disable SSL client caching");
             nss_log_nss_error(APLOG_MARK, APLOG_ERR, s);
             nss_die();
         }
-    }
+//    }
 #ifdef SSL_ENABLE_RENEGOTIATION
     if (SSL_OptionSet(mctx->model, SSL_ENABLE_RENEGOTIATION,
             mctx->enablerenegotiation ?

rob

From jorgito1412 at gmail.com  Fri Feb 21 20:44:57 2020
From: jorgito1412 at gmail.com (George)
Date: Fri, 21 Feb 2020 20:44:57 +0000
Subject: [Mod_nss-list] Inconsistent SSL issues with HSM
In-Reply-To: <5e94a374-b90d-d5d4-8258-0390bf9e14f6@redhat.com>
References: <5e94a374-b90d-d5d4-8258-0390bf9e14f6@redhat.com>
Message-ID:

Hi Rob,

Thanks for the information and the patch. I will post to the NSS lists and
also give the patch a try if I have time.

Best regards!

On Fri, Feb 21, 2020 at 6:41 PM Rob Crittenden wrote:

> George wrote:
> > We have been troubleshooting SSL issues using Apache with mod_nss and
> > Safenet HSMs for quite a while, so hopefully you can provide some
> > insight.
> >
> > Red Hat Enterprise Linux Server release 7.6 (Maipo)
> > httpd.x86_64                      2.4.6-90.el7
> > @rhel-7-server-rpms
> > mod_nss.x86_64                    1.0.14-12.el7
> > @rhel-7-server-rpms
> > nss.x86_64                        3.36.0-7.1.el7_6
> > @rhel-7-server-rpms
> >
> > Apache is configured with mod_nss and Safenet libcryptoki, using
> > TLSv1.2. We see that, in a seemingly random fashion, an Apache worker
> > suddenly can't talk to the HSM anymore and can't recover from that. All
> > subsequent requests handled by this worker fail with the same error
> > message. The only way to recover is to kill the worker (or restart
> > Apache entirely):
> >
> > [Tue Sep 24 20:21:19.375686 2019] [:error] [pid 2646] SSL Library Error:
> > -8152 The key does not support the requested operation
> >
> > Packet captures show that the incoming TLS Client Hello that triggers
> > the error is identical to a successful one. We have noticed,
> > nevertheless, that there *might* be some correspondence with TLS session
> > reuse. There have been several instances in which a worker fails with
> > this error soon after handling a resumed TLS session (so the worker
> > receives a Client Hello with a session-id, successfully handles the
> > request without renegotiation but fails soon after on a subsequent
> > request). Anyway, I couldn't find any way to effectively disable TLS
> > session reuse in mod_nss (can that be done??)
> >
> > We have also sniffed the PKCS11 conversation between NSS and the HSM and
> > we can see where the problem occurs. It looks like the Apache worker is
> > trying to perform operations on an invalid object handle (pay attention
> > to hObject=0x00001A60). PID 2646 is the Apache worker that failed in the
> > earlier example:
> >
> > pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.313)
> > > C_GetAttributeValue hSession=0x00020001 hObject=0x00001A60
> > pTemplate=0x0x7ffe7173b7c0 count=1
> > pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.348) <<
> > C_GetAttributeValue rv=0x00000082{object handle invalid}
> > pTemplate=0x0x7ffe7173b7c0
> >
> > pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.357)
> > > C_SignInit hSession=0x0002005B
> > pMechanism=0x0x7ffe7173b760{type=0x1{RSA_PKCS} pParam=0x(nil)
> > paramLen=0} hObject=0x00001A60
> > pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.374) <<
> > C_SignInit rv=0x00000060{key handle invalid}
> >
> >
> > That object handle 0x00001A60 seems to have been explicitly destroyed
> > by the same process more than 2 hours before in this example:
> >
> > pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.521)
> > > C_DestroyObject hSession=0x00020001 hObject=0x00001A60
> > pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.523) <
> > C_DestroyObject rv=0x00000000{success}
> >
> >
> > Any further information or ideas are welcome.
>
> I can't really speak to the PKCS#11 errors since NSS hides all that. I'm
> not sure what in NSS would trigger the C_DestroyObject call in your
> PKCS#11 driver.
>
> There is currently no way to disable session caching in mod_nss. It
> would only take a couple of lines of code if you wanted to experiment
> with it. I think this patch would do it. A cache would still be set up,
> but with SSL_NO_CACHE set it would be ignored. I don't know what sort of
> performance hit this will add.
>
> diff --git a/nss_engine_init.c b/nss_engine_init.c
> index 61e2f499..85756e63 100644
> --- a/nss_engine_init.c
> +++ b/nss_engine_init.c
> @@ -729,14 +729,14 @@ static void nss_init_ctx_socket(server_rec *s,
>          nss_log_nss_error(APLOG_MARK, APLOG_ERR, s);
>          nss_die();
>      }
> -    if (!mctx->as_server) {
> +//    if (!mctx->as_server) {
>          if ((SSL_OptionSet(mctx->model, SSL_NO_CACHE, PR_TRUE)) != SECSuccess) {
>              ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, s,
>                      "Unable to disable SSL client caching");
>              nss_log_nss_error(APLOG_MARK, APLOG_ERR, s);
>              nss_die();
>          }
> -    }
> +//    }
>  #ifdef SSL_ENABLE_RENEGOTIATION
>      if (SSL_OptionSet(mctx->model, SSL_ENABLE_RENEGOTIATION,
>              mctx->enablerenegotiation ?
>
>
> rob
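
For anyone who wants to check a longer PKCS#11 trace for the pattern
described in the first message (a handle passed to C_DestroyObject and
then referenced again by the same process), the following is a minimal
Python sketch, not a definitive tool. It assumes one call per line in
the layout of the excerpts in this thread. The function name
find_use_after_destroy, the regular expression, and the SAMPLE list are
illustrative only and are not part of NSS, mod_nss, or the Safenet
tooling. It also assumes the token never reissues the same handle value
to a new object, which may not hold for every HSM.

```python
import re

# Matches request lines in the trace layout quoted in this thread.
# Assumption: other logger versions may format lines differently.
CALL_RE = re.compile(
    r"pid\((?P<pid>\d+)\).*?time\((?P<time>[^)]*)\)\s*[<>]*\s*"
    r"(?P<func>C_\w+).*?hObject=(?P<handle>0x[0-9A-Fa-f]+)"
)


def find_use_after_destroy(lines):
    """Return (pid, handle, destroyed_at, used_at, func) for every call that
    references a handle the same pid already passed to C_DestroyObject."""
    destroyed = {}  # (pid, handle) -> timestamp of the C_DestroyObject request
    hits = []
    for line in lines:
        m = CALL_RE.search(line)
        if m is None:
            continue  # response lines and unrelated lines carry no hObject
        key = (m["pid"], m["handle"].lower())
        if m["func"] == "C_DestroyObject":
            destroyed[key] = m["time"]
        elif key in destroyed:
            hits.append(
                (m["pid"], m["handle"], destroyed[key], m["time"], m["func"])
            )
    return hits


# Lines taken from the first message, in chronological order.
SAMPLE = [
    "pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.521) > "
    "C_DestroyObject hSession=0x00020001 hObject=0x00001A60",
    "pid(2646) tid(140580153710720) time(24/09/2019,18:00:09.523) < "
    "C_DestroyObject rv=0x00000000{success}",
    "pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.313) > "
    "C_GetAttributeValue hSession=0x00020001 hObject=0x00001A60 "
    "pTemplate=0x0x7ffe7173b7c0 count=1",
    "pid(2646) tid(140580153710720) time(24/09/2019,20:21:19.357) > "
    "C_SignInit hSession=0x0002005B pMechanism=0x0x7ffe7173b760"
    "{type=0x1{RSA_PKCS} pParam=0x(nil) paramLen=0} hObject=0x00001A60",
]

if __name__ == "__main__":
    for pid, handle, destroyed_at, used_at, func in find_use_after_destroy(SAMPLE):
        print(f"pid {pid}: {func} used {handle} at {used_at}, "
              f"destroyed at {destroyed_at}")
```

To run it against a real trace, pass an open file object instead of
SAMPLE, since the function only iterates over lines. A hit only shows
that the process reused a destroyed handle; it does not say which layer
(NSS, mod_nss, or the PKCS#11 driver) is responsible.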