[EnMasse] Redeploying new certificates to Openshift results in Enmasse not being reachable anymore

Bob Claerhout bob.claerhout at aloxy.io
Wed Jul 10 11:57:07 UTC 2019


Hi Jens,

We have made some progress. The routes for Enmasse are available again after removing the secrets and rebooting.
However, Ditto is not yet able to connect to Enmasse.
For Hono I don't know for sure yet, because the adapter seems to be unable to connect to the device registry as well.

Best regards,
Bob

On 7/10/19 11:54 AM, Jens Reimann wrote:
Hi Bob,

No, that should have the same effect, as the applications got restarted and had the chance to reload their files.

Another thing you could try is to manually delete the secrets containing the inter-service certificates. You still need to restart the pods after that. If the service CA was recreated but the certificates did not get renewed, that would be an issue as well, and it would show the same behavior as if the pods had not picked up the new certificates.

You should be able to find the corresponding secrets by looking for an annotation of `service.alpha.openshift.io/originating-service-name`.
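As a rough sketch of that lookup (not an official tool), the following Python function filters the parsed JSON output of `oc get secrets -o json` for secrets carrying the annotation. The function name `service_cert_secrets` and the sample data are made up for illustration; only the annotation key comes from the message above.

```python
# Annotation key named in the message above.
ANNOTATION = "service.alpha.openshift.io/originating-service-name"


def service_cert_secrets(secret_list: dict) -> list:
    """Return (secret name, originating service) pairs.

    `secret_list` is assumed to be the parsed JSON of
    `oc get secrets -o json`, i.e. a dict with an "items" list.
    """
    found = []
    for item in secret_list.get("items", []):
        metadata = item.get("metadata", {})
        # "annotations" can be absent or null on secrets that have none.
        annotations = metadata.get("annotations") or {}
        if ANNOTATION in annotations:
            found.append((metadata.get("name", ""), annotations[ANNOTATION]))
    return found


# Hypothetical sample input, shaped like the `oc` JSON output.
sample = {
    "items": [
        {"metadata": {"name": "router-certs",
                      "annotations": {ANNOTATION: "messaging"}}},
        {"metadata": {"name": "unrelated-secret"}},
    ]
}
print(service_cert_secrets(sample))
```

The secrets it reports are the candidates for deletion; the pods using them still need a restart afterwards, as noted above.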

Cheers

Jens

On Wed, Jul 10, 2019 at 10:08 AM Bob Claerhout <bob.claerhout at aloxy.io> wrote:
Hi Jens,

Thanks for your response.
In an attempt to fix the issues, I rebooted the server yesterday, which didn't help. So unless a reboot is different from killing the pods, this doesn't solve it.

However, I scaled the adapters down and back up, and the AMQP adapter (from Hono) now seems to be able to connect to Enmasse, although it is not passing the readiness check:

07:16:08.983 [vert.x-eventloop-thread-0] INFO  o.e.h.a.a.i.VertxBasedAmqpProtocolAdapter - secure AMQP server listening on 0.0.0.0:5671
07:16:08.983 [vert.x-eventloop-thread-0] INFO  o.e.h.a.a.i.VertxBasedAmqpProtocolAdapter - insecure AMQP server listening on [0.0.0.0:5672]
07:16:08.993 [vert.x-eventloop-thread-1] INFO  o.e.h.s.VertxBasedHealthCheckServer - registering additional resource: Prometheus Registry Scraper [endpoint: /prometheus]
07:16:09.001 [vert.x-eventloop-thread-1] INFO  o.e.h.s.VertxBasedHealthCheckServer - readiness probe available at http://0.0.0.0:8088/readiness
07:16:09.001 [vert.x-eventloop-thread-1] INFO  o.e.h.s.VertxBasedHealthCheckServer - liveness probe available at http://0.0.0.0:8088/liveness
07:16:09.004 [main] INFO  o.e.h.adapter.amqp.impl.Application - Started Application in 2.914 seconds (JVM running for 3.44)
07:16:09.355 [vert.x-eventloop-thread-0] INFO  o.e.h.a.a.i.VertxBasedAmqpProtocolAdapter - connected to AMQP Messaging Network

Best regards,
Bob

On 7/10/19 9:08 AM, Jens Reimann wrote:
Hi Bob,

I think (and this is just a guess) that redeploying the certificates also redeploys the internal service CA. That would mean that both your TLS servers and clients need to pick up the new keys/certs/CA bundle from the file system. OKD injects those into the file system at runtime. However, most Java applications (and I think Hono is no different here) read the cert material on startup and then stick to it.
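To illustrate the suspected effect, here is a minimal sketch of a process that snapshots a certificate file once at startup and can later tell that the file on disk has been replaced. It is not how Hono or EnMasse handle certificates; the `CertWatcher` class and the file name in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class CertWatcher:
    """Remembers a file's digest at startup and reports later changes."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Snapshot taken once, mimicking an application that reads
        # its cert material only on startup.
        self.loaded = fingerprint(path)

    def is_stale(self) -> bool:
        # True when the file on disk no longer matches the snapshot,
        # i.e. the process would need a reload or restart.
        return fingerprint(self.path) != self.loaded
```

A process could create `CertWatcher(Path("ca-bundle.crt"))` at startup (the path is an assumption) and check `is_stale()` periodically; a `True` result corresponds to the situation where restarting the pods is needed.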

So it might help to simply kill all the pods (clients + servers) and let them come up again.

If that fixes your situation, then I think that this is indeed the issue, and we should talk about an alternative way to handle it.

Otherwise we would need to keep looking for the cause.

Cheers

Jens


On Tue, Jul 9, 2019 at 5:31 PM Bob Claerhout <bob.claerhout at aloxy.io> wrote:
Hi all,

Today I redeployed new certificates on our OKD cluster.
I did this by retrieving new certificates with certbot. The certificates were renewed without problems, after which I redeployed them using the following ansible-playbook command:
     ansible-playbook -i inventory.ini playbooks/redeploy-certificates.yml

This succeeds apart from redeploying the certificates to the service_catalog because (apparently) I do not have a kube-service-catalog (I'm giving all the information I have to present a complete story). When I disable updating the certificates of the service catalog, the playbook succeeds. After that, my frontend application and the OpenShift console use the new certificate.

However, Eclipse Hono is no longer able to connect to Enmasse.

The Hono adapters are getting the following exception:
14:49:14.276 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - connection attempt failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
    at io.vertx.core.net.impl.ChannelProvider$1.userEventTriggered(ChannelProvider.java:111)
    at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
    at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
    at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
    at io.netty.handler.ssl.SslHandler.handleUnwrapThrowable(SslHandler.java:1224)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1205)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1243)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:321)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:264)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:259)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:642)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:461)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:361)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:448)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1065)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:1052)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:999)
    at io.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1457)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1365)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1199)
    ... 19 common frames omitted
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:385)
    at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:290)
    at java.base/sun.security.validator.Validator.validate(Validator.java:264)
    at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:321)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:279)
    at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:620)
    ... 30 common frames omitted
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297)
    at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
    ... 36 common frames omitted

When I look at the Enmasse logs, I see the following in one of the qdrouters:

2019-07-09 14:52:07.293404 +0000 ROUTER (info) [C326] Connection Opened: dir=in host=10.128.1.175:53936 vhost= encrypted=TLSv1.2 auth=EXTERNAL user=CN=admin.aloxy,O=io.enmasse container_id=standard-controller props={:product="vertx-proton", :version="3.6.3"}
2019-07-09 14:52:07.293478 +0000 ROUTER (info) [C326][L536] Link attached: dir=in source={<none> expire:sess outcomes:@PN_SYMBOL[:"amqp:accepted:list", :"amqp:rejected:list", :"amqp:released:list", :"amqp:modified:list"]} target={$management expire:sess}
2019-07-09 14:52:07.293919 +0000 ROUTER (info) [C326][L537] Link attached: dir=out source={<dynamic> expire:sess} target={<none> expire:sess}
2019-07-09 14:52:07.299307 +0000 ROUTER (info) [C326][L536] Link closed due to connection loss: del=4 presett=0 psdrop=0 acc=4 rej=0 rel=0 mod=0 delay1=0 delay10=0
2019-07-09 14:52:07.299336 +0000 ROUTER (info) [C326][L537] Link closed due to connection loss: del=4 presett=4 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0
2019-07-09 14:52:07.299346 +0000 ROUTER (info) [C326] Connection Closed


Also, I'm unable to browse the console:
[inline screenshot]
I've tested all the other routes as well; none of them are available:
[inline screenshot]

I've tried to get rid of the custom certificates attached to the routes, but this didn't work.

In our installation we also have Eclipse Ditto running. Ditto is also not able to connect to Enmasse.

Does anyone have a clue what's going on? Have I missed something?

Kind regards,

--

Bob Claerhout
Software developer
M +32 479 34 84 92
The Beacon, Sint-Pietersvliet 7, 2000
Antwerp
bob.claerhout at aloxy.io | www.aloxy.io
_______________________________________________
enmasse mailing list
enmasse at redhat.com
https://www.redhat.com/mailman/listinfo/enmasse


--
Jens Reimann
Principal Software Engineer / EMEA ENG Middleware
Werner-von-Siemens-Ring 14
85630 Grasbrunn
Germany
phone: +49 89 2050 71286
_____________________________________________________________________________

Red Hat GmbH, www.de.redhat.com,
Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Paul Argiry, Charles Cachera, Tom Savage, Michael O'Neill




-------------- next part --------------
A non-text attachment was scrubbed...
Name: icmkeokoohinpphk.png
Type: image/png
Size: 42252 bytes
Desc: icmkeokoohinpphk.png
URL: <http://listman.redhat.com/archives/enmasse/attachments/20190710/edd56752/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dngdinjkdhlmpihp.png
Type: image/png
Size: 49739 bytes
Desc: dngdinjkdhlmpihp.png
URL: <http://listman.redhat.com/archives/enmasse/attachments/20190710/edd56752/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: koilllhhakloomkn.png
Type: image/png
Size: 6330 bytes
Desc: koilllhhakloomkn.png
URL: <http://listman.redhat.com/archives/enmasse/attachments/20190710/edd56752/attachment-0002.png>

