[EnMasse] EnMasse + UnifiedPush + Operator: feedback and request for help

Mon Jul 29 15:07:10 UTC 2019

I'm cross-posting this to the AeroGear
(https://groups.google.com/forum/#!forum/aerogear) and EnMasse
(https://www.redhat.com/mailman/listinfo/enmasse) mailing lists.

Over the past few months we've been adding the ability to use an external
broker with the Unified Push Server (UPS,
https://github.com/aerogear/aerogear-unifiedpush-server). We've added
support for external AMQP connections to the UPS container image, and the
UPS operator (https://github.com/aerogear/unifiedpush-operator) can now
connect UPS to EnMasse. Below are our experiences with EnMasse, including
some problems we encountered.

1: Creating EnMasse resources from the UPS operator (which is built on
client-go, https://github.com/kubernetes/client-go) worked pretty much as
expected. The only downside was that EnMasse's Go bindings were out of
date, but the team accepted our PRs and released a new version with
updated bindings. Much appreciated!

2: The EnMasse controllers for addresses, addressspaces, and messagingusers
do not implement Kubernetes watches, nor do they reject a watch request
with an error. This causes a lot of errors to be logged by the UPS
operator, since it needs to watch and maintain those resources to keep our
service running. There is an open EnMasse issue that describes the
problem: https://github.com/EnMasseProject/enmasse/issues/1280

3: EnMasse neither blocks owner deletion nor implements a finalizer to
handle being orphaned. This means that when our operator deletes a UPS
server, the address and addressspace resources we created don't get
deleted, and we get no notification that they weren't deleted.

4: EnMasse has no way to configure how delivery failures are handled.
Usually with a message broker we want failed messages to be retried, then
retried with a backoff, and eventually moved to a dead letter queue (DLQ).
In EnMasse the default behavior is to retry immediately and indefinitely,
and the broker gives us very little we could use to get alerted that this
is happening. In the case of a production error this means EnMasse will
death-star our service with an infinite stream of doomed messages. We are
researching workarounds, but per this issue
(https://github.com/EnMasseProject/enmasse/issues/2927) it seems there is
little we can do at the level of address resource configuration.

5: Creating resources (addresses, addressspaces, messaging users) and
getting their details into our deployment through our operator was really
straightforward. Amazingly simple.
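For point 2, one possible stopgap until watches are supported is to stop
watching these kinds and instead re-list them on a timer, diffing each
snapshot against the previous one. A minimal sketch in Python, assuming a
hypothetical `list_fn` that returns each resource's `(kind, name)` mapped to
its `resourceVersion`; in the real operator the listing would go through
client-go:

```python
import time
from typing import Callable, Dict, List, Tuple

# (kind, name) -> resourceVersion; this shape is an assumption for the sketch.
Key = Tuple[str, str]
Snapshot = Dict[Key, str]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[Key], List[Key], List[Key]]:
    """Return (added, removed, modified) keys between two list() results."""
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    modified = sorted(k for k in new if k in old and new[k] != old[k])
    return added, removed, modified


def poll_resources(list_fn: Callable[[], Snapshot],
                   on_change: Callable[[List[Key], List[Key], List[Key]], None],
                   interval_s: float = 30.0,
                   iterations: int = 1,
                   sleep: Callable[[float], None] = time.sleep) -> Snapshot:
    """Re-list resources every interval_s seconds instead of watching them.

    The first pass reports everything as "added", which doubles as an
    initial full reconcile. A real operator would loop forever; the
    iterations parameter just bounds the loop.
    """
    previous: Snapshot = {}
    for i in range(iterations):
        current = list_fn()
        added, removed, modified = diff_snapshots(previous, current)
        if added or removed or modified:
            on_change(added, removed, modified)
        previous = current
        if i + 1 < iterations:
            sleep(interval_s)
    return previous
```

The trade-off is latency (changes are only noticed once per interval) and
extra load on the API server, in exchange for not depending on watch
support.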
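For point 3, one operator-side workaround is the usual Kubernetes
finalizer pattern applied to our own custom resource: keep a finalizer on
the UPS resource and delete the EnMasse resources explicitly before
removing it. A minimal sketch of that reconcile step; `UPSServer`,
`FakeClient` and the finalizer name are hypothetical stand-ins, not the
UPS operator's actual types:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical finalizer name, for illustration only.
CLEANUP_FINALIZER = "example.org/enmasse-cleanup"


@dataclass
class UPSServer:
    """Minimal stand-in for the operator's own custom resource."""
    name: str
    finalizers: List[str] = field(default_factory=list)
    deletion_timestamp: Optional[str] = None


class FakeClient:
    """In-memory stand-in for the API calls the operator would make."""

    def __init__(self, children: Dict[str, List[str]]):
        # UPS server name -> names of the EnMasse resources created for it.
        self.children = children

    def delete(self, owner: str, resource: str) -> None:
        self.children[owner].remove(resource)


def reconcile(cr: UPSServer, client: FakeClient) -> None:
    if cr.deletion_timestamp is None:
        # Live object: ensure the finalizer is set so deletion waits for cleanup.
        if CLEANUP_FINALIZER not in cr.finalizers:
            cr.finalizers.append(CLEANUP_FINALIZER)
        return
    if CLEANUP_FINALIZER in cr.finalizers:
        # Being deleted: remove the EnMasse resources explicitly, since
        # EnMasse does not clean them up when their owner goes away.
        for resource in list(client.children.get(cr.name, [])):
            client.delete(cr.name, resource)
        # Dropping the finalizer lets Kubernetes finish the deletion. In a real
        # operator a failed delete would raise before this line, leaving the
        # finalizer in place so the next reconcile retries.
        cr.finalizers.remove(CLEANUP_FINALIZER)
```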
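For point 4, this is roughly the policy we would want applied to failed
deliveries, keyed on the number of failed attempts (which corresponds to
the AMQP 1.0 header's delivery-count, the number of unsuccessful previous
delivery attempts). A sketch in Python; `RedeliveryPolicy`, its field
names and its default thresholds are illustrative assumptions, not an
EnMasse setting:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RedeliveryPolicy:
    immediate_retries: int = 3    # first failures, retried straight away
    backoff_retries: int = 5      # further failures, retried after a delay
    base_delay_s: float = 1.0     # first backoff delay
    max_delay_s: float = 60.0     # cap on the exponential backoff

    def decide(self, failed_attempts: int) -> Tuple[str, Optional[float]]:
        """Decide what to do after a failed delivery.

        failed_attempts counts unsuccessful attempts so far (>= 1).
        Returns ("retry", delay_seconds) or ("dead-letter", None).
        """
        if failed_attempts <= self.immediate_retries:
            return ("retry", 0.0)
        backoff_attempt = failed_attempts - self.immediate_retries - 1
        if backoff_attempt < self.backoff_retries:
            delay = min(self.base_delay_s * 2 ** backoff_attempt, self.max_delay_s)
            return ("retry", delay)
        # Out of retries: park the message instead of redelivering it forever.
        return ("dead-letter", None)
```

With the defaults, the first three failures are retried immediately, the
next five after 1, 2, 4, 8 and 16 seconds, and the ninth failure sends the
message to the DLQ.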

Feel free to reply inline to the points. Now for my actual question:

We're trying to find a workaround for #4 and are actively soliciting ideas.
Right now we're looking at more aggressive message handling, so that
messages are always consumed even when we would prefer them to be retried
(for instance, if the network connection to our push service was briefly
unavailable, or a credential had expired).
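A minimal sketch of that consume-anyway handling: transient failures are
retried in-process with a short backoff, and the message is accepted
whether or not the push eventually succeeds, so the broker never
redelivers it. `send`, `accept`, `park` and `TransientError` are
hypothetical hooks for illustration, not UPS APIs; any other exception
still propagates and leaves the message unsettled:

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (network blip, expired credential)."""


def handle_and_accept(message,
                      send: Callable[[object], None],
                      accept: Callable[[object], None],
                      park: Callable[[object, Optional[Exception]], None],
                      attempts: int = 3,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Push one message, retrying transient failures in-process.

    Returns True if the push succeeded. Either way the broker message is
    accepted, so the broker will not redeliver it. Exceptions other than
    TransientError propagate and leave the message unsettled.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            send(message)
        except TransientError as err:
            last_error = err
            if attempt + 1 < attempts:
                sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            continue
        accept(message)
        return True
    # Out of in-process retries: consume the message anyway and hand it to
    # `park` (log it, store it, raise an alert) instead of the broker.
    park(message, last_error)
    accept(message)
    return False
```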

This doesn't help with unexpected errors in the UPS code (who doesn't love
a good NPE). For those we might have the UPS operator watch for excessive
message redeliveries and automatically delete the offending messages, but
that would be better handled by a DLQ mechanism.
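The operator-side check could look something like this: flag messages
whose delivery count is past a threshold, raise one alert per affected
queue, and delete the flagged messages. This is only a sketch; the
`(queue, message_id, delivery_count)` snapshot and the `delete_message` and
`alert` callbacks are hypothetical, and it is an assumption that
per-message delivery counts can be read from the broker at all:

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_poison_messages(snapshot: Iterable[Tuple[str, str, int]],
                         max_deliveries: int = 50) -> Dict[str, List[str]]:
    """Group, by queue, the ids of messages delivered more than max_deliveries times.

    snapshot holds (queue, message_id, delivery_count) triples from a
    hypothetical broker inspection call.
    """
    flagged: Dict[str, List[str]] = {}
    for queue, message_id, deliveries in snapshot:
        if deliveries > max_deliveries:
            flagged.setdefault(queue, []).append(message_id)
    return flagged


def purge(flagged: Dict[str, List[str]],
          delete_message: Callable[[str, str], None],
          alert: Callable[[str, int], None]) -> None:
    """Alert once per affected queue, then delete its flagged messages."""
    for queue in sorted(flagged):
        alert(queue, len(flagged[queue]))
        for message_id in flagged[queue]:
            delete_message(queue, message_id)
```

The threshold of 50 is arbitrary and would need tuning against normal
redelivery rates.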

Thoughts on the potential workarounds?