[EnMasse] Dispatch router metrics to hawkular

Mon Mar 27 09:12:49 UTC 2017

On 27/03/17 08:46, Ulf Lilleengen wrote:
> Hi,
>
> I have been working on exporting metrics from EnMasse, and have this
> working for Artemis (using jolokia). I would also like to export metrics
> from the qpid dispatch router.
>
> Exposing metrics to the hawkular agent is either done using jolokia,
> prometheus or custom JSON. Jolokia is mostly relevant for java
> components, and custom JSON is ... custom.
>
> Prometheus has a nice python library that can be used for exposing
> metrics over an http interface that will be polled every X seconds by
> the hawkular agent (running on all hosts). So here are a few
> alternatives for exposing the metrics:
>
> a) A component running in the admin pod that will export metrics for all
> routers (using AMQP management to collect them from all routers)
> b) A component running alongside each router that will report metrics
> for that router only (using AMQP management to collect them)
> c) As alternative b), but as a compile-time plugin for the router
>
> I think a) could be useful for both reporting metrics and as an API for
> the enmasse console. The disadvantage I think would be that metric
> collection depends on a single component that could limit the
> scalability in a large network and possibly impact the other AMQP
> traffic. It would also be working differently from broker metrics
> reporting and (likely) other enmasse components.
>
> b) is probably the quickest way to get something working. It only
> involves talking over the local interface, and leaves scalability
> concerns of metric collection to hawkular. The disadvantage is that we
> have to run an extra container (although very light-weight) in each
> router pod.
>
> I think c) would benefit additional users who deploy the dispatch
> router themselves and want to use it with prometheus-compatible
> monitoring tools. At the same time I have a hunch that having an HTTP
> interface integrated like that is controversial in AMQP-land :)

I don't see any controversy in having an HTTP interface for metrics
collection via a Prometheus-style API.
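
To make "Prometheus-style" concrete: such an endpoint returns plain text
with one sample per line, preceded by HELP and TYPE comments. Here is a
minimal sketch of rendering that format using only the Python standard
library. The metric name, label names and the render_metrics() helper
are made up for illustration and are not part of the router or of any
Prometheus library; label values are not escaped here.

```python
def render_metrics(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            label_str = ",".join(
                f'{k}="{v}"' for k, v in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and labels, purely for illustration.
text = render_metrics(
    "router_deliveries_total",
    "Deliveries seen by the router (hypothetical metric).",
    "counter",
    [({"router": "r1", "address": "queue1"}, 17)],
)
print(text, end="")
```

In practice the Prometheus Python client library mentioned above would
take care of this rendering and of serving it over HTTP.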

The console server currently follows the pattern of (a) but without 
storing any of the data, i.e. it connects to one router in the network, 
but retrieves management data for all the routers in the network. (It 
does this by polling at present. There is a plan for the router to 
support publishing of updates, which would reduce some of the redundant 
traffic.)

I'm not sure I see value in a component doing (a) that is separate from
the console server (though the current version of the console server
could of course be replaced).

The schema/data model for the metrics is also worth thinking about a
little. The Prometheus data model is very simple, whereas the router
management model is more (pseudo) object-oriented.

For example, the number of deliveries is recorded per link. From that we
can aggregate to get a per-router or network-wide total, or aggregate by
address, connection, etc. What would we record in Hawkular? Only
address-specific aggregations? Or would we consider per-connection
metrics or even per-link metrics? If so, how would we keep track of all
the connections and links? Can we query by a metric name pattern? As the
number of connections and links could grow indefinitely, is Hawkular
intended to be used for that type of data? Could we easily discard
metrics that have not been updated for some period of time?
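
To illustrate the kind of aggregation I mean, here is a rough sketch in
Python. The records and their field names ("router", "connection",
"address", "deliveries") are invented for the example and are not the
actual attribute names in the router management schema.

```python
from collections import defaultdict

# Hypothetical per-link records; field names are illustrative only.
links = [
    {"router": "r1", "connection": "c1", "address": "queue1", "deliveries": 10},
    {"router": "r1", "connection": "c2", "address": "queue2", "deliveries": 5},
    {"router": "r2", "connection": "c3", "address": "queue1", "deliveries": 7},
]


def aggregate(links, key):
    """Sum the per-link delivery counts, grouped by the given field."""
    totals = defaultdict(int)
    for link in links:
        totals[link[key]] += link["deliveries"]
    return dict(totals)


by_address = aggregate(links, "address")   # per-address totals
by_router = aggregate(links, "router")     # per-router totals
network_total = sum(link["deliveries"] for link in links)
```

The per-address and per-router results have a bounded set of keys,
whereas grouping by "connection" (or exporting each link) produces one
series per connection or link, which is where the concern about
unbounded growth comes from.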