[EnMasse] Dispatch router metrics to hawkular

Mon Mar 27 09:33:16 UTC 2017

On 27 March 2017 11:12, Gordon Sim wrote:
> On 27/03/17 08:46, Ulf Lilleengen wrote:
>> Hi,
>>
>> I have been working on exporting metrics from EnMasse, and have this
>> working for Artemis (using jolokia). I would also like to export metrics
>> from the qpid dispatch router.
>>
>> Exposing metrics to the hawkular agent is either done using jolokia,
>> prometheus or custom JSON. Jolokia is mostly relevant for java
>> components, and custom JSON is ... custom.
>>
>> Prometheus has a nice python library that can be used for exposing
>> metrics over an http interface that will be polled every X seconds by
>> the hawkular agent (running on all hosts). So here are a few
>> alternatives for exposing the metrics:
>>
>> a) A component running in the admin pod that will export metrics for all
>> routers (using AMQP management to collect them from all routers)
>> b) A component running alongside each router that will report metrics
>> for that router only (using AMQP management to collect them)
>> c) As alternative b), but as a compile-time plugin for the router
>>
>> I think a) could be useful for both reporting metrics and as an API for
>> the enmasse console. The disadvantage I think would be that metric
>> collection depends on a single component that could limit the
>> scalability in a large network and possibly impact the other AMQP
>> traffic. It would also be working differently from broker metrics
>> reporting and (likely) other enmasse components.
>>
>> b) is probably the quickest way to get something working. It only
>> involves talking over the local interface, and leaves scalability
>> concerns of metric collection to hawkular. The disadvantage is that we
>> have to run an extra container (although very light-weight) in each
>> router pod.
>>
>> I think c) would benefit additional users who deploy the dispatch router
>> themselves and want to use it with prometheus-compatible
>> monitoring tools. At the same time I have a hunch that having an http
>> interface integrated like that is controversial in AMQP-land :)
>
> I don't see any controversy in having an HTTP interface for metrics
> collection via a prometheus-style API.
>
> The console server currently follows the pattern of (a) but without
> storing any of the data, i.e. it connects to one router in the network,
> but retrieves management data for all the routers in the network. (It
> does this by polling at present. There is a plan for the router to
> support publishing of updates, which would reduce some of the redundant
> traffic.)
>
> I'm not sure I see value in a component doing (a) that is separate from
> the console server (there could be a replacement for the current version
> of course).
>

Yes, and the use case is a bit different in that you probably collect
fewer metrics to show in a console, whereas the hawkular metrics would
probably contain a lot more detailed info that might be interesting
not just to the end user but also to the messaging operators.

> The schema/data-model for the metrics is also worth thinking about a
> little. The prometheus data model is very simple, whereas the router
> management model is more (pseudo) object oriented.
>
> For example, the number of deliveries is recorded per link. From that we
> can aggregate to get a per router or network wide total or aggregate by
> address or connection etc. What would we record in hawkular? Only
> address specific aggregations? Or would we consider per-connection
> metrics or even per-link metrics? If so, how would we keep track of all
> the connections & links? Can we query by a metric name pattern? As the
> number of connections and links could grow indefinitely, is hawkular
> intended to be used for that type of data? Could we easily discard
> metrics that have not been updated for some period of time?
>

I think the general approach with hawkular (and similar tools I've used
in the past) is that components report non-aggregated values that are
tagged, so that the various dashboards can aggregate as they wish based
on the tags. For instance, for the broker, I report the messageCount
metric from all queues, and it is tagged with broker name, queue name
and address. Then you can create dashboards that display the metric
aggregated across multiple brokers or multiple queues, or not
aggregated at all, depending on what you want.
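
To illustrate the tag-based aggregation described above, here is a
minimal sketch in plain Python. It is not hawkular code: the sample
values, the tag keys and the aggregate() helper are all made up for
illustration, and a real dashboard would do this on the server side.

```python
from collections import defaultdict

# Hypothetical samples: one messageCount value per queue, tagged with
# broker name, queue name and address (values are invented).
samples = [
    {"tags": {"broker": "broker-0", "queue": "q1", "address": "orders"}, "value": 10},
    {"tags": {"broker": "broker-1", "queue": "q1", "address": "orders"}, "value": 5},
    {"tags": {"broker": "broker-0", "queue": "q2", "address": "events"}, "value": 7},
]


def aggregate(samples, by):
    """Sum sample values, grouped by the given list of tag names."""
    totals = defaultdict(int)
    for sample in samples:
        # The group key is the tuple of tag values for the requested tags.
        key = tuple(sample["tags"][name] for name in by)
        totals[key] += sample["value"]
    return dict(totals)


per_address = aggregate(samples, by=["address"])  # across brokers and queues
per_broker = aggregate(samples, by=["broker"])    # across queues
total = aggregate(samples, by=[])                 # everything in one bucket
```

The reporter only ever emits the raw per-queue values; which tags to
group by is decided later by whoever builds the dashboard.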

So with the router I would collect per-link metrics and tag them with
e.g. connection id, link id, target address and source address.
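
As a rough sketch of what that could look like on the wire, the snippet
below renders per-link values in the prometheus text exposition format,
with the tags as labels. The metric name router_link_deliveries, the
label names and the shape of the link records are assumptions for
illustration only; they are not taken from the router management
schema, and a real exporter would more likely use the prometheus
python library than hand-rolled formatting.

```python
def escape(value):
    """Escape a label value for the prometheus text format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(name, links):
    """Render one counter sample per link, with the link's tags as labels."""
    lines = ["# TYPE %s counter" % name]
    for link in links:
        # Sort the labels so that the output is stable between scrapes.
        labels = ",".join(
            '%s="%s"' % (key, escape(val))
            for key, val in sorted(link["tags"].items())
        )
        lines.append("%s{%s} %d" % (name, labels, link["deliveries"]))
    return "\n".join(lines) + "\n"


# Hypothetical per-link records, e.g. as collected over AMQP management.
links = [
    {"tags": {"connection_id": "1", "link_id": "4",
              "target_address": "orders", "source_address": ""},
     "deliveries": 42},
]

output = render("router_link_deliveries", links)
```

Each link becomes its own time series, so the number of series grows
with the number of connections and links, which is the concern raised
above about discarding metrics that are no longer updated.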

If a metric cannot be collected, you can define how it will be interpreted.

-- 
Ulf