[EnMasse] Dispatch router metrics to hawkular

Tue Mar 28 11:33:09 UTC 2017

On 28/03/17 11:05, Ulf Lilleengen wrote:
> On 28. mars 2017 11:32, Gordon Sim wrote:
>> On 28/03/17 10:07, Ulf Lilleengen wrote:
>>> I think the type of information suitable for metrics is information
>>> that will be useful for the operations team that are monitoring the
>>> availability and performance of the system. IMO logging is a better
>>> alternative for metrics that are mainly used for debugging, and I think
>>> (though I've never operated messaging infrastructure..) what you refer
>>> to as potentially useful information is mainly interesting for
>>> debugging.
>>
>> As an example, imagine if the statistics show bursts in the frequency of
>> message rejections on occasion and this is coming largely from a single
>> client. Being able to drill into the data and get the user/ip of the
>> client in question would be useful.
>>
>> So, yes, on one level it is 'debugging' of a sort. I think of it more as
>> understanding how the system is being used. The monitoring console as I
>> see it is for this sort of general purpose troubleshooting, observation.
>> It does cover performance, but isn't limited to that.
>>
>
> If you compare this to a typical HTTP server, details of a request would
> typically end up in an access log that would also be stored in a central
> logging facility like Logstash for later debugging or post-mortem analysis.

Right, and we can do the same (just need to ensure that the log output 
of the different components allows us to correlate things as we want 
to). I believe there is already a semi-standard stack for OpenShift for 
doing this, so we can experiment with that as well.

> From an HTTP server perspective I think a nice way to distinguish metrics
> from logs is how many values a dimension/tag may have. For instance, the
> value range of the request type is small (GET, PUT etc.), while the value
> range of client IPs can be very big. Graphs with many values per tag
> don't look nice at all, and creating one graph per value is tedious.
>
> On the other hand, I guess in most messaging use cases, connections are
> long lived and you may only have a few known clients, so maybe having
> host/container id as a tag would work just fine. Let's try it out!

In terms of the value range, the IP address is most likely no wider than 
any other connection identifier (including the artificial one added by 
the router).

Number of connections is one of the dimensions we really do want to 
scale in. So if having a large range of values for the tags may be 
problematic, we may want to avoid per-link and per-connection stats and 
do some simple aggregation *before* storing.
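
To make the aggregation idea concrete, here is a rough sketch of what 
collapsing per-link stats might look like before anything is stored. The 
record fields (connection, address, delivered, rejected) and the 
aggregate_link_stats helper are hypothetical names chosen for 
illustration; they are not the router's actual management schema or any 
Hawkular API.

```python
from collections import defaultdict


def aggregate_link_stats(link_stats, group_by=("address",)):
    """Sum per-link counters into one record per group_by key.

    The field names used here are assumptions for illustration only.
    """
    totals = defaultdict(lambda: {"links": 0, "delivered": 0, "rejected": 0})
    for stat in link_stats:
        # Build the low-cardinality key; the connection id is dropped.
        key = tuple(stat[field] for field in group_by)
        bucket = totals[key]
        bucket["links"] += 1
        bucket["delivered"] += stat["delivered"]
        bucket["rejected"] += stat["rejected"]
    return dict(totals)


# Hypothetical per-link samples, one per link on each connection.
sample = [
    {"connection": "conn-1", "address": "queue.a", "delivered": 10, "rejected": 0},
    {"connection": "conn-2", "address": "queue.a", "delivered": 5, "rejected": 2},
    {"connection": "conn-3", "address": "queue.b", "delivered": 7, "rejected": 1},
]

aggregated = aggregate_link_stats(sample)
```

With something like this, the number of stored series grows with the 
number of addresses rather than the number of connections, and the 
per-connection detail would have to come from the logs instead.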