[dm-devel] [PATCH 0/4] dm-latency: Introduction

Thu Feb 26 19:45:43 UTC 2015

For my particular use case, it's about being able to warn when latencies seen on multipath devices exceed a given threshold.
Of course, this could simply be a userspace tool that does the calculations using what we already expose.
When we see these latencies, we then focus on which SAN path may or may not be contributing.
Within multipathd we can already configure service time as a load balancer; perhaps we can do the monitoring in the same place,
i.e. warn on service time above a threshold.
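
As a rough illustration of the userspace approach, here is a minimal Python sketch that derives an average latency from two snapshots of the counters in the /sys/block/<dev>/stat layout and warns above a threshold. The helper names, the device name and the threshold values are hypothetical, and this is not how multipathd works today; it only shows the arithmetic.

```python
from typing import NamedTuple, Optional


class IoSnapshot(NamedTuple):
    ios: int       # completed reads + writes
    ticks_ms: int  # milliseconds spent servicing those reads + writes


def parse_stat_line(line: str) -> IoSnapshot:
    """Parse one line in the /sys/block/<dev>/stat field layout."""
    f = [int(x) for x in line.split()]
    # Field 0: read I/Os, 3: read ticks (ms), 4: write I/Os, 7: write ticks (ms).
    return IoSnapshot(ios=f[0] + f[4], ticks_ms=f[3] + f[7])


def average_latency_ms(prev: IoSnapshot, cur: IoSnapshot) -> Optional[float]:
    """Average time per completed I/O between two snapshots, or None if idle."""
    done = cur.ios - prev.ios
    if done <= 0:
        return None
    return (cur.ticks_ms - prev.ticks_ms) / done


def check_threshold(name: str, prev: IoSnapshot, cur: IoSnapshot,
                    threshold_ms: float) -> Optional[str]:
    """Return a warning string if the average latency exceeds the threshold."""
    avg = average_latency_ms(prev, cur)
    if avg is not None and avg > threshold_ms:
        return (f"WARNING: {name} average latency {avg:.1f} ms "
                f"exceeds {threshold_ms:.1f} ms")
    return None


# Two made-up snapshots taken some interval apart (hypothetical numbers).
prev = parse_stat_line("100 0 800 200 50 0 400 100 0 150 300")
cur = parse_stat_line("200 0 1600 1200 100 0 800 2100 0 900 3300")
print(check_threshold("dm-3", prev, cur, threshold_ms=10.0))
```

A real tool would read the stat file for each path device on a timer and compare the per-path results, which is where the "which SAN path is contributing" question comes in.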

When a customer says "I currently use xxxx for multipath on RHEL; however, I want to move to native multipathing, but you don't provide the monitoring I need", I want to work toward an enhancement.

For example, I recently had Ben add the ability for multipath to control when a returning path is put back into use, based on path health. This mimics what DMP can do and avoids repeated recovery caused by path flapping.

Thanks

Laurence Oberman
Red Hat Global Support Service
SEG Team

----- Original Message -----
From: "Mikulas Patocka" <mpatocka at redhat.com>
To: "Bryn M. Reeves" <bmr at redhat.com>
Cc: "device-mapper development" <dm-devel at redhat.com>, "Tao Ma" <boyu.mt at taobao.com>, "Robin Dong" <sanbai at alibaba-inc.com>, "Laurence Oberman" <loberman at redhat.com>, "Coly Li" <colyli at gmail.com>, "Alasdair Kergon" <agk at redhat.com>
Sent: Thursday, February 26, 2015 2:34:40 PM
Subject: Re: [dm-devel] [PATCH 0/4] dm-latency: Introduction

On Thu, 26 Feb 2015, Bryn M. Reeves wrote:

> On Thu, Feb 26, 2015 at 11:49:28AM -0500, Mikulas Patocka wrote:
> > We have already dm-statistics that counts various events - see 
> > Documentation/device-mapper/statistics.txt. It counts the number of
> > requests and the time spent servicing each request, thus you can 
> > calculate average latency from these values.
> 
> Right: average service time (as reported by iostat etc.) is easily derived
> from the existing stats.
> 
> Does the separate latency accounting buy anything additional?
>
> > Please look at dm-statistics to see if it fits your purpose. If you need 
> > additional information not provided by dm-statistics, it would be better 
> > to extend the statistics code rather than introduce new "latency" 
> > infrastructure.
> 
> Agreed; I'm working on userspace support for dm-statistics at the moment
> and if there is a need for these additional measurements I would greatly
> prefer to consume them as additional fields in the existing dm-stats
> counter set.
> 
> This also has the advantage of benefiting from the existing step and
> area support allowing a device to be subdivided into discrete stats
> regions.
> 
> Regards,
> Bryn.

Coly's paper (http://blog.coly.li/docs/osc13-coly.pdf) shows that they
take a histogram of latencies and use it to predict disk failure.

That could be easily added to dm-statistics.

Average latency alone can't be used to predict disk failure because
average latency depends on the type of workload (for example, sequential
or nearly sequential requests have much lower latency than random
requests).
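
To illustrate the point, here is a small Python sketch with two made-up
sample sets that have the same average latency but very different
histograms. The bucket boundaries and the sample values are hypothetical;
this is not the dm-statistics format and not the method from Coly's paper.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical bucket boundaries in milliseconds. Bucket i counts samples
# with BOUNDS_MS[i-1] <= latency < BOUNDS_MS[i]; the last bucket is open-ended.
BOUNDS_MS = [1, 2, 4, 8, 16, 32, 64, 128]


def latency_histogram(samples_ms, bounds=BOUNDS_MS):
    """Count samples per bucket; returns len(bounds) + 1 counters."""
    counts = [0] * (len(bounds) + 1)
    for s in samples_ms:
        counts[bisect_right(bounds, s)] += 1
    return counts


# Made-up data: a random workload on a healthy disk ...
healthy = [10] * 100
# ... and a mostly sequential workload where a few requests stall badly.
suspect = [2] * 95 + [162] * 5

print(mean(healthy), latency_histogram(healthy))
print(mean(suspect), latency_histogram(suspect))
```

Both sets average 10 ms, but only the second one has entries in the
slowest bucket, which is the kind of signal the average hides.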

I'd like to know if we need a separate histogram per region, or if it is
sufficient to have a histogram per device. dm-latency has no regions; it
has a histogram for the whole device. A histogram per region would
consume more memory, so I'm interested in whether there is a reasonable
use case for that.
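
For a rough feel of the memory trade-off, here is a back-of-the-envelope
Python sketch. The counter size, the bucket count, the `copies`
multiplier (for example, if counters were kept per CPU) and the device
and step sizes are all assumptions made for illustration, not figures
taken from dm-statistics or dm-latency.

```python
def histogram_memory_bytes(device_sectors, step_sectors, buckets,
                           counter_bytes=8, copies=1):
    """Estimate memory for one histogram per region of step_sectors."""
    # Ceiling division: a partial trailing region still needs a histogram.
    regions = -(-device_sectors // step_sectors)
    return regions * buckets * counter_bytes * copies


SECTORS_2TIB = 2 * 2**40 // 512   # hypothetical 2 TiB device
SECTORS_1GIB = 2**30 // 512       # hypothetical 1 GiB step

per_device = histogram_memory_bytes(SECTORS_2TIB, SECTORS_2TIB, buckets=16)
per_region = histogram_memory_bytes(SECTORS_2TIB, SECTORS_1GIB, buckets=16)
print(per_device, per_region)
```

Under these assumptions the cost grows linearly with the number of
regions, so the question is mainly whether anyone needs the finer
granularity.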

Mikulas