[dm-devel] [PATCH 3/4] libmultipath: path latency: simplify getprio()

Thu Nov 23 02:20:48 UTC 2017

Hi Martin,

Thanks for your clarification. I now agree with you on the points in this thread.

Regards,
Guan

On 2017/11/20 10:10, Martin wrote:
> 
> Hello Guan,
> 
>> On 2017/11/18 8:11, Martin Wilck wrote:
>>> The log standard deviation can be calculated much more simply
>>> by realizing
>>>
>>>    sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
>>>
>>
>> I derive the equation:
>>  sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2}
>>                           = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} + sum_n{avg(x)^2}
>>                           = sum_n{x_i^2} - 2*avg(x)*avg(x) + n*avg(x)^2
>>                           = sum_n{x_i^2} + (n-2)*avg(x)^2
> 
> No, that's wrong:
> 
>     avg(x) = (1/n) * sum_n(x_i)
> =>  sum_n(x_i) = n * avg(x)
> 
> Thus the 2nd term in the line before the last in your derivation
> is not "- 2*avg(x)*avg(x)", but "- 2*n*avg(x)*avg(x)", and the end
> result becomes sum_n(x_i^2) - n*avg(x)^2.
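
The identity discussed above can be checked numerically. The following is
only a minimal sketch, not code from the patch: the function names
sum_sq_dev_two_pass() and sum_sq_dev_one_pass() are hypothetical and chosen
for illustration. The first computes sum_n (x_i - avg(x))^2 directly, the
second uses sum_n x_i^2 - n * avg(x)^2.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Direct (two-pass) form: sum_n (x_i - avg(x))^2 */
static double sum_sq_dev_two_pass(const double *x, size_t n)
{
	double sum = 0.0, acc = 0.0, avg;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += x[i];
	avg = sum / n;			/* avg(x) = (1/n) * sum_n(x_i) */
	for (i = 0; i < n; i++)
		acc += (x[i] - avg) * (x[i] - avg);
	return acc;
}

/* Simplified (one-pass) form: sum_n x_i^2 - n * avg(x)^2 */
static double sum_sq_dev_one_pass(const double *x, size_t n)
{
	double sum = 0.0, sum_sq = 0.0, avg;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		sum += x[i];
		sum_sq += x[i] * x[i];
	}
	avg = sum / n;
	/* the cross term is -2*avg(x)*sum_n(x_i) = -2*n*avg(x)^2 */
	return sum_sq - n * avg * avg;
}
```

For x = {1, 2, 4, 7}, avg(x) is 3.5 and both forms give 21. The one-pass
form subtracts two large numbers, so it can lose precision when the mean is
large compared to the spread; whether that matters for the latency values
seen in practice is not settled by this sketch.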
>>
>>> Also, use timespecsub rather than the custom timeval_to_usec,
>>> and avoid taking log(0).
>>>
>>
>> Great.
>>
>>
>>> +	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n",
>>
>> latency avg -> latency geometric avg? In most cases, "avg" means the
>> arithmetic average, but here it means the geometric average.
> 
> Yes, I meant the geometric average. I don't think we should bother the
> user with these subtleties. Well, maybe it would feel better if we'd
> use "geometric mean" rather than "avg" in the log message, although that
> might again irritate some people who don't know the term ... I really
> don't care much.
> 
>>> +		  pp->dev, exp(lg_avglatency * lg_base),
>>> +		  exp(standard_deviation * lg_base), rc);
>>
>> How do you conclude that the uncertainty of the log-normal
>> distribution is exp(standard_deviation * lg_base)?
> 
> The "width" of the normal distribution is measured in terms of the
> standard deviation, sigma. In your patch, you correctly accounted for
> the confidence levels of the 2*sigma environment 
> (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule).
> 
> Here, we're assuming a log-normal distribution for the latency (it's a
> practical assumption, not a statistical assertion - in reality the
> latency probably rather follows an exponential or Poisson distribution
> but we don't need to go into that detail). That means we're assuming
> that log(latency) can be described by a normal distribution with a
> certain standard deviation sigma around the log of the geometric mean.
> Again, sigma is the "width" of that normal distribution. Thus with ~68%
> probability, the log of the latency is in the 1-sigma interval
> around the average. Translating that back into "real" latency, with 68%
> likelihood it will be in the interval [(1/F) * gm, F*gm], where gm is
> the geometric mean and F=exp(sigma). Therefore, F (which is
> exp(standard_deviation * lg_base)) can be used as an estimate of the
> "uncertainty factor" for the latency.
> 
> Agreed?
> 
> Regards
> Martin
>