[dm-devel] [PATCH 04/19] Revert "multipath-tools: discard san_path_err_XXX feature"
Muneendra Kumar M
muneendra.kumar at broadcom.com
Fri Dec 28 12:19:17 UTC 2018
Hi Martin,
Please find my replies below.
>Hi Muneendra,
> The san_path_err_XX feature was added by me and pushed
> upstream.
> This feature was driven by Brocade customer feedback.
>
> The link below gives the history of this; a couple of
> discussions took place before we started work on this feature.
>
> https://www.redhat.com/archives/dm-devel/2017-January/msg00025.html
>I'm aware that you authored the feature. I was not aware of that post you
>quoted, thanks for the link. Anyway, you mentioned in that post that the
>interested customers were using RHEL. Have you made them upgrade their
>multipath-tools to recent upstream to use the san_path_err and/or
>marginal_path features?
>>>> I will get back to you with the details.
> Our requirement was simple.
> For example, suppose there are two paths on dm-1, say sda and sdb, as below.
>
> # multipath -ll
> mpathd (3600110d001ee7f0102050001cc0b6751) dm-1 SANBlaze,VLUN MyLun
> size=8.0M features='0' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=50 status=active
> |- 8:0:1:0 sda 8:48 active ready running
> `- 9:0:1:0 sdb 8:64 active ready running
>
> Suppose I am seeing a lot of errors on sda, due to which the sda path
> keeps fluctuating from the failed state to the active state and vice versa.
>
> The requirement was something like this: if sda has failed (moved from
> active to failed state) more than X times in a Y duration, then I
> want to keep sda in the failed state for a Z duration.
>Thanks for clarifying what you meant with "is failed". I'd been wondering
>if it meant "good"->"failed" transitions, as you just confirmed, or overall
>"failed" state count.
> And the data should travel only through the sdb path for Z hrs.
>
>
> From the configuration point of view:
>
> san_path_err_threshold: The number of times sda has moved
> from active to failed (X in the above example).
> san_path_err_forget_rate: Watch window. If, within this time frame, the
> path failures (sda moving from active to failed) exceed the err
> threshold, then don't reinstate the path (Y in the above
> example).
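The X/Y/Z requirement above can be sketched as a small per-path counter. This is only an illustrative model, not the multipathd implementation: it assumes the counter is decremented once every san_path_err_forget_rate ticks (the reading used later in this thread), and the names PathState, tick and hold_ticks (standing in for the Z duration) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    threshold: int            # X: tolerated good->failed transitions
    forget_rate: int          # Y: ticks after which one failure is forgotten
    hold_ticks: int           # Z: ticks to keep the path failed (hypothetical name)
    path_failures: int = 0
    ticks_since_decay: int = 0
    hold_remaining: int = 0

    def tick(self, failed_transition: bool) -> bool:
        """Advance one tick; return True if the path may be reinstated."""
        if self.hold_remaining > 0:
            # Path is being held in failed state for the Z duration.
            self.hold_remaining -= 1
            if self.hold_remaining == 0:
                # Hold is over: start counting from scratch.
                self.path_failures = 0
                self.ticks_since_decay = 0
            return False
        if failed_transition:
            self.path_failures += 1
        self.ticks_since_decay += 1
        if self.ticks_since_decay >= self.forget_rate:
            # Forget one failure every forget_rate ticks.
            self.ticks_since_decay = 0
            if self.path_failures > 0:
                self.path_failures -= 1
        if self.path_failures > self.threshold:
            # Too many failures inside the watch window: hold the path down.
            self.hold_remaining = self.hold_ticks
            return False
        return True
```

With threshold=2 and hold_ticks=5, three failures in quick succession put the path on hold for the next five ticks, while a path that fails less often than once per forget_rate ticks is never held.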
>The "watch window" analogy fits if you have a stable path (no or only very
>rare failures over extended periods of time) which suddenly starts
>fluctuating. More precisely, a "background" failure rate clearly below
>"san_path_err_forget_rate", alternating with problematic periods in
>which the failure rate is significantly higher than
>"san_path_err_forget_rate". And that is the situation the algorithm was
>made for, right?
>In general, the "time" (in ticks) to reach the threshold is
>t = T / max(1/R - 1/F, 0)
>Where T is san_path_err_threshold, R is the average time (in ticks) between
>"good"->"failed" transitions of the path, and F is san_path_err_forget_rate
>(aka the time in ticks after which "path_failures" is decremented by 1).
>If R >= F, t is infinite; the "path_failures" count effectively stays 0. If
>R is much smaller than F, t ~ T * R. If R is only a little bit smaller than
>F, t is finite but (possibly much) larger than T * R.
>That's why I sloppily called F the "maximum tolerable failure rate" in my
>previous post.
>>>> Yes.
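The formula quoted above, t = T / max(1/R - 1/F, 0), can be evaluated directly to see the three regimes. The snippet below is a minimal sketch of that arithmetic only; the function name ticks_to_threshold is hypothetical, and T, R and F carry the meanings given in the quoted text.

```python
import math


def ticks_to_threshold(T: float, R: float, F: float) -> float:
    """Estimated ticks until path_failures reaches T.

    T: san_path_err_threshold
    R: average ticks between "good"->"failed" transitions
    F: san_path_err_forget_rate (ticks per decrement of path_failures)
    """
    # Net growth of the counter per tick: failures arrive at 1/R,
    # and are forgotten at 1/F.
    net_rate = max(1.0 / R - 1.0 / F, 0.0)
    if net_rate == 0.0:
        # R >= F: the counter never climbs, so the threshold is never reached.
        return math.inf
    return T / net_rate


for R in (2, 900, 1000):
    print(R, ticks_to_threshold(T=10, R=R, F=1000))
```

For T=10 and F=1000, R=2 gives a value close to T * R = 20, R=900 gives roughly 90000 (ten times T * R = 9000), and R=1000 gives infinity, matching the three cases described in the quoted text.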
......
Regards,
Muneendra.