[dm-devel] [PATCH 04/19] Revert "multipath-tools: discard san_path_err_XXX feature"

Martin Wilck mwilck at suse.com
Thu Dec 20 21:26:19 UTC 2018


Hello Muneedra,

On Thu, 2018-12-20 at 16:11 +0530, Muneendra Kumar M wrote:
> Hi Martin,
> I completely agree with you: we cannot derive a direct formula
> relating these two unless we know the IOPS on a particular path.
>
> The IOPS in the two cases differ during the detection of a shaky
> path. In the marginal_path_XX case the IOPS are fixed, i.e. 100 (at
> a sample rate of 10 Hz), while in the san_path_XX case the IOPS are
> not fixed (they depend on the application).
>
> But there are many ways to derive the IOPS on a particular path; if
> we can get that, then we can derive the values as below, IMO.
>
> To calculate these, we need to derive the error threshold as a
> percentage of IOPS, and the percentage should not be less than 1 (as
> most Brocade SAN customers use this configuration).
> I.e., san_path_err_threshold and marginal_path_err_rate_threshold
> need to be computed as a percentage of IOPS over a given number of
> secs (derived from san_path_err_forget_rate /
> marginal_path_err_sample_time).

You make me curious - are Brocade customers using our upstream
multipath code? Do you have insights into whether, and how, they apply
marginal path checking in multipath-tools, and what parameter values
they use?

If so, it would be very valuable for the community if you could share
some of these insights. So far I gather that you recommend considering
paths shaky if they have an error rate of more than 1%.

>
> For example, if 1000 IOPS are happening on a particular path, and
> taking the percentage factor as 1 and the sample time as 60 secs,
> the configuration would be as below:
>
>       san_path_err_threshold     = 600 (1 percent of 60*1000)
>       san_path_err_forget_rate   = 60
>       san_path_err_recovery_time = 100

Hm, I understand it differently. In the san_path_err model, with an
error rate of 1% and the settings above, IMO you will *never* reach
the threshold. The failure count will increase (on average) by 1/100
per tick, but it will decrease by 1/60 per tick, resulting in a
negative first derivative (more precisely, a stochastic process whose
overall trend goes towards 0, not upwards towards the threshold).

In the san_path_err model, the maximum tolerable failure rate is
basically the reciprocal of the san_path_err_forget_rate parameter. 

The error threshold has a different effect, acting rather as a "delay"
until the algorithm really considers the path shaky. The closer the
failure rate is to the forget rate, the longer it takes. For example,
with an error rate of 1/30 (3.3%), the failure count will increase by
one every 60 ticks (1/30 - 1/60 = 1/60), and it will take 60*600 =
36000 (!) ticks, or 10 h at best, until the path is considered shaky.
OTOH, with an error rate of 10%, the threshold is reached in 7200
ticks, and at an error rate of 50%, in roughly 1200 ticks.
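This back-of-the-envelope calculation can be sketched as follows (a
minimal model, not multipath-tools code; it assumes one checker tick
per second and treats the count's average drift deterministically):

```python
def ticks_to_threshold(fail_rate, forget_rate, threshold):
    """Average number of ticks until the failure count reaches the
    threshold, or None if the overall trend never gets there.

    fail_rate: probability of a path error per tick.
    The count decays by 1/forget_rate per tick.
    """
    net = fail_rate - 1.0 / forget_rate  # average growth per tick
    if net <= 0:
        return None  # count trends towards 0, threshold never reached
    return round(threshold / net)

# 1% error rate with forget_rate 60: negative trend, never triggers.
print(ticks_to_threshold(0.01, 60, 600))   # -> None
# 1/30 error rate: net growth 1/60 per tick -> 36000 ticks (10 h).
print(ticks_to_threshold(1/30, 60, 600))   # -> 36000
# 10% error rate: net 1/10 - 1/60 = 1/12 -> 7200 ticks.
print(ticks_to_threshold(0.10, 60, 600))   # -> 7200
```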

For your scenario, I'd use something like

   san_path_err_threshold 4
   san_path_err_forget_rate 100
   san_path_err_recovery_time 100 

At least that's how I understand the algorithm. Am I wrong?

Btw, are you aware that the san_path_err algorithm, at least in the
form that was merged upstream, counts only good->bad transitions?
Especially at high error rates, this is quite different from an
overall error rate (failures / overall I/Os), because several
subsequent failures are counted only once.
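To illustrate the difference, here is a toy model (not the actual
upstream code) comparing raw failure counting with transition
counting:

```python
def count_failures(history):
    """Raw number of failed path checks (1 = bad, 0 = good)."""
    return sum(history)

def count_transitions(history):
    """Count only good->bad transitions, in the spirit of the
    upstream san_path_err counting."""
    count, prev = 0, 0
    for state in history:
        if state == 1 and prev == 0:
            count += 1
        prev = state
    return count

history = [0, 1, 1, 1, 0, 1, 0, 0, 1, 1]  # 6 failed checks
print(count_failures(history))     # -> 6
print(count_transitions(history))  # -> 3: each run of failures counts once
```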

>
> Now this user is supposed to migrate to the marginal_path settings.
> (IOPS in this case are fixed to 100 during shaky path detection.)
>
>       marginal_path_err_rate_threshold   = 60 (1 percent of 60*100)
>       marginal_path_err_sample_time      = 60
>       marginal_path_err_recheck_gap_time = 100
>
>
>
> And in this case san_path_err_forget_rate should be the same as
> marginal_path_err_sample_time, and san_path_err_recovery_time the
> same as marginal_path_err_recheck_gap_time.
> The only variable factors are san_path_err_threshold and
> marginal_path_err_rate_threshold, which change based on the number
> of errors as a percentage of IOPS over a given number of secs.
>
> The only extra parameter in the marginal case is
> marginal_path_double_failed_time, which we need to configure for
> suspecting a marginal path.

I don't think these parameters will behave like the san_path_err
parameters above; see my argument above.

Note that marginal_path_err_sample_time 60 is invalid (the marginal
path code requires at least 120 s), and that the error threshold is
always given as a permillage (i.e. it should be set to 10 for 1%).
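For reference, a marginal_path configuration satisfying both
constraints for a 1% target error rate could look like this
(illustrative values only, including the hypothetical
double_failed_time; not a recommendation):

      marginal_path_err_rate_threshold   10    (permille: 10 = 1%)
      marginal_path_err_sample_time      120   (minimum allowed value)
      marginal_path_err_recheck_gap_time 100
      marginal_path_double_failed_time   60    (hypothetical value)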

>
> As we still see some merits in the san_path_XX approach, as you
> mentioned earlier, and we need both san_path_err_xx and
> marginal_path_err_xx, I am thinking of the approach below so that
> customers can have a common configuration for both.
> Functionality-wise, san_path_err_forget_rate and
> marginal_path_err_sample_time, san_path_err_recovery_time and
> marginal_path_err_recheck_gap_time, and san_path_err_threshold and
> marginal_path_err_rate_threshold are the same.
>
> So we could have a common configuration name, marginal_path_err_XX
> (parameters), for both approaches, and the deciding factor would be
> marginal_path_double_failed_time.
> If marginal_path_double_failed_time is not defined, go with the
> san_path_err approach; otherwise go with the marginal_path_err
> approach to detect a shaky path.

I'm not sure about that. It's important that users are able to
understand the effect that each parameter has. If we use the same
parameter names for different parameters of different algorithms, even
bigger confusion might arise than we have now.
"san_path_err_recovery_time" and "marginal_path_err_recheck_gap_time"
obviously have very similar effects, but for the other parameters I
don't see a 1:1 equivalence.

Best regards,
Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)




