[dm-devel] [PATCH 00/19] san_path_err & multipath ANA support

Martin Wilck mwilck at suse.com
Mon Jan 7 11:21:55 UTC 2019


On Fri, 2018-12-21 at 10:06 -0600, Benjamin Marzinski wrote:
> 
> I've been thinking about how we handle marginal paths, and it seems to
> me that instead of telling the kernel that they have failed, it might be
> better to create pathgroups of last resort, which contain marginal
> paths that should only be used if all the other paths are down.

Maybe we should simply assign marginal paths a very low priority? 

At least with "group_by_prio" and immediate failback, that would cause
multipathd to switch to these paths only if nothing else is available,
and to switch back as soon as possible, so it would give you the desired
behavior at almost no cost. An open question for me is whether this
priority should be higher or lower than the one we currently assign to
"ghost" paths in standby state (1).
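A rough Python model of the "very low priority" idea, for illustration only: it mimics group_by_prio by grouping paths of equal priority and, as with immediate failback, always uses the highest-priority group that still has a usable path. The class, the helper functions, and the priority values (except the 1 for ghost paths mentioned above) are hypothetical, not multipathd code. Putting MARGINAL_PRIO below GHOST_PRIO is just one possible answer to the open question.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical priorities for this model. GHOST_PRIO = 1 follows the mail;
# OPTIMAL_PRIO and MARGINAL_PRIO are assumptions chosen for illustration.
OPTIMAL_PRIO = 50
GHOST_PRIO = 1
MARGINAL_PRIO = 0


@dataclass
class Path:
    name: str
    prio: int
    up: bool = True


def group_by_prio(paths):
    """Group paths with equal priority into pathgroups, highest first."""
    ordered = sorted(paths, key=lambda p: p.prio, reverse=True)
    return [list(g) for _, g in groupby(ordered, key=lambda p: p.prio)]


def active_group(paths):
    """Immediate failback: use the best pathgroup with at least one path up."""
    for group in group_by_prio(paths):
        if any(p.up for p in group):
            return [p.name for p in group]
    return None  # no usable path left at all


paths = [Path("sda", OPTIMAL_PRIO), Path("sdb", OPTIMAL_PRIO),
         Path("sdc", GHOST_PRIO), Path("sdd", MARGINAL_PRIO)]
print(active_group(paths))  # ['sda', 'sdb']
for p in paths[:3]:
    p.up = False
print(active_group(paths))  # ['sdd']: only the marginal path is left
paths[0].up = True
print(active_group(paths))  # ['sda', 'sdb']: immediate failback
```

Lowering a marginal path's priority is enough here; no extra "failed" state is needed for the path to drop to the end of the group order.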

Side note: the global "failback" policy setting may not fit the needs
of all modern setups. I think that immediate failback is always correct
for "marginal" vs. flawless paths, but we know that it's not always
wanted for non-optimal vs. optimal paths, or other failback scenarios.

> 
> The downside to this method is that it could quite possibly double the
> number of pathgroups whenever you have connection issues, since a
> connection issue near the host HBA could cause a marginal path in each
> pathgroup. This means more table reloads and more confusing layouts.
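To make the pathgroup count concrete, here is a hypothetical Python comparison (not multipathd code). It assumes two pathgroups, one per storage controller, each reached through two host HBAs, and a problem on one HBA that makes one path in every group marginal. A "last resort" group per pathgroup doubles the count, while a single shared low-priority group, as in the priority-based suggestion above, adds only one.

```python
def groups_with_last_resort(pathgroups, marginal):
    """Hypothetical: split each pathgroup into its healthy paths and a
    separate 'last resort' group holding that group's marginal paths."""
    result = []
    for group in pathgroups:
        healthy = [p for p in group if p not in marginal]
        weak = [p for p in group if p in marginal]
        result += [g for g in (healthy, weak) if g]
    return result


def groups_with_low_prio(pathgroups, marginal):
    """Hypothetical: keep the healthy part of each pathgroup and move every
    marginal path into one shared lowest-priority group."""
    result = [[p for p in g if p not in marginal] for g in pathgroups]
    result = [g for g in result if g]
    weak = [p for g in pathgroups for p in g if p in marginal]
    return result + ([weak] if weak else [])


pathgroups = [["sda", "sdc"], ["sdb", "sdd"]]  # one group per controller
marginal = {"sda", "sdb"}                      # both paths via the flaky HBA
print(len(groups_with_last_resort(pathgroups, marginal)))  # 4
print(len(groups_with_low_prio(pathgroups, marginal)))     # 3
```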
> 
> The upside to this method is that multipath won't run out of paths
> while there are still marginal paths that it could use. When queuing
> isn't enabled, there's nothing to stop the kernel from failing IO while
> potentially usable marginal paths exist.
> 
> On the other hand, this problem could be mitigated by having multipath
> make sure, whenever marginal path detection is configured, that
> no_path_retry is always at least some minimum value that we believe is
> long enough for multipathd to be notified of the path failure by the
> kernel and to reinstate the marginal paths.

I'd rather simply document that we discourage "no_path_retry = fail"
while marginal path detection is enabled. "Long enough" sounds like a
can of worms to me.
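For illustration, here is the arithmetic behind the "minimum no_path_retry" idea, written as a hypothetical Python helper. It assumes that queueing lasts roughly no_path_retry × polling_interval seconds. The reinstatement delay is an estimate the administrator would have to supply, and choosing that estimate is exactly the hard part.

```python
import math


def min_no_path_retry(reinstate_delay_s: float, polling_interval_s: int) -> int:
    """Hypothetical helper: smallest no_path_retry whose queueing window
    (assumed to be no_path_retry * polling_interval seconds) covers the
    estimated time to notice the failure and reinstate marginal paths."""
    if polling_interval_s <= 0:
        raise ValueError("polling_interval must be positive")
    return max(1, math.ceil(reinstate_delay_s / polling_interval_s))


print(min_no_path_retry(12, 5))  # 3, i.e. queue for about 15 s
print(min_no_path_retry(60, 5))  # 12
```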

Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)




