[dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order

Martin Wilck martin.wilck at suse.com
Tue Jun 6 16:32:13 UTC 2023


On Tue, 2023-06-06 at 10:54 -0500, Benjamin Marzinski wrote:
> On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote:
> > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski

> > >  
> > > Actually, after looking into this more, pushing those two
> > > functions
> > > together makes the logic more confusing. Plus select_path_group()
> > > is
> > > used by multiple other functions that don't need to check if the
> > > path
> > > groups are out of order.
> > 
> > Hm. Can it happen at all that select_path_group() returns something
> > other than 1 but path_groups_in_order() returns true? 
> 
> Yes. It might even be the common case. Say a switch goes down and all
> the paths in the high priority pathgroup fail. The kernel will switch
> over to a lower priority pathgroup. As long as those paths work, it
> won't automatically switch back to the high priority pathgroup when
> we
> tell it that those failed paths have recovered. It's multipath's job
> to
> tell it when to proactively switch pathgroups. Since multipath
> doesn't
> update the priority of failed paths, the pathgroups should still look
> the same (unless you use group_by_prio and the path fails between
> checking the state and running the prioritizer, in which case you
> will
> likely get a PRIO_UNDEF and reconfigure the pathgroups, but that's
> the
> thing group_by_tpg is trying to resolve). 

Ok, this is subtle; it's caused by the fact that path_groups_in_order()
ignores the ordering of PGs with pgp->prio = PRIO_UNDEF (which will be
the prio of a PG with only failed paths), whereas select_path_group()
will ignore such PGs it in a different way - by never selecting them.
I hope I understand correctly now.

I have to say this is confusing. We have different concepts of how path
priority and path state together affect the PG priority, and we apply
these different concepts in different parts of the code. I'm not saying
it's wrong, but at the moment I'm too confused to tell if it's right.

>  
> > If we follow the mindset you layed out in your patch ("the kernel
> > treats the pathgroups as if they were ordered by priority")
> > consequently, just determining bestpg is not enough; we'd need to
> > sort
> > the PGs every time (except for a user-triggered switchgroup
> > command).
> > IMO whenever we reload the map anyway (e.g. in setup_map()) we
> > should
> > make sure that the PGs are properly sorted. Using "switch_group"
> > instead of a full reload is just an optimization because the kernel
> > operation is more light-weight than a full reload. But as soon as
> > we
> > have e.g. a marginal path group, reordering is probably a better
> > idea
> > most of the time.
> 
> We already do correctly order the paths in setup_map().
> setup_map() -> group_paths() -> sort_pathgroup().  Actually, looking
> at
> this, I don't see why we even bother to call select_path_group() in
> setup_map(). The answer will always be 1, since we just sorted them.
> 

Right. I suppose the call to select_path_groups() predates the one to
sort_pathgroups().

Martin



More information about the dm-devel mailing list