[dm-devel] [PATCH 0/9] Add protocol specific config subsection

Benjamin Marzinski bmarzins at redhat.com
Tue Apr 12 18:47:28 UTC 2022


On Tue, Apr 12, 2022 at 10:31:50AM +0000, Martin Wilck wrote:
> On Mon, 2022-04-11 at 20:59 -0500, Benjamin Marzinski wrote:
> > Some storage arrays can be accessed using multiple protocols at the
> > same
> > time.  I've have customers request the ability to set different
> > values
> > for the path specific timeouts, like fast_io_fail_tmo, based on the
> > protocol used to access the path. In order to make this possible,
> > this
> > patchset adds a new protocol subsection to the device subsection and
> > the
> > overrides section. This allows users to set a device config's path
> > specific timeouts, such as dev_loss_tmo, fast_io_fail_tmo, and
> > eh_deadline on a per-protocol basis.
> 
> Sigh... I am not happy about this amount of additional complexity in
> the multipath configuration. It is already so complicated that hardly
> anyone really understands how it works.
> 
> I fully agree that some properties, in particular fast_io_fail_tmo [1],
> are rather properties of protocol or fabrics than a storage array.
> But do we really need to differentiate by both "device" and "protocol"?
> IOW, do users need different fast_io_fail_tmo value for the same
> protocol, but different arrays? My feeling is that making this property
> depend on a 2-D matrix of (protocol x storage) is overcomplicating
> matters. And _if_ this is necessary, what do we gain by adding an
> "overrides" on top? [2]

I agree that setting fast_io_fail_tmo to different values based on both
array and protocol is usually overkill. This is why I added it to the
overrides section as well. If you just put it there, it will work for
all devices. I assumed that would be the common case. It wouldn't make
sense to have it be in the defaults section and override things in
devices section, but it's a natural fit for the overrides section. Plus,
since I added the protocol table to the hwentry, enabling it in the
overrides section wasn't much new code.
 
> What about adding simply adding "protocols" as a new top section in the
> conf file, and have the per-protocol settings override built-in hwtable
> settings for any array, but not explicit storage-device settings in the
> conf file?

I'm not really enamored with the idea of only working on the built-in
hwtable. I feel like the people that want to tune things based on the
protocol are the same sort of people that want tune things per array.
More importantly, we don't have anything else in multipath that only
applies to some of the device config entries. That change seems more
confusing to me than adding a new subsection. The protocol subsection is
visually part of the device config it is modifying. A separate protocol
section that only modified some of the device configs seems less
obvious. Plus we don't have a good way to code that.  Also, what happens
to merged configs, where some of the timeouts came from a builtin
config, and some came from the user config.  If you are really against
the subsection idea, I would rather have the protocol section override
all the device configs. Users would need to be o.k. with picking a
protocol for which all their arrays have the same timeout values. But
again, that should be the common case.

> Regards,
> Martin
> 
> [1]: wrt dev_loss_tmo, I tend to think that the best value must be
> found based on neither protocol nor array, but data center staff.

I do agree that fast_io_fail_tmo and eh_deadline make more sense than
dev_loss_tmo (especially since FC/iSCSI setups seem the most common, and
iSCSI doesn't support dev_loss_tmo), but since the section is there, and
dev_loss_tmo is a per-path timeout setting, I put it in.  I thought
about letting people change the checker type per protocol, but figured
that could wait till someone asked for it. This would make less sense if
we had a seperate top level protocol section. So would things like
per-protocol manginal path handling, and other path specific things
which could reasonably go in a protocol subsection of a device config.

> [2]: I strongly dislike "overrides", in general. I wonder what we need
> it for, except for quick experiments where people are too lazy to add
> explicit settings for devices and/or multipaths.

The biggest reason is that some of the builtin device configs do things
like set no_path_retry to "queue". I know people have used it to
override dev_loss_tmo and fast_io_fail_tmo, but the big one is
no_path_retry. But in general, some builtin device configs make choices
that aren't applicable for all environments, but are what the vendors
have validated and want for the default.

While you can call it lazy, it's always possible that a new builtin
config will be added, or and existing one changed, and suddenly your
devices are hanging instead of failing like expected when all the paths
go down, because the devices are now configured with a different
no_path_retry value.  It's safer to simply disallow this possiblity.

Also, the overrides section exists so that it is possible to set up a
multipath config that will work within some environment's constraints,
regardless of the hardware that is attached. For instance, in
virtualized setups, no_path_retry "queue" can be really annoying.  So
virtualization solutions that use multipath can distribute a
multipath.conf that overrides these settings on all types of possible
devices, without having to go through and modify every problematic
builtin device entry, and update their config as new entries are added.


More information about the dm-devel mailing list