[dm-devel] [PATCH] libmultipath: unset detect_checker for clariion / Unity arrays

Martin Wilck martin.wilck at suse.com
Wed Jun 8 16:47:37 UTC 2022


On Wed, 2022-06-08 at 09:40 -0500, Benjamin Marzinski wrote:
> On Wed, Jun 08, 2022 at 07:56:27AM +0000, Martin Wilck wrote:
> > On Tue, 2022-06-07 at 17:45 -0500, Benjamin Marzinski wrote:
> > > Dell EMC would like to always use the emc_clariion checker.
> > > Currently
> > > detect_checker will switch the checker to TUR for Unity arrays.
> > > This can cause problems on some setups with replicated Unity
> > > LUNs,
> > > which are handled correctly the the emc_checker, but not the TUR
> > > checker.
> > > 
> > > Cc: vincent.chen1 at dell.com
> > > Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> > 
> > This points us to a flaw in our logic.
> > 
> > It was wrong in the first place to have autodetection take
> > precedence,
> > even over "overrides". The effect for users is that whenever
> > "path_checker" is set, "detect_checker no" must also be set, which
> > is highly surprising and adds no benefit.
> > 
> > We should assume that if a device has an explicit checker
> > configured
> > either in the device configuration, overrides, or the hwtable, this
> > checker must be used, and fall back to autodetection only if this
> > is
> > not the case.
> > 
> > So while this patch is alright, I'd prefer a patch that fixes the
> > logic.
> 
> I'm not sure that this is a good idea. IIRC, the current method was
> an
> intentional choice.  The idea was that if a checker was autodetected,
> we
> would use that. If not, we would fall back to the user defined
> checker.
> For the most part this is useful for arrays that could be run in ALUA
> or
> non-alua mode.  Changing the priority of detect_checker will suddenly
> change how these arrays are configured.  Users that configured a
> failback checker for cases where their arrays weren't in ALUA mode
> would
> suddenly find that their arrays were always using the fallback
> method. 

> Now I'm not saying that the original logic was the best option. But I
> am
> saying that it hasn't caused many issues over the years that it's
> been
> in existance. There are a number of arrays in the builtin hardware
> table
> that define checkers. I assume that they either don't support ALUA,
> or
> they are happy with only using their configured checker when
> their<Thinking about it, 
> arrays aren't in ALUA mode.

Most of the built-in hwtable entries set the RDAC checker. For these,
the result will be unchanged if we lower the precedence of
detect_checker. There are two entries setting DIRECTIO for non-SCSI
devices (DASD and Huawei NVMe): no regression risk there. Then there's
Clariion CX/AX/VNX, for which the change in behavior is intended.
Finally, there are two entries for ancient HPE devices using HP_SW. If
we change the precedence, these devices might switch from TUR to HP_SW
(if they support ALUA, dunno).

>  I would rather fix the remaining cases where
> the existing priority gives the wrong answer than suddenly change how
> things work, in a way that could suddenly break things for an unknown
> (but quite possibly large) number of existing users.

According to the above, only people who'd use very old HPE storage
arrays with latest upstream multipath-tools would be affected.

Remain those users that you mentioned — people who operate in ALUA mode
but keep some explicit multipath.conf settings around as fallback "when
their arrays aren't in ALUA mode" (for whatever reason). These users
would now observe that their arrays operate in fallback mode, even if
ALUA was enabled.

I'm not sure if this matters much. OTOH, it's rather common for people
to forget to set "detect_checker no" and wonder why their
multipath.conf settings don't take effect.

> 
> > (The same could be said for detect_prio, but I don't want to make
> > outrageous demands).
> 
> The above arguments apply here, only moreso.

Besides the devices already mentioned for detect_checker, there's the
Hitachi Vantara AMS setting PRIO_HDS, and ONTAP using PRIO_ONTAP. The
big difference is that in detect_prio(), RDAC arrays are configured to
use ALUA rather than RDAC. So if we switch the precedence, all those
arrays would switch from ALUA to RDAC. I agree this would be surprising
and undesirable. Are these arrays still configurable to not support
ALUA? If no, we could just remove the RDAC entries. Same for ONTAP.

Anyway, _if_ we change the precedence rules, we should do it for both
detect_prio and detect_checker, otherwise our behavior might appear
more inconsistent than it's now.

For now, I'll ack your patch and let this discussion sink in a bit.
Comments from users welcome!

Regards
Martin



More information about the dm-devel mailing list