[dm-devel] [PATCH 1/2] libmultipath: hwhandler auto-detection for ALUA

Wed Apr 4 08:04:36 UTC 2018

On Tue, 2018-04-03 at 16:29 -0500, Benjamin Marzinski wrote:
> On Tue, Apr 03, 2018 at 10:53:29PM +0200, Martin Wilck wrote:
> > On Tue, 2018-04-03 at 15:31 -0500, Benjamin Marzinski wrote:
> > > On Tue, Mar 27, 2018 at 11:50:52PM +0200, Martin Wilck wrote:
> > > > If the hardware handler isn't explicitly set, infer ALUA support
> > > > from the pp->tpgs attribute. Likewise, if ALUA is selected, but
> > > > not supported by the hardware, fall back to no hardware handler.
> > > 
> > > Weren't you worried before about temporary ALUA failures? If you had a
> > > temporary failure while configuring a device that you explicitly set to
> > > be ALUA, then this would cause the device to be misconfigured?
> > 
> > I believe that if TPGS is 0, the device will never be able to support
> > ALUA. The kernel also looks at the TPGS bits and won't try ALUA if they
> > are unset. Once the device is configured and actual ALUA RTPG/STPG
> > calls are performed, they may fail for a variety of temporary reasons -
> > I wanted to avoid resetting the prio algorithm to "const" for such
> > cases. That's my understanding, correct me if I'm wrong.
> 
> Devices that were not correctly supporting ALUA returned > 0 for
> get_target_port_group_support, so detect_alua actually does all the work
> necessary to verify that it can get a priority. Without doing this,
> multiple devices that didn't support ALUA were being detected as
> supporting ALUA.

So, detect_alua() tests TPGS *and* tries an actual ALUA call, and sets
pp->tpgs to anything other than TPGS_NONE only if the latter is
successful. That's fine. My patch was looking at pp->tpgs, so it was
implicitly using this logic of detect_alua(). But does that guarantee
that future alua->getprio() calls will never fail at some later point
in time?

Maybe I misunderstood your original proposition. What I'm saying is
that resetting the prio algorithm from "alua" to "const" because of an
error code in get_prio() is wrong, because that error code may be
transient.
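
To illustrate what I mean, here is a minimal sketch, not code from the
patch. The struct and function names are hypothetical, and reusing the
last known priority on error is only an assumption about one possible
policy. The point is that a failed call leaves the configured prio
algorithm untouched:

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-path prioritizer state;
 * this is not the libmultipath struct. */
struct path_prio {
	const char *prio_name;	/* configured algorithm, e.g. "alua" */
	int priority;		/* last priority obtained successfully */
};

/*
 * Record the result of one priority lookup. A negative result is
 * treated as a (possibly transient) error: the last known priority
 * is kept, and prio_name is never reset to "const".
 */
int update_priority(struct path_prio *pp, int getprio_result)
{
	if (getprio_result < 0)
		return pp->priority;	/* keep algorithm and old value */

	pp->priority = getprio_result;
	return pp->priority;
}
```

With this, a path that reported priority 50 and then hits one failing
call still reports 50 and still uses "alua" on the next check.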

If we give "hardware_handler" config options preference over ALUA
autodetection, and thus enforce hwhandler "1 alua" on such devices that
have no ALUA support, domap() is guaranteed to fail, because the kernel
refuses to set up a map with a given hwhandler if any device doesn't
support that handler.
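
The selection logic I have in mind can be sketched roughly as follows.
This is a simplified illustration rather than the patch itself: the enum,
the macros and choose_hwhandler() are hypothetical stand-ins, and the
tpgs values are assumed to have been validated by detect_alua() already.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the libmultipath definitions. */
enum tpgs_state {
	TPGS_NONE = 0,
	TPGS_IMPLICIT,
	TPGS_EXPLICIT,
	TPGS_BOTH
};

#define HWHANDLER_NONE "0"
#define HWHANDLER_ALUA "1 alua"

/*
 * configured: hardware_handler from the config, or NULL if unset.
 * tpgs:       per-path TPGS state, one entry per path in the map.
 */
const char *choose_hwhandler(const char *configured,
			     const enum tpgs_state *tpgs, size_t npaths)
{
	int all_alua = npaths > 0;
	size_t i;

	/* A single path without ALUA support rules out the handler. */
	for (i = 0; i < npaths; i++)
		if (tpgs[i] == TPGS_NONE)
			all_alua = 0;

	/* Nothing configured: infer the handler from TPGS. */
	if (configured == NULL)
		return all_alua ? HWHANDLER_ALUA : HWHANDLER_NONE;

	/* ALUA configured but unsupported: fall back to no handler. */
	if (strcmp(configured, HWHANDLER_ALUA) == 0 && !all_alua)
		return HWHANDLER_NONE;

	/* Any other explicit setting is left alone. */
	return configured;
}
```

In this sketch, an explicit "1 alua" on a map with one non-ALUA path
yields "0" instead of a map that the kernel would refuse to set up.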

> By using retain_attached_hwhandler at all, we are implicitly requiring
> the scsi_dh_alua module to be loaded before devices with indeterminate
> configurations are discovered for them to work correctly, right? For
> instance, commit 715c48d93dd00930534ce6a55d0e3705466df5d6 did this for
> netapp devices, and that was in 2013. I don't see how this is
> different.

You're right, we are "implicitly requiring" this sort-of, but we have
no code that enforces the early loading of the device handlers. We
should be shipping a modules-load.d file, or a modprobe.d softdep, or
something similar that would enforce this setting if we _really_ depend
on it. "Implicit requirements" are bad. We should either make the
requirement explicit, or not hard-depend on it. So far I was thinking
the latter. After all, SCSI device-handler support is configurable in
the kernel.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)