[dm-devel] multipathd ignoring dev_loss_tmo setting

Martin Wilck mwilck at suse.de
Mon Mar 4 12:09:45 UTC 2019


On Thu, 2019-02-28 at 11:38 +0000, Martins, Bruno O wrote:
> Hello guys,
> 
> I am trying to modify /etc/multipath.conf on my system so that the
> parameter 'dev_loss_tmo' is changed from the default value.
> 
> My multipath.conf file contains the following:
> 
> defaults {
>         verbosity 2
>         polling_interval 5
>         max_polling_interval 10
>         multipath_dir "/lib64/multipath"
>         path_selector "round-robin 0"
>         path_grouping_policy "failover"
>         uid_attribute "ID_SERIAL"
>         prio "const"
>         prio_args ""
>         features "0"
>         path_checker "directio"
>         alias_prefix "mpath"
>         failback "manual"
>         rr_min_io 1000
>         rr_min_io_rq 1
>         max_fds "max"
>         rr_weight "uniform"
>         no_path_retry "fail"
>         queue_without_daemon "no"
>         checker_timeout 15
>         flush_on_last_del "no"
>         user_friendly_names "yes"
>         fast_io_fail_tmo 5
>         dev_loss_tmo 10
>         bindings_file "/etc/multipath/bindings"
>         wwids_file /etc/multipath/wwids
>         log_checker_err always
>         retain_attached_hw_handler no
>         detect_prio no
> }
> 
> However, when checking the value currently in use I am getting the
> wrong value (which is '30') for some of the remote ports:
> 
> for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do
> d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f);
> done
> 
> rport-3:0-0:0x5742b0f00007c500:10
> rport-3:0-1:0x5742b0f00007c500:10
> rport-3:0-2:0x5742b0f00007c500:10
> rport-3:0-3:0x5000097408369800:30
> rport-3:0-4:0x500009757804cbff:30
> rport-4:0-0:0x5742b0f00007c500:10
> rport-4:0-1:0x5742b0f00007c500:10
> rport-4:0-2:0x5000097408369800:30
> rport-4:0-3:0x5742b0f00007c500:10
> rport-4:0-4:0x500009757804cbff:30
> rport-5:0-0:0x5742b0f00007c500:10
> rport-5:0-1:0x5742b0f00007c500:10
> rport-5:0-2:0x5742b0f00007c500:10
> rport-5:0-3:0x5000097408369800:30
> rport-5:0-4:0x500009757804cbff:30
> rport-6:0-0:0x5742b0f00007c500:10
> rport-6:0-1:0x5742b0f00007c500:10
> rport-6:0-2:0x5000097408369800:30
> rport-6:0-3:0x5742b0f00007c500:10
> rport-6:0-4:0x500009757804cbff:30
> 
> systool is giving me the same information:
> 
> systool -c fc_remote_ports -v | grep dev_loss_tmo
> 
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "10"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "30"
>     dev_loss_tmo        = "30"
> 
> Where is this value coming from? Could this be a bug? I couldn't find
> anything useful on the Internet regarding this.
> 
> I am using the following versions:
> 
> rpm -qa multipath-tools
> multipath-tools-0.4.9-109.1
> 
> uname -a
> Linux mysystem 3.0.101-63-default #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c) x86_64 x86_64 x86_64 GNU/Linux
> 
> Thanks for your help!
> 
> Kind regards,
> 
> Bruno

It'd be very helpful if you could upload "multipath -v3" (or multipathd
with verbosity 3) logs somewhere.
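
To narrow down which ports to look for in those logs, the sysfs loop from
the original report can be turned into a small filter. The filter prints
only the rports whose dev_loss_tmo differs from the configured value. This
is a minimal sketch. The function name check_dev_loss_tmo is hypothetical,
as is its optional second argument, a directory to scan instead of
/sys/class/fc_remote_ports (handy for testing). Neither is part of
multipath-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of multipath-tools): print every FC remote
# port whose dev_loss_tmo differs from the expected value, in the same
# "rport:node_name:dev_loss_tmo" format as the loop in the report.
#   $1 = expected dev_loss_tmo (e.g. the value from multipath.conf)
#   $2 = optional directory to scan; defaults to the real sysfs class dir
check_dev_loss_tmo() {
    expected=$1
    root=${2:-/sys/class/fc_remote_ports}
    for f in "$root"/rport-*/dev_loss_tmo; do
        [ -e "$f" ] || continue          # glob matched nothing
        d=$(dirname "$f")
        actual=$(cat "$f")
        if [ "$actual" != "$expected" ]; then
            echo "$(basename "$d"):$(cat "$d/node_name"):$actual"
        fi
    done
}
```

Called as "check_dev_loss_tmo 10" on the system above, it should list only
the rports reporting 30. Those are the ones with node names
0x5000097408369800 and 0x500009757804cbff.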

It looks as if you're using some SLE11 variant, so maybe you want to
open a support case?

Another question would be why you want such a low dev_loss_tmo. It's
not generally recommended, because on the kernel side, removing and
re-adding a device is a lot more complex than disabling and re-enabling
it. The fast_io_fail_tmo setting should already provide quick path
failover. My recommendation is to set dev_loss_tmo to a value which,
in the given data center, would indicate that the device loss is really
not due to a temporary outage but due to a permanently removed device
(e.g. a permanent storage configuration change). So basically,
dev_loss_tmo shouldn't be shorter than the admin's lunch break.
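
Applied to the configuration quoted above, that advice might look like the
following defaults fragment. It is a sketch only. fast_io_fail_tmo 5 is
kept from the original configuration. dev_loss_tmo 3600 (one hour) is an
illustrative value chosen to match the "lunch break" rule, not a tested
recommendation. The right number depends on what counts as a permanent
loss in your data center.

```
defaults {
        # quick failover: fail I/O on a lost port after 5 seconds
        fast_io_fail_tmo 5
        # illustrative only: remove the device after one hour of loss
        dev_loss_tmo 3600
}
```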

Martin
