[dm-devel] Dealing with constantly failing paths

Benjamin Marzinski bmarzins at redhat.com
Thu Sep 13 17:45:25 UTC 2018


On Thu, Sep 13, 2018 at 12:42:54PM +0300, Özkan Göksu wrote:
>    Hello. 
>    I'm sorry to e-mail you here, but I could not find an answer elsewhere.
>    When a disk starts to die slowly, multipath starts failing and reinstating
>    its paths, and this goes on forever. (I'm using an LSI-3008 HBA card with
>    a SAS JBOD, not an FC network.)
>    The kernel does not set the faulted disk offline, and this is causing
>    terrible problems for me.
>    I'm using: multipath-tools 0.7.4-1
>    Linux DEV2 4.14.67-1-lts #1 
>    Dmesg;
>        Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: attempting task abort!
>    scmd(ffff88110e632948)
>        Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: [sdft] tag#3 CDB:
>    opcode=0x0 00 00 00 00 00 00
>        Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: handle(0x0037),
>    sas_address(0x5000c50093d4e7c6), phy(38)
>        Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190:
>    enclosure_logical_id(0x500304800929ec7f), slot(37)
>        Sep 13 11:20:17 DEV2 kernel: scsi target0:0:190: enclosure
>    level(0x0001),connector name(1   )
>        Sep 13 11:20:17 DEV2 kernel: sd 0:0:190:0: task abort: SUCCESS
>    scmd(ffff88110e632948)
>        Sep 13 11:20:18 DEV2 kernel: device-mapper: multipath: Failing path
>    130:240.
>        Sep 13 11:25:34 DEV2 kernel: device-mapper: multipath: Reinstating
>    path 130:240.
>    Full dmesg example: [1]https://paste.ubuntu.com/p/H9NMWxNfgD/
>     
>    As you can see, the kernel aborted the task and after that multipath
>    failed the path. So I want to get rid of this problem by telling multipath
>    "do not reinstate the path".
>    This would keep the zombie disk dead.
>    If I don't kick the disk out, it causes an HBA reset, I lose all the disks
>    in my pool, and the ZFS pool is suspended.
>    I'm not saying this problem is related to multipathd; I just think this
>    will save me.
>    So how can I tell multipath not to reinstate a path that has failed X times?
>    Thank you.

In recent releases there are two separate methods to do this. Both of
them involve setting multiple multipath.conf parameters. The older
method is to set "delay_wait_checks" and "delay_watch_checks". The newer
one is to set "marginal_path_double_failed_time",
"marginal_path_err_sample_time", "marginal_path_err_rate_threshold", and
"marginal_path_err_recheck_gap_time". You can look in the multipath.conf
man page to see if both sets of options are available to you, and how
they work. If the version of the multipath tools you are using has both
sets of options, the "marginal_path_*" options should do a better job at
finding these marginal paths.
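
As a rough sketch, the two alternatives might look like this in the
defaults section of multipath.conf. The numeric values are illustrative
assumptions, not tested recommendations, and the comments reflect a
reading of the man page rather than anything stated in this thread, so
check the man page for your version for the exact semantics and valid
ranges.

```
defaults {
	# Older method (values are assumptions).
	# Watch a path for this many checks after it becomes valid again.
	delay_watch_checks 12
	# If it failed again while being watched, then the next time it
	# becomes valid, hold it back until it has stayed up this many checks.
	delay_wait_checks 12

	# Newer method (values are assumptions). All four options need to
	# be set for it to take effect. Uncomment these if your version
	# supports them.
	# Two failures within this many seconds mark the path as suspect.
	# marginal_path_double_failed_time 60
	# Seconds of test I/O used to measure the error rate
	# (the man page asks for a value of at least 120).
	# marginal_path_err_sample_time 120
	# Error rate limit, expressed per thousand I/Os.
	# marginal_path_err_rate_threshold 10
	# Seconds to wait before re-testing a path that exceeded the limit.
	# marginal_path_err_recheck_gap_time 300
}
```

multipathd has to reload its configuration before changes like these
take effect.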

-Ben

> 
> References
> 
>    Visible links
>    1. https://paste.ubuntu.com/p/H9NMWxNfgD/




