[dm-devel] [PATCH v7 0/2] multipath-tools: intermittent IO error accounting to improve reliability

Guan Junxiong guanjunxiong at huawei.com
Mon Nov 6 13:04:32 UTC 2017


On 2017/11/6 20:15, Muneendra Kumar M wrote:
> Hi Guan,
> Any update on this patch ?
> Regards,
> Muneendra.
> 

It has not been merged yet; it is waiting for Christophe to merge it.
I hope Christophe can give some feedback soon.

BTW, your clients (and mine) can keep using this
patch until it is actually merged into mainline.

Please wait. I think Christophe will eventually pick up this patch.

Best wishes.
Guan


> -----Original Message-----
> From: Guan Junxiong [mailto:guanjunxiong at huawei.com] 
> Sent: Thursday, November 02, 2017 6:20 AM
> To: christophe.varoqui at opensvc.com
> Cc: dm-devel at redhat.com; Muneendra Kumar M <mmandala at Brocade.com>; mwilck at suse.com; shenhong09 at huawei.com; niuhaoxin at huawei.com
> Subject: Re: [PATCH v7 0/2] multipath-tools: intermittent IO error accounting to improve reliability
> 
> Dear Christophe,
> 
> Could you please consider applying this patch or give any feedback about it?
> We (Huawei and Brocade) are looking forward to your reply.
> Thanks.
> 
> Regards
> Guan Junxiong
> 
> 
> 
> On 2017/10/24 9:57, Guan Junxiong wrote:
>> Hi Christophe and All,
>>
>> This patch set adds a new method of path state checking based on 
>> accounting IO error. This is useful in many scenarios such as 
>> intermittent IO error on a path due to intermittent frame drops, 
>> intermittent corruptions, network congestion or a shaky link.
>>
>> This patch set is of significance because of this (quoted from the 
>> discussion with Muneendra, Brocade):
>>
>> There are typically two types of SAN network problems that are
>> categorized as marginal issues. These issues are by nature not
>> permanent; they come and go over time.
>> 1) Switches in the SAN can have intermittent frame drops or intermittent
>>    frame corruptions due to bad optics cable (SFP) or any such wear/tear port
>>    issues. This causes ITL flows that go through the faulty switch/port to
>>    intermittently experience frame drops.  
>> 2) There exist SAN topologies in which switch ports in the fabric
>>    become the only conduit for many different ITL (host--target--LUN)
>>    flows across multiple hosts. These single network paths are essentially
>>    shared across multiple ITL flows. Under these conditions, if the port link
>>    bandwidth cannot handle the sum of the bandwidth of the shared ITL flows
>>    going through the single path, we can see intermittent network
>>    congestion problems. This condition is called network oversubscription.
>>    The intermittent congestion can delay SCSI exchange completion
>>    (an increase in I/O latency is observed).
>>
>> To overcome the above network issues and many similar target issues,
>> frame-level retries are done in the HBA device firmware and I/O
>> retries in the SCSI layer. These retries might succeed for three reasons:
>> 1) The intermittent switch/port issue is not observed
>> 2) The retried I/O is a new SCSI exchange. This SCSI exchange can take an
>>    alternate SAN path for the ITL flow, if such a SAN path exists.
>> 3) Network congestion disappears momentarily because the net I/O bandwidth
>>    coming from multiple ITL flows on the single shared network path is
>>    something the path can handle
>>
>> However, in some cases we have seen I/O retries fail because the
>> retried I/Os hit a SAN network path that has an intermittent
>> switch/port issue and/or network congestion.
>>
>> On the host, we therefore see configurations in which two or more ITL
>> paths share the same target/LUN through two or more HBA ports. These
>> HBA ports are connected through two or more SANs to the same target/LUN.
>> If the I/O fails at the multipath layer, the ITL path is put into the
>> Failed state. Because of the marginal nature of the network, the next
>> health check command sent from the multipath layer might succeed,
>> which puts the ITL path back into the Active state. You end up seeing
>> the DM path state cycle through Active, Failed, Active transitions.
>> This reduces overall application I/O throughput and sometimes causes
>> application I/O failures (because of timing constraints). All of this
>> can happen because of I/O retries and I/O requests moving across
>> multiple paths of the DM device. Note that on the host, all I/O
>> retries on a single path and all I/O movement across multiple paths
>> slow down the forward progress of new application I/O, because these
>> re-queue actions are given higher priority than new I/O requests
>> coming from the application.
>>
>> This condition of the ITL path is therefore called "marginal".
>>
>> What we want is for DM to deterministically categorize an ITL path as
>> "marginal" and move all pending I/Os from the marginal path to an
>> Active path. This will help meet application I/O timing constraints.
>> We also want the ability to automatically reinstate the marginal path
>> as Active once the marginal condition in the network is fixed.
>>
>>
>> Here is the description of implementation:
>> 1) PATCH 1/2 implements the algorithm that sends a series of
>> continuous IOs to a path which suffers two failure events within a
>> given time. Those IOs are sent at a fixed rate of 10 Hz.
>> 2) PATCH 2/2 discards the original algorithm (the san_path_err_XXX
>> feature) because the sampling interval of the path checkers is so
>> coarse that it does not see what happens in the middle of the
>> interval. PATCH 1/2 is a better method.
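The pre-check in PATCH 1/2 (two failures within a given time, then test IOs at 10 Hz) can be sketched as below. This is a minimal illustration, not code from io_err_stat.c: the struct and both helper names are hypothetical, and only the "two failures within a window" rule and the 10 Hz rate come from the cover letter (V4 made the window configurable as marginal_path_double_failed_time).

```c
#include <stdbool.h>
#include <time.h>

/* Rate at which test IOs are sent to a suspect path (per the cover letter). */
#define IO_RATE_HZ 10

/* Hypothetical per-path failure history; not the real struct path. */
struct path_fail_hist {
	time_t last_fail;	/* time of the previous failure, 0 if none */
};

/*
 * Record a failure at 'now'. Returns true if this is the second failure
 * within 'double_failed_time' seconds, i.e. the path looks shaky and
 * should be handed over for IO error accounting.
 */
static bool record_failure(struct path_fail_hist *h, time_t now,
			   time_t double_failed_time)
{
	bool shaky = h->last_fail != 0 &&
		     now - h->last_fail < double_failed_time;

	h->last_fail = now;
	return shaky;
}

/* Number of test IOs sent during a sample window at IO_RATE_HZ. */
static unsigned int ios_per_sample(unsigned int sample_time_sec)
{
	return sample_time_sec * IO_RATE_HZ;
}
```

For example, with a 60-second window, failures at t=1000 and t=1030 flag the path as shaky, while a further failure at t=1200 does not, because only the previous failure is remembered. A hypothetical 120-second sample at 10 Hz means 1200 test IOs.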
>>
>>
>> Changes from V6:
>> * fix the warning of unwrapped commit description in patch 1/2
>> * add Reviewed-by tag of Muneendra
>> * add a detailed scenario description to the cover letter
>>
>> Changes from V5:
>> * rebase on the latest release 0.7.3
>>
>>
>> Changes from V4:
>> * path_io_err_XXX -> marginal_path_err_XXX. (Muneendra)
>> * add one more parameter named marginal_path_double_failed_time instead
>>   of the fixed 60 seconds for pre-checking a shaky path. (Martin)
>> * fix for "reschedule checking after %d seconds" log
>> * path_io_err_recovery_time -> marginal_path_err_recheck_gap_time.
>> * put the marginal path into PATH_SHAKY instead of PATH_DELAYED
>> * Modify the commit comments to sync with the changes above.
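V4 keeps a marginal path in PATH_SHAKY and waits marginal_path_err_recheck_gap_time before checking it again. The sketch below shows only that timing decision, under the assumption that the gap is measured from the moment the path was judged marginal; the enum, struct and helpers are hypothetical stand-ins, not multipathd's actual state handling.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical subset of path states; PATH_SHAKY mirrors the V4 change. */
enum path_state { PATH_UP, PATH_DOWN, PATH_SHAKY };

struct marginal_path {
	enum path_state state;
	time_t marked_time;	/* when the path was judged marginal */
};

/* Judge a path marginal: hold it out of service as PATH_SHAKY. */
static void mark_marginal(struct marginal_path *p, time_t now)
{
	p->state = PATH_SHAKY;
	p->marked_time = now;
}

/*
 * A shaky path becomes eligible for another round of IO error
 * accounting only after recheck_gap seconds have passed.
 */
static bool due_for_recheck(const struct marginal_path *p, time_t now,
			    time_t recheck_gap)
{
	return p->state == PATH_SHAKY && now - p->marked_time >= recheck_gap;
}
```

Holding the path out for a fixed gap is what prevents the Active, Failed, Active flapping described in the cover letter: a single successful health check is not enough to bring it back.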
>>
>>
>> Changes from V3:
>> * add a patch to discard the san_path_XXX feature
>> * fail the path in the kernel before enqueueing the path for checking
>>   rather than after knowing the checking result to make it more
>>   reliable. (Martin)
>> * use posix_memalign instead of manual alignment for the direct IO buffer
>>   (Martin)
>> * use PATH_MAX to avoid certain compiler warning when opening file
>>   rather than FILE_NAME_SIZE. (Martin)
>> * discard unnecessary sanity check when getting block size (Martin)
>> * do not return 0 in send_each_async_io if io_starttime of a path is
>>   not set (Martin)
>> * Wait 10ms instead of 60 seconds if every path is down. (Martin)
>> * rename handle_async_io_timeout to poll_async_io_timeout and use a polling
>>   method because io_getevents does not return 0 if there are timed-out IOs
>>   as well as normal IOs.
>> * rename hit_io_err_recover_time to hit_io_err_recheck_time
>> * modify the multipath.conf.5 and commit comments to keep sync with the
>>   above changes
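The V3 switch to polling for IO timeouts can be illustrated with a small sketch: instead of relying on io_getevents() to signal timeouts, each poll compares every in-flight IO's submission time against the current time. The inflight_io record and poll_timeouts() are hypothetical, not the real libaio bookkeeping in io_err_stat.c; the caller is assumed to fill 'now' with clock_gettime(CLOCK_MONOTONIC, ...), and cancelling or reaping the underlying libaio request is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record of one in-flight test IO. */
struct inflight_io {
	bool pending;		/* submitted, no completion seen yet */
	struct timespec start;	/* submission time (CLOCK_MONOTONIC) */
};

/* Milliseconds elapsed from a to b. */
static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (long)(b->tv_sec - a->tv_sec) * 1000 +
	       (b->tv_nsec - a->tv_nsec) / 1000000;
}

/*
 * Count IOs that have been pending longer than timeout_ms as errors.
 * Each timed-out IO is marked not pending so a later poll does not
 * count it twice.
 */
static unsigned int poll_timeouts(struct inflight_io *ios, size_t n,
				  const struct timespec *now, long timeout_ms)
{
	unsigned int errs = 0;

	for (size_t i = 0; i < n; i++) {
		if (ios[i].pending &&
		    elapsed_ms(&ios[i].start, now) > timeout_ms) {
			ios[i].pending = false;
			errs++;
		}
	}
	return errs;
}
```

Polling keeps timeout accounting independent of how many normal completions happen to arrive in the same io_getevents() call.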
>>
>>
>> Changes from V2:
>> * fix unconditional rescheduling forever
>> * use scripts/checkpatch.pl from Linux to clean up informal coding style
>> * fix "continous" and "internel" typos
>>
>>
>> Changes from V1:
>> * send continuous IO instead of a single IO in a sample interval (Martin)
>> * when the recovery time expires, reschedule the checking process (Hannes)
>> * use the error rate threshold as a permillage instead of an IO count (Martin)
>> * Use a common io_context for libaio for all paths (Martin)
>> * Other small fixes (Martin)
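The V1 change to a permillage threshold can be sketched as follows. The helper names and the strict "greater than" comparison are assumptions for illustration; the cover letter does not say exactly how the real code compares the rate with the threshold.

```c
#include <stdbool.h>

/*
 * Error rate in permille (parts per thousand) of the IOs sent during
 * one sample window. Returns 0 if no IO was sent.
 */
static unsigned int err_rate_permille(unsigned int io_err,
				      unsigned int io_total)
{
	if (io_total == 0)
		return 0;
	/* widen before multiplying so large counters cannot overflow */
	return (unsigned int)((unsigned long long)io_err * 1000 / io_total);
}

/* Hypothetical rule: the path is marginal if its rate exceeds the threshold. */
static bool is_marginal(unsigned int io_err, unsigned int io_total,
			unsigned int threshold_permille)
{
	return err_rate_permille(io_err, io_total) > threshold_permille;
}
```

A permillage keeps the threshold meaningful when the sample time, and therefore the number of IOs sent, changes: 3 errors out of 1200 IOs is 2 permille (integer division), whatever the window length.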
>>
>>
>> Junxiong Guan (2):
>>   multipath-tools: intermittent IO error accounting to improve
>>     reliability
>>   multipath-tools: discard san_path_err_XXX feature
>>
>>  libmultipath/Makefile      |   5 +-
>>  libmultipath/config.c      |   3 -
>>  libmultipath/config.h      |  21 +-
>>  libmultipath/configure.c   |   7 +-
>>  libmultipath/dict.c        |  88 +++---
>>  libmultipath/io_err_stat.c | 744 +++++++++++++++++++++++++++++++++++++++++++++
>>  libmultipath/io_err_stat.h |  15 +
>>  libmultipath/propsel.c     |  70 +++--
>>  libmultipath/propsel.h     |   7 +-
>>  libmultipath/structs.h     |  15 +-
>>  libmultipath/uevent.c      |  32 ++
>>  libmultipath/uevent.h      |   2 +
>>  multipath/multipath.conf.5 |  89 ++++--
>>  multipathd/main.c          | 140 ++++-----
>>  14 files changed, 1043 insertions(+), 195 deletions(-)
>>  create mode 100644 libmultipath/io_err_stat.c
>>  create mode 100644 libmultipath/io_err_stat.h
>>
> 



