[dm-devel] [PATCH v7 0/2] multipath-tools: intermittent IO error accounting to improve reliability

Wed Nov 15 09:00:26 UTC 2017

Dear Christophe,

Is there any progress on this patch set? It is significant for the reasons
described in the previous cover letter quoted below. Please consider it.

We are looking forward to your reply.

Thanks very much.

Guan Junxiong (Huawei)
Muneendra (Brocade)

On 2017/10/24 9:57, Guan Junxiong wrote:
> Hi Christophe and All,
> 
> This patch set adds a new method of path state checking based on IO error
> accounting. This is useful in many scenarios, such as intermittent IO errors
> on a path due to intermittent frame drops, intermittent corruption, network
> congestion, or a shaky link.
> 
> This patch set is significant for the following reasons (quoted from the
> discussion with Muneendra, Brocade):
> 
> There are typically two types of SAN network problems that are categorized as
> marginal issues. These issues are by nature not permanent; they come and go
> over time.
> 1) Switches in the SAN can have intermittent frame drops or intermittent
>    frame corruptions due to a bad optics cable (SFP) or similar wear-and-tear
>    port issues. This causes ITL flows that go through the faulty switch/port
>    to intermittently experience frame drops.
> 2) There exist SAN topologies in which a switch port in the fabric becomes
>    the only conduit for many different ITL (host--target--LUN) flows across
>    multiple hosts. Such a single network path is essentially shared across
>    multiple ITL flows. Under these conditions, if the port link bandwidth
>    cannot handle the sum of the bandwidth of the shared ITL flows going
>    through the single path, we can see intermittent network congestion
>    problems. This condition is called network oversubscription. The
>    intermittent congestion can delay SCSI exchange completion (an increase
>    in I/O latency is observed).
> 
> To overcome the above network issues and many more such target issues, there
> are frame-level retries done in the HBA device firmware and I/O retries
> in the SCSI layer. These retries might succeed for any of three reasons:
> 1) The intermittent switch/port issue is not observed on the retry.
> 2) The retry I/O is a new SCSI exchange. This SCSI exchange can take an
>    alternate SAN path for the ITL flow, if such a SAN path exists.
> 3) Network congestion disappears momentarily because the net I/O bandwidth
>    coming from multiple ITL flows on the single shared network path is
>    something the path can handle.
> 
> However, in some cases we have seen that I/O retries do not succeed because
> the retry I/Os hit a SAN network path that has an intermittent switch/port
> issue and/or network congestion.
> 
> On the host we thus see configurations with two or more ITL paths sharing the
> same target/LUN and going through two or more HBA ports. These HBA ports are
> connected through two or more SANs to the same target/LUN.
> If an I/O fails at the multipath layer, the ITL path is put into the Failed
> state. Because of the marginal nature of the network, the next health check
> command sent from the multipath layer might succeed, which puts the ITL path
> back into the Active state. You end up seeing the DM path state cycle through
> Active, Failed, Active transitions. This reduces overall application I/O
> throughput and sometimes causes application I/O failures (because of timing
> constraints). All of this can happen because of I/O retries and I/O requests
> moving across multiple paths of the DM device. Note that on the host, all I/O
> retries on a single path and I/O movement across multiple paths slow down the
> forward progress of new application I/O. The reason is that these I/O
> re-queue actions are given higher priority than newer I/O requests coming
> from the application.
> 
> The above condition of the ITL path is hence called "marginal".
> 
> What we desire is for DM to deterministically categorize an ITL path as
> "marginal" and move all pending I/Os from the marginal path to an Active
> path. This will help meet application I/O timing constraints. We also want
> the capability to automatically reinstate the marginal path to Active once
> the marginal condition in the network is fixed.
> 
> 
> Here is a description of the implementation:
> 1) PATCH 1/2 implements an algorithm that sends a series of continuous IOs
> to a path that suffers two failure events in less than a given time. Those
> IOs are sent at a fixed rate of 10 Hz.
> 2) PATCH 2/2 discards the original algorithm for this reason:
> the sample interval of the path checkers is so coarse that it cannot see
> what happens in the middle of the sample interval. PATCH 1/2 provides
> a better method.
> 
> 
> Changes from V6:
> * fix the warning of unwrapped commit description in patch 1/2 
> * add Reviewed-by tag of Muneendra
> * add detailed scenario description in the cover letter
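
For readers skimming the thread, the trigger in item 1) of the implementation
description can be sketched roughly as below. This is only an illustration in
Python, not the patch code (multipath-tools is written in C). The only parts
taken from the cover letter are the "two failures in less than a given time"
trigger and the fixed 10 Hz probe rate. The names failure_window_s,
sample_time_s and error_rate_threshold, and the rule that a path is treated
as marginal when the fraction of failed probes exceeds a threshold, are
assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PROBE_HZ = 10  # fixed probe rate stated in the cover letter


@dataclass
class PathErrorAccounting:
    # All three parameters are hypothetical names chosen for this sketch.
    failure_window_s: float      # the "given time" between two failure events
    sample_time_s: float         # how long the path is probed
    error_rate_threshold: float  # failed/total ratio above which path is marginal
    last_failure_at: Optional[float] = None

    def on_path_failed(self, now: float) -> bool:
        """Record a failure; return True if it is the second within the window."""
        previous = self.last_failure_at
        self.last_failure_at = now
        return previous is not None and (now - previous) < self.failure_window_s

    def probe_count(self) -> int:
        """Number of probe IOs sent during one sampling run at 10 Hz."""
        return int(self.sample_time_s * PROBE_HZ)

    def is_marginal(self, probe_io: Callable[[], bool]) -> bool:
        """Issue the probes and compare the observed error rate to the threshold.

        probe_io is a caller-supplied function returning True on success.
        A real checker would pace the probes 100 ms apart; pacing is left
        out here so the sketch stays deterministic.
        """
        total = self.probe_count()
        failed = sum(1 for _ in range(total) if not probe_io())
        return total > 0 and failed / total > self.error_rate_threshold
```

With a 60 s window, a failure at t=0 followed by one at t=30 would start a
sampling run, while a further failure at t=200 would not. Keeping the probe
function injectable is a choice made for the sketch so that the decision
logic can be exercised without real IO.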