[dm-devel] Re: mpath: don't fail paths on first error

Hannes Reinecke hare at suse.de
Fri Jun 6 14:18:05 UTC 2008


Hi Mike,

Mike Christie wrote:
> The problem we see a lot at Red Hat is that if drivers fail a command 
> with DID_BUS_BUSY or DID_ERROR for something like underrun or even for 
> transient path problems, we can normally recover from this pretty 
> quickly and we do not need to switch path groups.
> 
Yeah, I thought about this, too.
> queue_if_no_path/no_path_retry will prevent IO from being failed upwards, 
> but just switching paths can cause a lot of strain on the target, so we 
> might want to prevent path switching when we do not need to. If we are 
> using a box that requires manual failover or a box that does not use 
> manual failover but still has to shift resources between storage 
> controllers when switching paths, we most likely do not want to mark 
> paths failed for these transient errors.
> 
Well, the original design idea was that it will always be quicker or
less error-prone to just move the I/O to the next path.
Seeing that this is not always the case, this approach is probably
better.

> The attached patch allows us to wait X seconds before marking a path as 
> failed. If, within X seconds of seeing the first IO error, we do not 
> see an IO complete successfully, then we mark the path as failed. This patch 
> works best with the fail fast enhancements, where for a lot of path 
> problems the fast io fail / recovery timeout will fail IO quickly to us 
> and the test IOs do not get stuck, and where some errors like DID_ERROR 
> are not even failed fast.
> 
> The patch should apply over Linus's tree or scsi-misc.
> 
Thanks for this, Mike.
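
For readers following the thread, the grace-period rule described above
could be sketched roughly as follows. This is only an illustration in
Python, not the code from the attached patch: the PathState class, its
fields and its method names are hypothetical, and timestamps are passed
in explicitly so the behaviour is easy to trace. A kernel implementation
would more likely arm a timer on the first error instead of checking on
each completion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathState:
    """Hypothetical per-path bookkeeping for the 'wait X seconds' rule."""

    grace_seconds: float                    # the "X seconds" from the description
    first_error_at: Optional[float] = None  # time of the first IO error in the window
    failed: bool = False

    def io_succeeded(self) -> None:
        # A successful completion closes the grace window, so the next
        # error starts a fresh one. A path already marked failed stays failed.
        if not self.failed:
            self.first_error_at = None

    def io_failed(self, now: float) -> bool:
        # Returns True once the path should be treated as failed.
        if self.failed:
            return True
        if self.first_error_at is None:
            # First error: open the grace window, do not fail the path yet.
            self.first_error_at = now
        elif now - self.first_error_at >= self.grace_seconds:
            # No success was seen within X seconds of the first error.
            self.failed = True
        return self.failed
```

With a 5 second window, errors at t=0 and t=3 followed by a success leave
the path usable; a later run of errors at t=10 and t=15 with no success
in between marks it failed.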

Signed-off-by: Hannes Reinecke <hare at suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare at suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)



