[dm-devel] Re: mpath: don't fail paths on first error
Hannes Reinecke
hare at suse.de
Fri Jun 6 14:18:05 UTC 2008
Hi Mike,
Mike Christie wrote:
> The problem we see a lot at Red Hat is that if drivers fail a command
> with DID_BUS_BUSY or DID_ERROR for something like underrun or even for
> transient path problems, we can normally recover from this pretty
> quickly and we do not need to switch path groups.
>
Yeah, I thought about this, too.
> queue_if_no_path/no_path_retry will prevent IO from being failed upwards,
> but just switching paths can cause a lot of strain on the target, so we
> might want to prevent path switching when we do not need to. If we are
> using a box that requires manual failover or a box that does not use
> manual failover but still has to shift resources between storage
> controllers when switching paths, we most likely do not want to mark
> paths failed for these transient errors.
>
Well, the original design idea was that it would always be quicker or
less error-prone to just move the I/O to the next path.
Seeing that this is not always the case, this approach is probably
better.
> The attached patch allows us to wait X seconds before marking a path as
> failed. If, within X seconds of seeing the first IO error, we do not
> see an IO complete successfully, then we mark the path as failed. This
> patch works best with the fail fast enhancements, where for a lot of
> path problems the fast io fail / recovery timeout will fail IO quickly
> to us so the test IOs do not get stuck, and where some errors like
> DID_ERROR are not failed fast at all.
>
> The patch should apply over Linus's tree or scsi-misc.
>
Thanks for this, Mike.
Signed-off-by: Hannes Reinecke <hare at suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare at suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)