[dm-devel] [Lsf] Notes from the four separate IO track sessions at LSF/MM

Bart Van Assche bart.vanassche at sandisk.com
Thu Apr 28 16:41:26 UTC 2016


On 04/28/2016 09:23 AM, Laurence Oberman wrote:
> We still suffer from periodic complaints in our large customer base
 > regarding the long recovery times for dm-multipath.
> Most of the time this is when we have something like a switch
 > back-plane issue or an issue where RSCN'S are blocked coming back up
 > the fabric. Corner cases still bite us often.
>
> Most of the complaints originate from customers for example seeing
 > Oracle cluster evictions where during the waiting on the mid-layer
 > all mpath I/O is blocked until recovery.
>
> We have to tune eh_deadline, eh_timeout and fast_io_fail_tmo but
 > even tuning those we have to wait on serial recovery even if we
 > set the timeouts low.
>
> Lately we have been living with
> eh_deadline=10
> eh_timeout=5
> fast_fail_io_tmo=10
> leaving default sd timeout at 30s
>
> So this continues to be an issue and I have specific examples using
 > the jammer I can provide showing the serial recovery times here.

Hello Laurence,

The long recovery times you refer to, is that for a scenario where all 
paths failed or for a scenario where some paths failed and other paths 
are still working? In the latter case, how long does it take before 
dm-multipath fails over to another path?

Thanks,

Bart.




More information about the dm-devel mailing list