[dm-devel] blk_abort_queue on failed paths?

Mike Christie michaelc at cs.wisc.edu
Wed Jun 3 21:39:09 UTC 2009


adding linux-scsi and Mike Anderson

David Strand wrote:
> After updating to kernel 2.6.28 I found that when I performed some
> cable break testing during device i/o, I would get unwanted device or
> host resets. Ultimately I traced it back to this patch:
> 
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.29.y.git;a=commit;h=224cb3e981f1b2f9f93dbd49eaef505d17d894c2
> 
> The call to blk_abort_queue causes the block layer to call
> scsi_times_out for pending i/o, which can (or will) ultimately lead to
> device, and/or bus and/or host resets, which of course cause all the
> other devices significant disruption.
> 

What driver were you using? I just did a workaround for qla4xxx for
this (have not posted it yet). I added a scsi_times_out handler to the
driver so that if the IO was failed due to a transport problem then the
scsi eh does not run.
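
Roughly, the handler has the shape below. This is only a userspace
model, not the unposted patch: the enum mirrors the names of the block
layer's timeout return values, while struct model_cmd, its
transport_failed flag and model_eh_timed_out() are made up for
illustration. It also assumes the handler rearms the timer rather than
completing the command itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the names of the block layer's timeout return values. */
enum blk_eh_timer_return {
	BLK_EH_NOT_HANDLED,	/* hand the command to the scsi eh */
	BLK_EH_HANDLED,		/* the handler completed the command */
	BLK_EH_RESET_TIMER,	/* rearm the timer, do not run the eh */
};

/* Hypothetical stand-in for struct scsi_cmnd plus driver state. */
struct model_cmd {
	bool transport_failed;	/* driver failed this IO for a transport problem */
};

/* Model of a driver timeout handler that keeps the eh out of
 * transport failures and leaves every other timeout alone. */
enum blk_eh_timer_return model_eh_timed_out(const struct model_cmd *cmd)
{
	assert(cmd != NULL);

	if (cmd->transport_failed)
		return BLK_EH_RESET_TIMER;

	return BLK_EH_NOT_HANDLED;
}
```

With this, a timeout that blk_abort_queue forces on an already failed
command no longer escalates to device, bus or host resets.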

FC drivers already use fc_timed_out, but I think that will not work. The
FC driver could fail the IO and then call fc_remote_port_delete. So the
failed IO could hit dm-mpath.c, and that could call into scsi_times_out
(which for FC drivers calls into fc_timed_out) before
fc_remote_port_delete has been done. The port_state is then still
online, so that kicks off the scsi eh.
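
The ordering problem can be modelled in userspace like this. It is a
sketch that assumes fc_timed_out only holds off the eh when the rport is
blocked, and that fc_remote_port_delete is what moves the rport to the
blocked state. All model_* names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Two rport states are enough for the model; the rest are omitted. */
enum model_port_state {
	MODEL_PORTSTATE_ONLINE,
	MODEL_PORTSTATE_BLOCKED,
};

/* Stand-ins for BLK_EH_NOT_HANDLED and BLK_EH_RESET_TIMER. */
enum model_verdict {
	MODEL_RUN_SCSI_EH,
	MODEL_RESET_TIMER,
};

struct model_rport {
	enum model_port_state port_state;
};

/* Assumed shape of fc_timed_out: only a blocked rport holds off the eh. */
enum model_verdict model_fc_timed_out(const struct model_rport *rport)
{
	assert(rport != NULL);

	if (rport->port_state == MODEL_PORTSTATE_BLOCKED)
		return MODEL_RESET_TIMER;

	return MODEL_RUN_SCSI_EH;
}

/* Stand-in for the state change done by fc_remote_port_delete. */
void model_remote_port_delete(struct model_rport *rport)
{
	assert(rport != NULL);
	rport->port_state = MODEL_PORTSTATE_BLOCKED;
}
```

If model_fc_timed_out() runs before model_remote_port_delete(), the
result is MODEL_RUN_SCSI_EH. Run in the other order, it is
MODEL_RESET_TIMER, which is what we want for a transport failure.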

For transport errors I do not think blk_abort_queue is needed anymore,
at least for scsi drivers. For FC, almost every driver supports the
terminate_rport_io callback (only mptfc does not), so you can set
fast_io_fail_tmo to make sure all IO is failed quickly. For iscsi, we
have the replacement/recovery_timeout. And for SAS, I think there is a
timeout or the device/target/port is deleted, right?
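
For the FC case, fast_io_fail_tmo is a per-rport sysfs attribute under
/sys/class/fc_remote_ports/. A minimal sketch of setting it from C
follows. The helper name model_set_fast_io_fail_tmo() is made up, and
the caller has to supply the rport directory.

```c
#include <assert.h>
#include <stdio.h>

/* Write a fast_io_fail_tmo value (in seconds) into the given rport
 * sysfs directory. Returns 0 on success and -1 on any failure. */
int model_set_fast_io_fail_tmo(const char *rport_dir, unsigned int seconds)
{
	char path[512];
	FILE *f;
	int n;

	assert(rport_dir != NULL);

	n = snprintf(path, sizeof(path), "%s/fast_io_fail_tmo", rport_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fprintf(f, "%u\n", seconds) < 0) {
		fclose(f);
		return -1;
	}

	/* sysfs reports a rejected value when the write is flushed. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A call would look like
model_set_fast_io_fail_tmo("/sys/class/fc_remote_ports/rport-3:0-1", 5),
where rport-3:0-1 is a placeholder for a real remote port and 5 seconds
is an arbitrary value.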


> What was the reason for this change? I searched through my email from
> this mailing list and could not find a discussion about it.


It seems like it would only make sense to call blk_abort_queue for maybe
some block drivers (do cciss or dasd need it?) or maybe for device
errors. But it seems to be broken for the common multipath use cases.



