[dm-devel] block_abort_queue (blk_abort_request) racing with scsi_request_fn
Mike Anderson
andmike at linux.vnet.ibm.com
Wed May 12 05:23:37 UTC 2010
I was looking at a dump from a weekend run and I believe I am seeing a
case where blk_abort_request through blk_abort_queue picked up a request
for timeout that scsi_request_fn decided not to start. This test was under
error injection.
I assume the case being hit in scsi_request_fn is that a request has been
put on the timeout_list by blk_start_request, then one of the not_ready
checks is hit and scsi_request_fn decides not to start the request. I
believe the drop
It appears that my approach of walking the timeout_list in blk_abort_queue
and using blk_mark_rq_complete in blk_abort_request will not work in
this case.
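To make the suspected interleaving concrete, here is a minimal user-space
sketch of it. This is not kernel code: every model_* name, the struct
fields, and simulate_race are hypothetical stand-ins, and the point at
which the abort path gets to run (between the start and the not_ready
check) is an assumption based on the description above.

```c
/* User-space model of the suspected race. All names are hypothetical
 * and only loosely mirror the kernel functions discussed above. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_request {
    bool on_timeout_list;  /* set by the "start" step */
    bool complete;         /* stands in for the atomic complete bit */
    bool dispatched;       /* true only if the driver really got it */
    bool timed_out;        /* set when the abort path handled it */
};

/* Models blk_start_request: the request is armed before dispatch. */
static void model_start_request(struct model_request *rq)
{
    rq->on_timeout_list = true;
}

/* Models blk_mark_rq_complete: returns true if already marked. */
static bool model_mark_complete(struct model_request *rq)
{
    bool was_complete = rq->complete;
    rq->complete = true;
    return was_complete;
}

/* Models the abort path: walk the list and time out every armed
 * request that was not already marked complete. */
static int model_abort_queue(struct model_request *rqs, size_t n)
{
    int aborted = 0;
    for (size_t i = 0; i < n; i++) {
        if (!rqs[i].on_timeout_list)
            continue;
        if (model_mark_complete(&rqs[i]))
            continue;
        rqs[i].on_timeout_list = false;
        rqs[i].timed_out = true;
        aborted++;
    }
    return aborted;
}

/* Models the not_ready exit: the request goes back on the queue
 * without ever being dispatched. */
static void model_requeue(struct model_request *rq)
{
    rq->on_timeout_list = false;
    rq->complete = false;
}

/* Returns true if a request that was never dispatched got timed out.
 * abort_in_window selects whether the abort path runs between the
 * start and the not_ready decision (assumed window) or afterwards. */
bool simulate_race(bool abort_in_window)
{
    struct model_request rq = { false, false, false, false };

    model_start_request(&rq);
    if (abort_in_window)
        model_abort_queue(&rq, 1);
    model_requeue(&rq);              /* not_ready check hit */
    if (!abort_in_window)
        model_abort_queue(&rq, 1);

    assert(!rq.dispatched);          /* never reached the driver */
    return rq.timed_out && !rq.dispatched;
}
```

In this model, simulate_race(true) reports a timed-out request that was
never dispatched, while simulate_race(false) does not, since the request
has already left the list by the time the abort walk runs.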
While it would be good to have a way to ensure a command is started, it is
unclear whether a user other than blk_abort_queue would hit this race,
even at a low timeout of 1 second.
The dropping / acquiring of host_lock and queue_lock in scsi_request_fn
and scsi_dispatch_cmd makes it unclear to me whether usage of
blk_mark_rq_complete will cover all cases.
I looked at checking serial_number in scsi_times_out along with a couple
of blk_mark_rq_complete additions, but it is unclear whether this would be
good and / or work in all cases.
I looked at just accelerating the deadline by some default value, but it
is unclear whether that would be acceptable.
I also looked at using just the mark interface I previously posted
and not calling blk_abort_request at all, but that would change current
behavior that has been in use for a while.
Looking for suggestions.
Thanks,
-andmike
--
Michael Anderson
andmike at linux.vnet.ibm.com