[dm-devel] [PATCH] Fix deadlock with request based dm and some drivers
Jun'ichi Nomura
j-nomura at ce.jp.nec.com
Tue Nov 6 12:14:30 UTC 2012
(Cc to dm-devel)
On 11/06/12 20:29, Jens Axboe wrote:
> I recently fixed a deadlock for a customer we have with
> the below patch. I have queued it up in my tree as not
> to lose it. Can I have an ack from you, or do you want
> to submit it yourself? I've marked it stable as well.
Could you tell details about how the deadlock happened?
dm's end_io always throws the completion handling to softirq:
end_clone_request()
dm_complete_request()
blk_complete_request()
then it is processed in softirq context:
dm_softirq_done()
dm_done()
dm_end_request()
rq_completed(run_queue=true)
blk_run_queue()
Since queue_lock is always held with interrupt disabled,
I couldn't see why it could deadlock.
Request-based dm followed the completion model of scsi
mid layer. So similar code path exists in scsi, too.
For example:
scsi_request_fn()
scsi_kill_request()
blk_complete_request()
then:
scsi_softirq_done()
scsi_io_completion()
scsi_next_command()
scsi_run_queue()
spin_lock queue_lock
__blk_run_queue()
If calling run-queue from softirq_done_fn() can cause deadlock,
I'm afraid the problem is not limited to dm.
> From 8ca211056519ac06bc96fb134dca1f8eb2141407 Mon Sep 17 00:00:00 2001
> From: Jens Axboe <axboe at kernel.dk>
> Date: Tue, 6 Nov 2012 12:24:26 +0100
> Subject: [PATCH] dm: fix deadlock with request based dm and queue request_fn
> recursion
>
> Request based dm attempts to re-run the request queue off the
> request completion path. If used with a driver that potentially does
> end_io from its request_fn, we could deadlock trying to recurse
> back into request dispatch. Fix this by punting the request queue
> run to kblockd.
>
> Tested to fix a quickly reproducible deadlock in such a scenario.
>
> Cc: stable at kernel.org
> Signed-off-by: Jens Axboe <axboe at kernel.dk>
> ---
> drivers/md/dm.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 02db918..77e6eff 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -740,8 +740,14 @@ static void rq_completed(struct mapped_device *md, int rw, int run_queue)
> if (!md_in_flight(md))
> wake_up(&md->wait);
>
> + /*
> + * Run this off this callpath, as drivers could invoke end_io while
> + * inside their request_fn (and holding the queue lock). Calling
> + * back into ->request_fn() could deadlock attempting to grab the
> + * queue lock again.
> + */
> if (run_queue)
> - blk_run_queue(md->queue);
> + blk_run_queue_async(md->queue);
>
> /*
> * dm_put() must be at the end of this function. See the comment above
--
Jun'ichi Nomura, NEC Corporation
More information about the dm-devel
mailing list