[dm-devel] [PATCH] Fix deadlock with request based dm and some drivers
Jun'ichi Nomura
j-nomura at ce.jp.nec.com
Wed Nov 7 08:35:37 UTC 2012
On 11/07/12 17:00, Jens Axboe wrote:
> On 2012-11-07 01:15, Jun'ichi Nomura wrote:
>> On 11/06/12 21:35, Jens Axboe wrote:
>>> On 2012-11-06 13:14, Jun'ichi Nomura wrote:
>>>> (Cc to dm-devel)
>>>>
>>>> On 11/06/12 20:29, Jens Axboe wrote:
>>>>> I recently fixed a deadlock for a customer we have with
>>>>> the below patch. I have queued it up in my tree as not
>>>>> to lose it. Can I have an ack from you, or do you want
>>>>> to submit it yourself? I've marked it stable as well.
>>>>
>>>> Could you tell details about how the deadlock happened?
>>>>
>>>> dm's end_io always throws the completion handling to softirq:
>>>> end_clone_request()
>>>> dm_complete_request()
>>>> blk_complete_request()
>>>>
>>>> then it is processed in softirq context:
>>>> dm_softirq_done()
>>>> dm_done()
>>>> dm_end_request()
>>>> rq_completed(run_queue=true)
>>>> blk_run_queue()
>>>>
>>>> Since queue_lock is always held with interrupt disabled,
>>>> I couldn't see why it could deadlock.
>>>>
>>>> Request-based dm followed the completion model of scsi
>>>> mid layer. So similar code path exists in scsi, too.
>>>> For example:
>>>> scsi_request_fn()
>>>> scsi_kill_request()
>>>> blk_complete_request()
>>>> then:
>>>> scsi_softirq_done()
>>>> scsi_io_completion()
>>>> scsi_next_command()
>>>> scsi_run_queue()
>>>> spin_lock queue_lock
>>>> __blk_run_queue()
>>>>
>>>> If calling run-queue from softirq_done_fn() can cause deadlock,
>>>> I'm afraid the problem is not limited to dm.
>>>
>>> dm_softirq_done()
>>> rq_completed()
>>> blk_run_queue()
>> ^^ this is for dm's queue
>>> dm_request_fn()
>>> dm_dispatch_request()
>>> blk_insert_cloned_request()
>>> __elv_add_request()
>>> elv_insert()
>>> blk_run_queue()
>> ^^ this is for lower device's queue
>>> ...
>>>
>>> Basically you recurse back into the request handler, since it ends up
>>> running the queue.
>>
>> But the queues are different as commented inline above.
>> So it should be ok. (from deadlock point of view)
>
> It's still not OK, some drivers end up doing spin_unlock_irq() in their
> request_fn. Running unknown request_fn from ipi/irq is a bad idea, imho,
> it'll quickly cause problems.
Though I still don't think there is actual deadlock because
dm's queue lock is released before dm_dispatch_request(),
I see your point why it's potentially bad.
>>> But I see I've been too focused on the older release,
>>> since we don't actually do that anymore after the plugging rewrite. So
>>> it should actually be safe in current kernels and hence the patch only
>>> needed for the stable series where that is not the case.
>>
>> BTW, using blk_run_queue_async() in softirq_done_fn() might be good
>> from performance point of view? It may reduce latency of the softirq
>> and have an effect of batching request_fn calls.
>
> Yes, I argued the same for the original people who saw the problem. It
> is quite an extensive and expensive path to have off the IPI handler. So
> from that point of view, I'd recommend the patch as well.
Thank you for confirmation.
If it's good for performance, the patch is fine with me.
--
Jun'ichi Nomura, NEC Corporation
More information about the dm-devel
mailing list