[dm-devel] Revert "dm bufio: fix deadlock with loop device"

Thu Aug 8 16:28:09 UTC 2019

On 8/8/19 8:01 AM, Mikulas Patocka wrote:

>>> Note that the patch bd293d071ffe doesn't really prevent the deadlock from
>>> occurring - if we look at the stacktrace reported by Junxiao Bi, we see
>>> that it hangs in bit_wait_io and not on the mutex - i.e. it has already
>>> successfully taken the mutex. Changing the mutex from mutex_lock to
>>> mutex_trylock won't help with deadlocks that happen afterwards.
>>>
>>> PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
>>>     #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
>>>     #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
>>>     #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
>>>     #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
>>>     #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
>>>     #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
>>>     #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
>>>     #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
>>>     #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
>>>     #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
>>>    #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
>>>    #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
>>>    #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
>>>    #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
>>>    #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
>> The above stack trace doesn't tell the entire story though.  Yes, one
>> process will have already gotten the lock and is left waiting on
>> IO.  But that IO isn't able to complete because it is blocked on mm's
>> reclaim also trying to evict via same shrinker... so another thread will
>> be blocked waiting on the mutex (e.g. PID 14127 in your recent patch
>> header).
>>
>> Please re-read the header for commit bd293d071ffe -- I think that fix is
>> good.  But, I could still be wrong...;)
> The problem with the "dm_bufio_trylock" patch is - suppose that we are
> called with GFP_KERNEL context - we succeed with dm_bufio_trylock and then
> go to __make_buffer_clean->out_of_line_wait_on_bit->__wait_on_bit - if
> this wait depends on some I/O completion on the loop device, we still get
> a deadlock.

No, this is not right; see the source code in __try_evict_buffer(). It
will never wait for I/O in the GFP_KERNEL case.

    if (!(gfp & __GFP_FS)) {
        if (test_bit(B_READING, &b->state) ||
            test_bit(B_WRITING, &b->state) ||
            test_bit(B_DIRTY, &b->state))
            return false;
    }
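
For anyone who wants to poke at this check outside the kernel, here is a
rough user-space sketch. Only the shape of the __GFP_FS test comes from the
snippet above. The flag value, the sketch_* names and the "waited"
out-parameter are made up for illustration, and the wait in
__make_buffer_clean() is modelled (as an assumption based on the stack
trace) by simply clearing the state bits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_FS; NOT the kernel's real value. */
#define SKETCH_GFP_FS (1u << 0)

/* Bit numbers named after the dm-bufio buffer state bits (values assumed). */
enum { B_READING, B_WRITING, B_DIRTY };

struct sketch_buffer {
    unsigned long state;
};

/* Minimal user-space replacement for the kernel's test_bit(). */
static bool sketch_test_bit(int nr, const unsigned long *addr)
{
    return (*addr >> nr) & 1UL;
}

/*
 * Models the quoted check. Returns false if the buffer is skipped,
 * true if it is "evicted". *waited reports whether the model had to
 * reach the point where the real code could wait for I/O.
 */
static bool sketch_try_evict(struct sketch_buffer *b, unsigned int gfp,
                             bool *waited)
{
    *waited = false;

    if (!(gfp & SKETCH_GFP_FS)) {
        if (sketch_test_bit(B_READING, &b->state) ||
            sketch_test_bit(B_WRITING, &b->state) ||
            sketch_test_bit(B_DIRTY, &b->state))
            return false;   /* busy buffer: skip it, never wait */
    }

    /* Assumed model of __make_buffer_clean(): a busy buffer means a wait. */
    if (b->state != 0) {
        *waited = true;
        b->state = 0;
    }
    return true;
}
```

In this model, calling sketch_try_evict() on a dirty buffer with gfp == 0
returns false and leaves the buffer untouched, while the same call with
SKETCH_GFP_FS set goes on to the modelled wait.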

Thanks,

Junxiao.

>
> The patch fixes some case of the deadlock, but it doesn't fix it entirely.
>
> Mikulas
>