[dm-devel] Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock

Nikos Tsironis ntsironis at arrikto.com
Thu Oct 10 11:58:35 UTC 2019


On 10/9/19 7:04 PM, Mike Snitzer wrote:
> On Wed, Oct 09 2019 at 11:44am -0400,
> Nikos Tsironis <ntsironis at arrikto.com> wrote:
> 
>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
>>> On Tue, Oct 01 2019 at  8:43am -0400,
>>> Nikos Tsironis <ntsironis at arrikto.com> wrote:
>>>
>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>>>> Hello Nikos,
>>>>>  Yes, the issue is consistently reproducible for us, in a particular
>>>>> set-up and test case.
>>>>>  I will get access to the set-up next week, will try to test and let
>>>>> you know the results before the end of next week.
>>>>>
>>>>
>>>> That sounds great!
>>>>
>>>> Thanks a lot,
>>>> Nikos
>>>
>>> Hi Guru,
>>>
>>> Any chance you could try this fix that I've staged to send to Linus?
>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
>>>
>>> Short of that, Nikos: do you happen to have a test scenario that teases
>>> out this deadlock?
>>>
>>
>> Hi Mike,
>>
>> Yes, I created a 50G LV and took a snapshot of the same size:
>>
>>   lvcreate -n data-lv -L50G testvg
>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
>>
>> Then I ran the following fio job:
>>
>> [global]
>> randrepeat=1
>> ioengine=libaio
>> bs=1M
>> size=6G
>> offset_increment=6G
>> numjobs=8
>> direct=1
>> iodepth=32
>> group_reporting
>> filename=/dev/testvg/data-lv
>>
>> [test]
>> rw=write
>> timeout=180
>>
>> I ran it concurrently with the following script:
>>
>> lvcreate -n dummy-lv -L1G testvg
>>
>> while true
>> do
>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
>>  lvremove -f testvg/dummy-snap
>> done
>>
>> This reproduced the deadlock for me. I also ran 'echo 30 >
>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
>> timeout.
>>
>> Nikos.
> 
> Very nice, well done.  Curious if you've tested with the fix I've staged
> (see above)?  If so, does it resolve the deadlock?  If you've had
> success I'd be happy to update the tags in the commit header to include
> your Tested-by before sending it to Linus.  Also, any review of the
> patch that you can do would be appreciated, and a formal Reviewed-by
> reply would be welcomed and folded in too.
> 

Yes, I have tested the staged fix. I forgot to mention it in my previous
mail.

I ran the test with the default 'snapshot_cow_threshold' value of 2048
and I also ran it with a value of 1, to stress it a little more.

In both cases everything went fine and the deadlock was gone.
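
For anyone repeating these runs, here is a minimal sketch of how the
threshold could be switched between them. It assumes the parameter is
exposed as a writable dm_snapshot module parameter at the sysfs path
below. The helper name set_cow_threshold and the COW_PARAM override are
only illustrative and are not part of the patch.

```shell
# Sketch: set the dm-snapshot COW threshold and print the value read back.
# Assumption: the parameter lives at this sysfs path (writing needs root).
# COW_PARAM is a hypothetical override, handy for a dry run on a plain file.
set_cow_threshold() {
    param="${COW_PARAM:-/sys/module/dm_snapshot/parameters/snapshot_cow_threshold}"
    # Write the new value, then read it back to confirm what is in effect.
    echo "$1" > "$param" && cat "$param"
}
```

With that, 'set_cow_threshold 1' would precede the stress run and
'set_cow_threshold 2048' would restore the default afterwards.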

Nikos

> Mike
> 



