[dm-devel] Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock

Nikos Tsironis ntsironis at arrikto.com
Wed Oct 9 15:44:49 UTC 2019


On 10/9/19 5:13 PM, Mike Snitzer wrote:
> On Tue, Oct 01 2019 at  8:43am -0400,
> Nikos Tsironis <ntsironis at arrikto.com> wrote:
> 
>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
>>> Hello Nikos,
>>>  Yes, issue is consistently reproducible with us, in a particular
>>> set-up and test case.
>>>  I will get the access to set-up next week, will try to test and let
>>> you know the results before end of next week.
>>>
>>
>> That sounds great!
>>
>> Thanks a lot,
>> Nikos
> 
> Hi Guru,
> 
> Any chance you could try this fix that I've staged to send to Linus?
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> 
> Short of that, Nikos: do you happen to have a test scenario that teases
> out this deadlock?
> 

Hi Mike,

Yes, I do.

I created a 50G LV and took a snapshot of the same size:

  lvcreate -n data-lv -L50G testvg
  lvcreate -n snap-lv -L50G -s testvg/data-lv

Then I ran the following fio job:

[global]
randrepeat=1
ioengine=libaio
bs=1M
size=6G
offset_increment=6G
numjobs=8
direct=1
iodepth=32
group_reporting
filename=/dev/testvg/data-lv

[test]
rw=write
timeout=180

I ran it concurrently with the following script:

lvcreate -n dummy-lv -L1G testvg

while true
do
 lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
 lvremove -f testvg/dummy-snap
done

This reproduced the deadlock for me. I also ran 'echo 30 >
/proc/sys/kernel/hung_task_timeout_secs' to reduce the hung task
timeout to 30 seconds.
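
Once the hung task timeout is lowered, the kernel log can be scanned for
the hung task detector's reports while the fio job and the snapshot loop
are running. The following is a minimal sketch and was not part of the
test run described above: the hung_task_reports helper is a hypothetical
name, and the pattern assumes the kernel's usual "INFO: task <comm>:<pid>
blocked for more than <N> seconds." wording.

```shell
#!/bin/sh
# Sketch only: hung_task_reports is a hypothetical helper, not part of
# the original reproduction steps.

# Filter stdin down to the hung task detector's report lines.
# Assumed message format:
#   INFO: task <comm>:<pid> blocked for more than <N> seconds.
hung_task_reports() {
    # "|| true" keeps the exit status at 0 when nothing matches, so the
    # helper is safe to use under "set -e".
    grep -E 'INFO: task .+:[0-9]+ blocked for more than [0-9]+ seconds' || true
}

# Usage (as root), while the fio job and the snapshot loop are running:
#   echo 30 > /proc/sys/kernel/hung_task_timeout_secs
#   dmesg | hung_task_reports
```

Empty output only means that no task has been blocked for longer than
the timeout so far; it does not prove that the deadlock cannot occur.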

Nikos.

> Thanks,
> Mike
> 



