[dm-devel] Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock

Guruswamy Basavaiah guru2018 at gmail.com
Fri Oct 11 10:17:05 UTC 2019


Hello Nikos,
 I applied these patches and tested them.
 We still see hung_task_timeout backtraces, and the DRBD resync is blocked.
 I have attached the backtrace; please let me know if you need any other information.
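
 For anyone reading the attached log, the hung-task reports can be summarised with a small shell filter. This is only a sketch: the function name list_hung_tasks is made up for illustration, and it assumes the log contains the kernel's usual "INFO: task NAME:PID blocked for more than N seconds." lines.

```shell
# Hypothetical helper: print "count name:pid" for each task reported as hung in a log.
list_hung_tasks() {
    # Keep only the "name:pid" field of each hung-task report line,
    # then count repeats and list the most frequently reported task first.
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9][0-9]* seconds.*/\1/p' "$1" |
        sort | uniq -c | sort -rn
}
```

 Usage would be along the lines of: list_hung_tasks reboot.1.log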

 In patch "0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch"
I changed "struct wait_queue_head" to "wait_queue_head_t", as I was
getting a compilation error with the former.
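
 The substitution described above can be sketched as a shell filter over the patch text. It assumes, as appears to be the case for 4.4.x, that the kernel provides only the wait_queue_head_t typedef and no "struct wait_queue_head" tag. The function name and the file names in the usage line are placeholders, not part of the posted patches.

```shell
# Hypothetical helper: rewrite "struct wait_queue_head" to "wait_queue_head_t" on stdin.
use_wait_queue_typedef() {
    # The trailing group stops the match from touching longer identifiers
    # such as "struct wait_queue_head_entry"; it is copied back unchanged.
    sed -E 's/struct[[:space:]]+wait_queue_head([^_[:alnum:]]|$)/wait_queue_head_t\1/g'
}
```

 Usage would be along the lines of: use_wait_queue_typedef < original.patch > adjusted.patch
 Only the text of the affected lines changes, so the hunk line counts in the patch stay valid.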

On Thu, 10 Oct 2019 at 17:33, Nikos Tsironis <ntsironis at arrikto.com> wrote:
>
> On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
> > Hello,
> > We use 4.4.184 in our builds, and the patch fails to apply.
> > Is it possible to provide a patch for the 4.4.x branch?
> Hi Guru,
>
> I attach the two patches fixing the deadlock rebased on the 4.4.x branch.
>
> Nikos
>
> >
> > patching Logs.
> > patching file drivers/md/dm-snap.c
> > Hunk #1 succeeded at 19 (offset 1 line).
> > Hunk #2 succeeded at 105 (offset -1 lines).
> > Hunk #3 succeeded at 157 (offset -4 lines).
> > Hunk #4 succeeded at 1206 (offset -120 lines).
> > Hunk #5 FAILED at 1508.
> > Hunk #6 succeeded at 1412 (offset -124 lines).
> > Hunk #7 succeeded at 1425 (offset -124 lines).
> > Hunk #8 FAILED at 1925.
> > Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
> > Hunk #10 succeeded at 2202 (offset -294 lines).
> > Hunk #11 succeeded at 2332 (offset -294 lines).
> > 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
> >
> > Guru
> >
> > On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018 at gmail.com> wrote:
> >>
> >> Hello Mike,
> >>  I will get the testing result before end of Thursday.
> >> Guru
> >>
> >> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer at redhat.com> wrote:
> >>>
> >>> On Wed, Oct 09 2019 at 11:44am -0400,
> >>> Nikos Tsironis <ntsironis at arrikto.com> wrote:
> >>>
> >>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
> >>>>> On Tue, Oct 01 2019 at  8:43am -0400,
> >>>>> Nikos Tsironis <ntsironis at arrikto.com> wrote:
> >>>>>
> >>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> >>>>>>> Hello Nikos,
> >>>>>>>  Yes, issue is consistently reproducible with us, in a particular
> >>>>>>> set-up and test case.
> >>>>>>>  I will get the access to set-up next week, will try to test and let
> >>>>>>> you know the results before end of next week.
> >>>>>>>
> >>>>>>
> >>>>>> That sounds great!
> >>>>>>
> >>>>>> Thanks a lot,
> >>>>>> Nikos
> >>>>>
> >>>>> Hi Guru,
> >>>>>
> >>>>> Any chance you could try this fix that I've staged to send to Linus?
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> >>>>>
> >>>>> Short of that, Nikos: do you happen to have a test scenario that teases
> >>>>> out this deadlock?
> >>>>>
> >>>>
> >>>> Hi Mike,
> >>>>
> >>>> Yes,
> >>>>
> >>>> I created a 50G LV and took a snapshot of the same size:
> >>>>
> >>>>   lvcreate -n data-lv -L50G testvg
> >>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
> >>>>
> >>>> Then I ran the following fio job:
> >>>>
> >>>> [global]
> >>>> randrepeat=1
> >>>> ioengine=libaio
> >>>> bs=1M
> >>>> size=6G
> >>>> offset_increment=6G
> >>>> numjobs=8
> >>>> direct=1
> >>>> iodepth=32
> >>>> group_reporting
> >>>> filename=/dev/testvg/data-lv
> >>>>
> >>>> [test]
> >>>> rw=write
> >>>> timeout=180
> >>>>
> >>>> , concurrently with the following script:
> >>>>
> >>>> lvcreate -n dummy-lv -L1G testvg
> >>>>
> >>>> while true
> >>>> do
> >>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
> >>>>  lvremove -f testvg/dummy-snap
> >>>> done
> >>>>
> >>>> This reproduced the deadlock for me. I also ran 'echo 30 >
> >>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> >>>> timeout.
> >>>>
> >>>> Nikos.
> >>>
> >>> Very nice, well done.  Curious if you've tested with the fix I've staged
> >>> (see above)?  If so, does it resolve the deadlock?  If you've had
> >>> success I'd be happy to update the tags in the commit header to include
> >>> your Tested-by before sending it to Linus.  Also, any review of the
> >>> patch that you can do would be appreciated, and a formal Reviewed-by
> >>> reply would be welcomed and folded in too.
> >>>
> >>> Mike
> >>
> >>
> >>
> >> --
> >> Guruswamy Basavaiah
> >
> >
> >



-- 
Guruswamy Basavaiah
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reboot.1.log
Type: text/x-log
Size: 26740 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20191011/c4afa069/attachment.bin>