[dm-devel] Fix "dm kcopyd: Fix bug causing workqueue stalls" causes dead lock

Guruswamy Basavaiah guru2018 at gmail.com
Thu Oct 17 05:58:05 UTC 2019


Hello Nikos,
 I tested with your new patches and the issue is resolved. Thank you.
 In the second patch I changed "struct wait_queue_head" to
"wait_queue_head_t" for the variable in_progress_wait; otherwise
compilation fails with the error
 "error: field 'in_progress_wait' has incomplete type
  struct wait_queue_head in_progress_wait;"
 I have attached the changed patch.
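 For reference, a minimal user-space sketch of why the type name matters.
It assumes the 4.4-era headers define only "struct __wait_queue_head" and
the typedef "wait_queue_head_t", so "struct wait_queue_head" is an
incomplete type there. The struct bodies and the names dm_snapshot_sketch
and snapshot_sketch_init are hypothetical stand-ins, not the kernel's or
the patch's actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the 4.4-era <linux/wait.h>: only the
 * double-underscore struct and its typedef exist. The members are
 * placeholders for the real spinlock_t and struct list_head. */
struct __wait_queue_head {
	int lock;
	void *task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

/* Hypothetical cut-down snapshot structure holding the two fields
 * discussed in this thread. */
struct dm_snapshot_sketch {
	int in_progress;
	/* "struct wait_queue_head in_progress_wait;" would not compile
	 * here: no struct with that tag is defined, so the field would
	 * have an incomplete type. The typedef works. */
	wait_queue_head_t in_progress_wait;
};

/* Initialise both fields explicitly, including the zeroing of
 * in_progress that the earlier version of the patch lacked. */
static void snapshot_sketch_init(struct dm_snapshot_sketch *s)
{
	s->in_progress = 0;
	s->in_progress_wait.lock = 0;
	s->in_progress_wait.task_list = NULL;
}
```

 In the kernel itself the wait queue head would be set up with
init_waitqueue_head() rather than by assigning its members directly.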

Guru

On Sat, 12 Oct 2019 at 14:16, Guruswamy Basavaiah <guru2018 at gmail.com> wrote:
>
> Hello Nikos,
>  I am having some issues with our set-up; I will try to get the results ASAP.
> Guru
>
>
> On Fri, 11 Oct 2019 at 17:47, Nikos Tsironis <ntsironis at arrikto.com> wrote:
> >
> > On 10/11/19 2:39 PM, Nikos Tsironis wrote:
> > > On 10/11/19 1:17 PM, Guruswamy Basavaiah wrote:
> > >> Hello Nikos,
> > >>  Applied these patches and tested.
> > >>  We still see hung_task_timeout back traces and the drbd Resync is blocked.
> > >>  Attached the back trace, please let me know if you need any other information.
> > >>
> > >
> > > Hi Guru,
> > >
> > > Can you provide more information about your setup? The output of
> > > 'dmsetup table', 'dmsetup ls --tree' and the DRBD configuration would
> > > help to get a better picture of your I/O stack.
> > >
> > > Also, is it possible to describe the test case you are running and
> > > exactly what it does?
> > >
> > > Thanks,
> > > Nikos
> > >
> >
> > Hi Guru,
> >
> > I believe I found the mistake. The in_progress variable was never
> > initialized to zero.
> >
> > I attach a new version of the second patch correcting this.
> >
> > Can you please test again with this patch?
> >
> > Thanks,
> > Nikos
> >
> > >>  In patch "0002-dm-snapshot-rework-COW-throttling-to-fix-deadlock.patch"
> > >> I changed "struct wait_queue_head" to "wait_queue_head_t", as I was
> > >> getting a compilation error with the former.
> > >>
> > >> On Thu, 10 Oct 2019 at 17:33, Nikos Tsironis <ntsironis at arrikto.com> wrote:
> > >>>
> > >>> On 10/10/19 9:34 AM, Guruswamy Basavaiah wrote:
> > >>>> Hello,
> > >>>> We use 4.4.184 in our builds and the patch fails to apply.
> > >>>> Is it possible to provide a patch for the 4.4.x branch?
> > >>> Hi Guru,
> > >>>
> > >>> I attach the two patches fixing the deadlock rebased on the 4.4.x branch.
> > >>>
> > >>> Nikos
> > >>>
> > >>>>
> > >>>> Patching logs:
> > >>>> patching file drivers/md/dm-snap.c
> > >>>> Hunk #1 succeeded at 19 (offset 1 line).
> > >>>> Hunk #2 succeeded at 105 (offset -1 lines).
> > >>>> Hunk #3 succeeded at 157 (offset -4 lines).
> > >>>> Hunk #4 succeeded at 1206 (offset -120 lines).
> > >>>> Hunk #5 FAILED at 1508.
> > >>>> Hunk #6 succeeded at 1412 (offset -124 lines).
> > >>>> Hunk #7 succeeded at 1425 (offset -124 lines).
> > >>>> Hunk #8 FAILED at 1925.
> > >>>> Hunk #9 succeeded at 1866 with fuzz 2 (offset -255 lines).
> > >>>> Hunk #10 succeeded at 2202 (offset -294 lines).
> > >>>> Hunk #11 succeeded at 2332 (offset -294 lines).
> > >>>> 2 out of 11 hunks FAILED -- saving rejects to file drivers/md/dm-snap.c.rej
> > >>>>
> > >>>> Guru
> > >>>>
> > >>>> On Thu, 10 Oct 2019 at 01:33, Guruswamy Basavaiah <guru2018 at gmail.com> wrote:
> > >>>>>
> > >>>>> Hello Mike,
> > >>>>>  I will have the test results before the end of Thursday.
> > >>>>> Guru
> > >>>>>
> > >>>>> On Wed, 9 Oct 2019 at 21:34, Mike Snitzer <snitzer at redhat.com> wrote:
> > >>>>>>
> > >>>>>> On Wed, Oct 09 2019 at 11:44am -0400,
> > >>>>>> Nikos Tsironis <ntsironis at arrikto.com> wrote:
> > >>>>>>
> > >>>>>>> On 10/9/19 5:13 PM, Mike Snitzer wrote:
> > >>>>>>>> On Tue, Oct 01 2019 at  8:43am -0400,
> > >>>>>>>> Nikos Tsironis <ntsironis at arrikto.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> On 10/1/19 3:27 PM, Guruswamy Basavaiah wrote:
> > >>>>>>>>>> Hello Nikos,
> > >>>>>>>>>>  Yes, the issue is consistently reproducible for us, in a particular
> > >>>>>>>>>> set-up and test case.
> > >>>>>>>>>>  I will get access to the set-up next week and will try to test and let
> > >>>>>>>>>> you know the results before the end of next week.
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> That sounds great!
> > >>>>>>>>>
> > >>>>>>>>> Thanks a lot,
> > >>>>>>>>> Nikos
> > >>>>>>>>
> > >>>>>>>> Hi Guru,
> > >>>>>>>>
> > >>>>>>>> Any chance you could try this fix that I've staged to send to Linus?
> > >>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.4&id=633b1613b2a49304743c18314bb6e6465c21fd8a
> > >>>>>>>>
> > >>>>>>>> Short of that, Nikos: do you happen to have a test scenario that teases
> > >>>>>>>> out this deadlock?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> Hi Mike,
> > >>>>>>>
> > >>>>>>> Yes,
> > >>>>>>>
> > >>>>>>> I created a 50G LV and took a snapshot of the same size:
> > >>>>>>>
> > >>>>>>>   lvcreate -n data-lv -L50G testvg
> > >>>>>>>   lvcreate -n snap-lv -L50G -s testvg/data-lv
> > >>>>>>>
> > >>>>>>> Then I ran the following fio job:
> > >>>>>>>
> > >>>>>>> [global]
> > >>>>>>> randrepeat=1
> > >>>>>>> ioengine=libaio
> > >>>>>>> bs=1M
> > >>>>>>> size=6G
> > >>>>>>> offset_increment=6G
> > >>>>>>> numjobs=8
> > >>>>>>> direct=1
> > >>>>>>> iodepth=32
> > >>>>>>> group_reporting
> > >>>>>>> filename=/dev/testvg/data-lv
> > >>>>>>>
> > >>>>>>> [test]
> > >>>>>>> rw=write
> > >>>>>>> timeout=180
> > >>>>>>>
> > >>>>>>> , concurrently with the following script:
> > >>>>>>>
> > >>>>>>> lvcreate -n dummy-lv -L1G testvg
> > >>>>>>>
> > >>>>>>> while true
> > >>>>>>> do
> > >>>>>>>  lvcreate -n dummy-snap -L1M -s testvg/dummy-lv
> > >>>>>>>  lvremove -f testvg/dummy-snap
> > >>>>>>> done
> > >>>>>>>
> > >>>>>>> This reproduced the deadlock for me. I also ran 'echo 30 >
> > >>>>>>> /proc/sys/kernel/hung_task_timeout_secs', to reduce the hung task
> > >>>>>>> timeout.
> > >>>>>>>
> > >>>>>>> Nikos.
> > >>>>>>
> > >>>>>> Very nice, well done.  Curious if you've tested with the fix I've staged
> > >>>>>> (see above)?  If so, does it resolve the deadlock?  If you've had
> > >>>>>> success I'd be happy to update the tags in the commit header to include
> > >>>>>> your Tested-by before sending it to Linus.  Also, any review of the
> > >>>>>> patch that you can do would be appreciated, and your formal
> > >>>>>> Reviewed-by reply would be welcomed and folded in too.
> > >>>>>>
> > >>>>>> Mike
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Guruswamy Basavaiah
> > >>>>
> > >>>>
> > >>>>
> > >>
> > >>
> > >>
>
>
>
> --
> Guruswamy Basavaiah



-- 
Guruswamy Basavaiah



