[dm-devel] Revert "dm bufio: fix deadlock with loop device"

Mikulas Patocka mpatocka at redhat.com
Thu Aug 8 15:01:58 UTC 2019



On Thu, 8 Aug 2019, Mike Snitzer wrote:

> On Thu, Aug 08 2019 at  5:40am -0400,
> Mikulas Patocka <mpatocka at redhat.com> wrote:
> 
> > Revert the patch bd293d071ffe65e645b4d8104f9d8fe15ea13862. A proper fix
> > should be not to use GFP_KERNEL in the function do_blockdev_direct_IO.
> 
> Matthew Wilcox pointed out that the "proper fix" is loop should be using
> memalloc_noio_{save,restore}

He's right, I will post another patch that sets memalloc_noio_save in 
loop.

> > Note that the patch bd293d071ffe doesn't really prevent the deadlock from
> > occuring - if we look at the stacktrace reported by Junxiao Bi, we see
> > that it hangs in bit_wait_io and not on the mutex - i.e. it has already
> > successfully taken the mutex. Changing the mutex from mutex_lock to
> > mutex_trylock won't help with deadlocks that happen afterwards.
> > 
> > PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
> >    #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
> >    #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
> >    #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
> >    #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
> >    #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
> >    #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
> >    #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
> >    #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
> >    #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
> >    #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
> >   #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
> >   #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
> >   #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
> >   #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
> >   #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242
> 
> The above stack trace doesn't tell the entire story though.  Yes, one
> process will have already gotten the the lock and is left waiting on
> IO.  But that IO isn't able to complete because it is blocked on mm's
> reclaim also trying to evict via same shrinker... so another thread will
> be blocked waiting on the mutex (e.g. PID 14127 in your recent patch
> header).
> 
> Please re-read the header for commit bd293d071ffe -- I think that fix is
> good.  But, I could still be wrong... ;)

The problem with the "dm_bufio_trylock" patch is - suppose that we are 
called with GFP_KERNEL context - we succeed with dm_bufio_trylock and then 
go to __make_buffer_clean->out_of_line_wait_on_bit->__wait_on_bit - if 
this wait depends no some I/O completion on the loop device, we still get 
a deadlock.

The patch fixes some case of the deadlock, but it doesn't fix it entirely.

Mikulas

> > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > Cc: stable at vger.kernel.org
> > Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device")
> > 
> > ---
> >  drivers/md/dm-bufio.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/drivers/md/dm-bufio.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-bufio.c	2019-08-02 08:52:35.000000000 +0200
> > +++ linux-2.6/drivers/md/dm-bufio.c	2019-08-08 11:19:13.000000000 +0200
> > @@ -1604,7 +1604,9 @@ dm_bufio_shrink_scan(struct shrinker *sh
> >  	unsigned long freed;
> >  
> >  	c = container_of(shrink, struct dm_bufio_client, shrinker);
> > -	if (!dm_bufio_trylock(c))
> > +	if (sc->gfp_mask & __GFP_FS)
> > +		dm_bufio_lock(c);
> > +	else if (!dm_bufio_trylock(c))
> >  		return SHRINK_STOP;
> >  
> >  	freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
> 
> I don't see the performance win of micro-optimizing to not use
> dm_bufio_trylock() worth it.  BUT on the flip side, dm_bufio_lock()
> _was_ the canary in the coal mine here: meaning it caught that loop
> isn't properly using memalloc_noio_{save,restore}...
> 
> I'm just not liking the prospect of bufio being the smoking gun for
> future deadlocks.  But I could go either way with this...
> 
> But if we do revert, this patch header needs to:
> 
> Depends-on: XXXXXXXXX ("loop: use memalloc_noio_{save,restore}")
> 
> Mike
> 




More information about the dm-devel mailing list