[dm-devel] dm-snap deadlock in pending_complete()

NeilBrown neilb at suse.com
Wed Aug 12 01:46:31 UTC 2015


On Tue, 11 Aug 2015 05:14:33 -0400 (EDT) Mikulas Patocka
<mpatocka at redhat.com> wrote:

> Hi
> 
> On Mon, 10 Aug 2015, NeilBrown wrote:
> 
> > 
> > Hi Mikulas,
> >  I have a customer hitting the deadlock you described over a year ago
> >  in:
> > 
> > Subject: [dm-devel] [PATCH] block: flush queued bios when the process
> >          blocks
> 
> Ask block layer maintainers to accept that patch.

Unfortunately I don't really like the patch ... or the bioset rescue
workqueues that it is based on.   Sorry.

So I might keep looking for a more localised approach....

> 
> > I notice that patch never went upstream.
> > 
> > Has anything else been done to fix this deadlock?
> > 
> > My thought was that something like the below would be sufficient.  Do
> > you see any problem with that?  It avoids the deadlock by dropping the
> > lock while sleeping.
> 
> The patch below introduces a bug - if the user is continuously reading the 
> same chunk (for example, by issuing multiple direct i/o requests to the 
> same chunk), the function pending_complete never finishes.
> 
> We need to hold the lock while waiting to make sure that no new read 
> requests are submitted.

Ahh - thanks for the explanation.  That makes sense.
Presumably you need to wait for old reads that you don't have a read
returning old data after a write has confirmed the new data is written?
Is there some other reason?

As soon as the new "proper" exception is installed, no new reads will
use the old address, so it should be safe to wait without holding the
lock.

So what about this?

Thanks for your time,
NeilBrown



diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 7c82d3ccce87..c7a7ed2cfd6d 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
 		goto out;
 	}
 
-	/* Check for conflicting reads */
-	__check_for_conflicting_io(s, pe->e.old_chunk);
+	/* Add proper exception.  Now all reads will be
+	 * redirected, so no new reads can be started on
+	 * 'old_chunk'.
+	 */
+	dm_insert_exception(&s->complete, e);
+
+	/* Drain conflicting reads */
+	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
+		up_write(&s->lock);
+		__check_for_conflicting_io(s, pe->e.old_chunk);
+		down_write(&s->lock);
+	}
 
 	/*
-	 * Add a proper exception, and remove the
-	 * in-flight exception from the list.
+	 * And now we can remove the temporary exception.
 	 */
-	dm_insert_exception(&s->complete, e);
 
 out:
 	dm_remove_exception(&pe->e);




More information about the dm-devel mailing list