[dm-devel] dm-snap deadlock in pending_complete()

NeilBrown neilb at suse.com
Thu Aug 13 01:23:16 UTC 2015


On Wed, 12 Aug 2015 10:16:17 -0400 (EDT) Mikulas Patocka
<mpatocka at redhat.com> wrote:

> 
> 
> On Wed, 12 Aug 2015, NeilBrown wrote:
> 
> > Ahh - thanks for the explanation.  That makes sense.
> > Presumably you need to wait for old reads so that you don't have a read
> > returning old data after a write has confirmed the new data is written?
> > Is there some other reason?
> > 
> > As soon as the new "proper" exception is installed, no new reads will
> > use the old address, so it should be safe to wait without holding the
> > lock.
> > 
> > So what about this?
> 
> This is wrong too - when you add the exception to the complete hash table 
> and drop the lock, writes to the chunk in the origin target can modify 
> the origin volume. These writes race with pending reads to the snapshot 
> (__chunk_is_tracked tests for those reads). If the I/O scheduler schedules 
> the write to the origin before the read from the snapshot, the read from 
> the snapshot returns corrupted data.
> 
> There can be more problems with snapshot merging - if the chunk is on the 
> complete hash table, the merging code assumes that it can be safely 
> merged.
> 

Thanks again.

I came up with the following, which should address the origin-write
issue, but I'm not so sure about snapshot merging, and it is becoming a
lot less simple than I had hoped.

I'll see if I can come up with code for a more general solution that is
still localised to dm - along the lines of my bio_split suggestion.
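
To make the ordering in the first hunk easier to follow, here is a minimal
user-space sketch of the pattern (pthreads, with made-up names - snap_lock,
tracked_reads, exception_installed - standing in for s->lock,
__chunk_is_tracked() and the complete hash table; it does not use any of
the real dm-snap interfaces): install the redirect while holding the lock,
drop the lock to drain any in-flight readers of the old chunk, then retake
the lock before removing the temporary exception.

/*
 * User-space model of the locking pattern in the patch below.
 * Build: cc -pthread -o model model.c   (hypothetical file name)
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t snap_lock = PTHREAD_MUTEX_INITIALIZER;  /* models s->lock */
static pthread_mutex_t track_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reads_drained = PTHREAD_COND_INITIALIZER;
static int tracked_reads;          /* reads still in flight on the old chunk */
static bool exception_installed;   /* chunk present in the "complete" table */

static void reader_start(void)
{
	pthread_mutex_lock(&track_lock);
	tracked_reads++;
	pthread_mutex_unlock(&track_lock);
}

static void reader_end(void)
{
	pthread_mutex_lock(&track_lock);
	if (--tracked_reads == 0)
		pthread_cond_broadcast(&reads_drained);
	pthread_mutex_unlock(&track_lock);
}

static bool chunk_is_tracked(void)      /* models __chunk_is_tracked() */
{
	bool ret;

	pthread_mutex_lock(&track_lock);
	ret = tracked_reads > 0;
	pthread_mutex_unlock(&track_lock);
	return ret;
}

static void wait_for_conflicting_reads(void)  /* models __check_for_conflicting_io() */
{
	pthread_mutex_lock(&track_lock);
	while (tracked_reads > 0)
		pthread_cond_wait(&reads_drained, &track_lock);
	pthread_mutex_unlock(&track_lock);
}

static void pending_complete_model(void)
{
	pthread_mutex_lock(&snap_lock);         /* down_write(&s->lock) */

	/* Install the proper exception first: from here on, new reads are
	 * redirected and cannot start on the old chunk. */
	exception_installed = true;

	if (chunk_is_tracked()) {
		/* Only old reads remain, so it is safe to wait for them
		 * without holding the lock. */
		pthread_mutex_unlock(&snap_lock);
		wait_for_conflicting_reads();
		pthread_mutex_lock(&snap_lock);
	}

	/* ... remove the temporary (pending) exception here ... */
	pthread_mutex_unlock(&snap_lock);
}

int main(void)
{
	reader_start();          /* a read of the old chunk is in flight */
	reader_end();            /* ... and completes before we wait */
	pending_complete_model();
	printf("exception installed: %d, tracked reads: %d\n",
	       exception_installed, tracked_reads);
	return 0;
}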

Thanks,
NeilBrown



diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 7c82d3ccce87..7f9e1b0429c8 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
 		goto out;
 	}
 
-	/* Check for conflicting reads */
-	__check_for_conflicting_io(s, pe->e.old_chunk);
+	/* Add proper exception.  Now all reads will be
+	 * redirected, so no new reads can be started on
+	 * 'old_chunk'.
+	 */
+	dm_insert_exception(&s->complete, e);
+
+	/* Drain conflicting reads */
+	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
+		up_write(&s->lock);
+		__check_for_conflicting_io(s, pe->e.old_chunk);
+		down_write(&s->lock);
+	}
 
 	/*
-	 * Add a proper exception, and remove the
-	 * in-flight exception from the list.
+	 * And now we can remove the temporary exception.
 	 */
-	dm_insert_exception(&s->complete, e);
 
 out:
 	dm_remove_exception(&pe->e);
@@ -2089,17 +2097,17 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
 		 */
 		chunk = sector_to_chunk(snap->store, sector);
 
-		/*
-		 * Check exception table to see if block
-		 * is already remapped in this snapshot
-		 * and trigger an exception if not.
-		 */
-		e = dm_lookup_exception(&snap->complete, chunk);
-		if (e)
-			goto next_snapshot;
-
 		pe = __lookup_pending_exception(snap, chunk);
 		if (!pe) {
+			/*
+			 * Check exception table to see if block
+			 * is already remapped in this snapshot
+			 * and trigger an exception if not.
+			 */
+			e = dm_lookup_exception(&snap->complete, chunk);
+			if (e)
+				goto next_snapshot;
+
 			up_write(&snap->lock);
 			pe = alloc_pending_exception(snap);
 			down_write(&snap->lock);
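
(For clarity, since I didn't spell it out above: I think the second hunk is
needed because the complete-table entry is now installed before the
conflicting reads have drained, so an origin write must not treat a hit in
the complete table as permission to proceed on its own.  Moving the
dm_lookup_exception() call inside the !pe branch means that while a pending
exception still exists - and it isn't removed until after the drain - the
origin write should queue behind it rather than race with the reads that
Mikulas described.)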



