[dm-devel] dm-snap deadlock in pending_complete()

Mikulas Patocka mpatocka at redhat.com
Wed Aug 12 14:16:17 UTC 2015



On Wed, 12 Aug 2015, NeilBrown wrote:

> Ahh - thanks for the explanation.  That makes sense.
> Presumably you need to wait for old reads that you don't have a read
> returning old data after a write has confirmed the new data is written?
> Is there some other reason?
> 
> As soon as the new "proper" exception is installed, no new reads will
> use the old address, so it should be safe to wait without holding the
> lock.
> 
> So what about this?

This is wrong too - when you add the exception to the complete hash table 
and drop the lock, writes to the chunk in the origin target are can modify 
the origin volume. These writes race with pending reads to the snapshot 
(__chunk_is_tracked tests for those reads). If the I/O scheduler schedules 
the write to the origin before the read from the snapshot, the read from 
the snapshot returns corrupted data.

There can be more problems with snapshot merging - if the chunk is on the 
complete hash table, the merging code assumes that it can be safely 
merged.

Mikulas

> Thanks for your time,
> NeilBrown
> 
> 
> 
> diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
> index 7c82d3ccce87..c7a7ed2cfd6d 100644
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
>  		goto out;
>  	}
>  
> -	/* Check for conflicting reads */
> -	__check_for_conflicting_io(s, pe->e.old_chunk);
> +	/* Add proper exception.  Now all reads will be
> +	 * redirected, so no new reads can be started on
> +	 * 'old_chunk'.
> +	 */
> +	dm_insert_exception(&s->complete, e);
> +
> +	/* Drain conflicting reads */
> +	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
> +		up_write(&s->lock);
> +		__check_for_conflicting_io(s, pe->e.old_chunk);
> +		down_write(&s->lock);
> +	}
>  
>  	/*
> -	 * Add a proper exception, and remove the
> -	 * in-flight exception from the list.
> +	 * And now we can remove the temporary exception.
>  	 */
> -	dm_insert_exception(&s->complete, e);
>  
>  out:
>  	dm_remove_exception(&pe->e);
> 




More information about the dm-devel mailing list