[dm-devel] dm-snap deadlock in pending_complete()
Mikulas Patocka
mpatocka at redhat.com
Mon Aug 17 10:55:37 UTC 2015
On Thu, 13 Aug 2015, NeilBrown wrote:
> On Wed, 12 Aug 2015 10:16:17 -0400 (EDT) Mikulas Patocka
> <mpatocka at redhat.com> wrote:
>
> >
> >
> > On Wed, 12 Aug 2015, NeilBrown wrote:
> >
> > > Ahh - thanks for the explanation. That makes sense.
> > > Presumably you need to wait for old reads so that you don't have a read
> > > returning old data after a write has confirmed the new data is written?
> > > Is there some other reason?
> > >
> > > As soon as the new "proper" exception is installed, no new reads will
> > > use the old address, so it should be safe to wait without holding the
> > > lock.
> > >
> > > So what about this?
> >
> > This is wrong too - when you add the exception to the complete hash table
> > and drop the lock, writes to the chunk in the origin target can modify
> > the origin volume. These writes race with pending reads to the snapshot
> > (__chunk_is_tracked tests for those reads). If the I/O scheduler schedules
> > the write to the origin before the read from the snapshot, the read from
> > the snapshot returns corrupted data.
> >
> > There can be more problems with snapshot merging - if the chunk is on the
> > complete hash table, the merging code assumes that it can be safely
> > merged.
> >
>
> Thanks again.
>
> I came up with the following which should address the origin-write
> issue, but I'm not so sure about snapshot merging, and it is becoming a
> lot less simple than I had hoped.
>
> I'll see if I can come up with code for a more general solution that is
> still localised to dm - along the lines of my bio_split suggestion.
>
> Thanks,
> NeilBrown
>
>
>
> diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
> index 7c82d3ccce87..7f9e1b0429c8 100644
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -1461,14 +1461,22 @@ static void pending_complete(struct dm_snap_pending_exception *pe, int success)
> goto out;
> }
>
> - /* Check for conflicting reads */
> - __check_for_conflicting_io(s, pe->e.old_chunk);
> + /* Add proper exception. Now all reads will be
> + * redirected, so no new reads can be started on
> + * 'old_chunk'.
> + */
> + dm_insert_exception(&s->complete, e);
> +
> + /* Drain conflicting reads */
> + if (__chunk_is_tracked(s, pe->e.old_chunk)) {
> + up_write(&s->lock);
> + __check_for_conflicting_io(s, pe->e.old_chunk);
> + down_write(&s->lock);
> + }
>
> /*
> - * Add a proper exception, and remove the
> - * in-flight exception from the list.
> + * And now we can remove the temporary exception.
> */
> - dm_insert_exception(&s->complete, e);
>
> out:
> dm_remove_exception(&pe->e);
> @@ -2089,17 +2097,17 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
> */
> chunk = sector_to_chunk(snap->store, sector);
>
> - /*
> - * Check exception table to see if block
> - * is already remapped in this snapshot
> - * and trigger an exception if not.
> - */
> - e = dm_lookup_exception(&snap->complete, chunk);
> - if (e)
> - goto next_snapshot;
> -
> pe = __lookup_pending_exception(snap, chunk);
> if (!pe) {
> + /*
> + * Check exception table to see if block
> + * is already remapped in this snapshot
> + * and trigger an exception if not.
> + */
> + e = dm_lookup_exception(&snap->complete, chunk);
> + if (e)
> + goto next_snapshot;
> +
> up_write(&snap->lock);
> pe = alloc_pending_exception(snap);
> down_write(&snap->lock);
You forgot about another race after these calls to up_write/down_write -
after the lock is taken again, you must redo both dm_lookup_exception and
__lookup_pending_exception, because another process may have installed the
exception while the lock was dropped.
Mikulas