[dm-devel] dm thin: commit pool's metadata on last close of thin device
Mike Snitzer
snitzer at redhat.com
Thu May 17 04:00:41 UTC 2012
On Wed, May 16 2012 at 8:43pm -0400,
Mikulas Patocka <mpatocka at redhat.com> wrote:
>
>
> On Wed, 16 May 2012, Mike Snitzer wrote:
>
> > Reinstate dm_flush_all and dm_table_flush_all. dm_blk_close will
> > now trigger the .flush method of all targets within a table on the last
> > close of a DM device.
> >
> > In the case of the thin target, the thin_flush method will commit the
> > backing pool's metadata.
> >
> > Doing so avoids a deadlock that has been observed with the following
> > sequence (as can be triggered via "dmsetup remove_all"):
> > - IO is issued to a thin device, thin device is closed
> > - pool's metadata device is suspended before the pool is
> > - because the pool still has outstanding IO we deadlock because the
> > pool's metadata device is suspended
> >
> > Signed-off-by: Mike Snitzer <snitzer at redhat.com>
> > Cc: stable at vger.kernel.org
>
> I'd say --- don't do this sequence.
>
> Device mapper generally expects that devices are suspended top-down ---
> i.e. you should first suspend the thin device and then suspend its
> underlying data and metadata device. If you violate this sequence and
> suspend bottom-up, you get deadlocks.
>
> For example, if dm-mirror is resynchronizing and you suspend the
> underlying leg or log volume and then suspend dm-mirror, you get a
> deadlock.
>
> If dm-snapshot is merging and you suspend the underlying snapshot or
> origin volume and then suspend the snapshot-merge target, you get a deadlock.
>
> These are not bugs in dm-mirror or dm-snapshot, this is expected behavior.
> Userspace shouldn't do any bottom-up suspend sequence.
>
> In the same sense, if you suspend the underlying data or metadata pool and
> then suspend dm-thin, you get a deadlock too. Fix userspace so that it
> doesn't do it.
Yeah, I agree. I told Zdenek it'd be best to just not do it.
'dmsetup remove_all' is a dumb command that invites these deadlocks when
more sophisticated stacking is being used.
But all said, in general I don't have a problem with triggering a target
specific "flush" on device close.
(Though the implementation of thin_flush could be made more
intelligent... as it stands we can see a pretty good storm of redundant
metadata commits if 100s of thin devices are closed simultaneously --
creating pmd->root_lock write lock contention.)