[dm-devel] staged dm_internal_{suspend, resume} related changes for wider review

Mike Snitzer snitzer at redhat.com
Wed Nov 5 16:56:11 UTC 2014


On Wed, Nov 05 2014 at 11:10am -0500,
Mike Snitzer <snitzer at redhat.com> wrote:

> On Wed, Nov 05 2014 at  9:37am -0500,
> Mikulas Patocka <mpatocka at redhat.com> wrote:
> 
> > 
> > 
> > On Wed, 5 Nov 2014, Mike Snitzer wrote:
> > 
> > > On Wed, Nov 05 2014 at  8:05am -0500,
> > > Mikulas Patocka <mpatocka at redhat.com> wrote:
> > > 
> > > > On Wed, 5 Nov 2014, Mikulas Patocka wrote:
> > > > 
> > > > > You can for example set the flag in the prison meaning that the prison is 
> > > > > suspended and then call dm_internal_suspend immediatelly followed by 
> > > > > dm_internal_resume - that will clear in-progress bios and prevent new bios 
> > > > > from coming in (and we don't need to change dm_internal_suspend and 
> > > > > dm_internal_resume to become so big).
> > > 
> > > It may _seem_ like they have gotten big given the code was refactored to
> > > share code with dm_suspend and dm_resume.  BUT I know you see that the
> > > actual code complexity isn't big.  I especially wanted you (and/or Bryn)
> > > to evaluate the performance implications that my changes had on
> > > dm-stats.  I'm pretty confident there won't be much if any performance
> > > difference (given the code is identical to what you had, except some
> > > extra checks are made but ultimately not used, e.g. lockfs/unlockfs).
> > 
> > This is not about performance, it is about unclear behavior.
> > 
> > If someone does internal_suspend followed by remove, what should be the 
> > correct behavior? The current code deadlocks in this case.
> 
> You need to be specific, if internal suspend was used by the thin-pool
> suspend to suspend thin devices you'll need the thin-pool resumed before
> you can remove any thin device!
> 
> Like any interface there is a right way and a wrong way to use it.
> dm_internal_suspend must always be followed by dm_internal_resume.
> I cannot yet see a hole where properly written code is exposed here.

But thinking further about what you said, you're correctly concerned
about the potential for dm-stats to have used dm_internal_suspend and
then someone attempting to remove that device while it is internally
suspended.

We just need a patch like this:

 drivers/md/dm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ac0dcad..ffa6763 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -448,6 +448,17 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only
 {
 	int r = 0;
 
+retry:
+	mutex_lock(&md->suspend_lock);
+	if (dm_suspended_internally_md(md)) {
+		/* already internally suspended, wait for internal resume */
+		mutex_unlock(&md->suspend_lock);
+		r = wait_on_bit(&md->flags, DMF_SUSPENDED_INTERNALLY, TASK_INTERRUPTIBLE);
+		if (r)
+			return r;
+		goto retry;
+	}
+
 	spin_lock(&_minor_lock);
 
 	if (dm_open_count(md)) {
@@ -461,6 +472,8 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only
 
 	spin_unlock(&_minor_lock);
 
+	mutex_unlock(&md->suspend_lock);
+
 	return r;
 }
 




More information about the dm-devel mailing list