[dm-devel] [PATCH for-dm-3.14-fixes 4/8] dm thin: error out I/O if inappropriate for it to be retried

Fri Feb 21 02:56:01 UTC 2014

If the pool is in fail mode, error_if_no_space is enabled, or the
metadata space is exhausted, do _not_ allow IO to be retried; error it
instead.  This change complements commit 8c0f0e8c9f0 ("dm thin: requeue
bios to DM core if no_free_space and in read-only mode").

Also, update Documentation to include information about when the thin
provisioning target commits metadata and how it deals with running out
of space.

Signed-off-by: Mike Snitzer <snitzer at redhat.com>
---
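
For reviewers, the error-vs-retry decision this patch introduces can be
modelled in userspace roughly as follows.  This is only a sketch under
stated assumptions, not the kernel code: struct pool_model, its fields,
enum bio_disposition and dispose_unserviceable_bio() are hypothetical
stand-ins for the pool state and the bio_io_error()/retry_on_resume()
calls in dm-thin.c.

```c
/*
 * Userspace model of the error-vs-retry decision; not kernel code.
 * struct pool_model and enum bio_disposition are hypothetical
 * stand-ins for the state that dm-thin.c consults.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pool_mode { PM_WRITE, PM_READ_ONLY, PM_FAIL };

enum bio_disposition { BIO_ERROR, BIO_RETRY_ON_RESUME };

struct pool_model {
	enum pool_mode mode;
	bool error_if_no_space;     /* models pool->pf.error_if_no_space */
	bool metadata_out_of_space; /* models dm_pool_is_metadata_out_of_space() */
};

/* True when a later retry cannot succeed, so the bio should be errored. */
static bool should_error_unserviceable_bio(const struct pool_model *pool)
{
	return pool->mode == PM_FAIL ||
	       pool->error_if_no_space ||
	       pool->metadata_out_of_space;
}

/* Decide the fate of a bio that the pool cannot service right now. */
static enum bio_disposition
dispose_unserviceable_bio(const struct pool_model *pool)
{
	assert(pool != NULL);
	return should_error_unserviceable_bio(pool) ? BIO_ERROR
						    : BIO_RETRY_ON_RESUME;
}
```

In this model only a read-only pool with error_if_no_space disabled and
metadata space still available yields BIO_RETRY_ON_RESUME; any one of
the three conditions is enough to error the bio.
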
 Documentation/device-mapper/cache.txt             | 11 +++++------
 Documentation/device-mapper/thin-provisioning.txt | 23 +++++++++++++++++++++++
 drivers/md/dm-thin.c                              | 14 +++++++++++++-
 3 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt
index e6b72d3..68c0f51 100644
--- a/Documentation/device-mapper/cache.txt
+++ b/Documentation/device-mapper/cache.txt
@@ -124,12 +124,11 @@ the default being 204800 sectors (or 100MB).
 Updating on-disk metadata
 -------------------------
 
-On-disk metadata is committed every time a REQ_SYNC or REQ_FUA bio is
-written.  If no such requests are made then commits will occur every
-second.  This means the cache behaves like a physical disk that has a
-write cache (the same is true of the thin-provisioning target).  If
-power is lost you may lose some recent writes.  The metadata should
-always be consistent in spite of any crash.
+On-disk metadata is committed every time a FLUSH or FUA bio is written.
+If no such requests are made then commits will occur every second.  This
+means the cache behaves like a physical disk that has a volatile write
+cache.  If power is lost you may lose some recent writes.  The metadata
+should always be consistent in spite of any crash.
 
 The 'dirty' state for a cache block changes far too frequently for us
 to keep updating it on the fly.  So we treat it as a hint.  In normal
diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt
index 8a7a3d4..3989dd6 100644
--- a/Documentation/device-mapper/thin-provisioning.txt
+++ b/Documentation/device-mapper/thin-provisioning.txt
@@ -116,6 +116,29 @@ Resuming a device with a new table itself triggers an event so the
 userspace daemon can use this to detect a situation where a new table
 already exceeds the threshold.
 
+A low water mark for the metadata device is maintained in the kernel and
+will trigger a dm event if free space on the metadata device drops below
+it.
+
+Updating on-disk metadata
+-------------------------
+
+On-disk metadata is committed every time a FLUSH or FUA bio is written.
+If no such requests are made then commits will occur every second.  This
+means the thin-provisioning target behaves like a physical disk that has
+a volatile write cache.  If power is lost you may lose some recent
+writes.  The metadata should always be consistent in spite of any crash.
+
+If data space is exhausted the pool will either error or queue IO
+according to the configuration (see: error_if_no_space).  When metadata
+space is exhausted the pool will error IO that requires new pool block
+allocation until the pool's metadata device is resized.  When either the
+data or metadata space is exhausted the current metadata transaction
+must be aborted.  Given that the pool will cache IO whose completion may
+have already been acknowledged to the upper IO layers (e.g. filesystem)
+it is strongly suggested that those layers perform consistency checks
+before the data or metadata space is resized after having been exhausted.
+
 Thin provisioning
 -----------------
 
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 8e68831..bc52b3b 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -989,6 +989,13 @@ static void retry_on_resume(struct bio *bio)
 	spin_unlock_irqrestore(&pool->lock, flags);
 }
 
+static bool should_error_unserviceable_bio(struct pool *pool)
+{
+	return (unlikely(get_pool_mode(pool) == PM_FAIL) ||
+		pool->pf.error_if_no_space ||
+		dm_pool_is_metadata_out_of_space(pool->pmd));
+}
+
 static void handle_unserviceable_bio(struct pool *pool, struct bio *bio)
 {
 	/*
@@ -997,7 +1004,7 @@ static void handle_unserviceable_bio(struct pool *pool, struct bio *bio)
 	 */
 	WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
 
-	if (pool->pf.error_if_no_space)
+	if (should_error_unserviceable_bio(pool))
 		bio_io_error(bio);
 	else
 		retry_on_resume(bio);
@@ -1008,6 +1015,11 @@ static void retry_bios_on_resume(struct pool *pool, struct dm_bio_prison_cell *c
 	struct bio *bio;
 	struct bio_list bios;
 
+	if (should_error_unserviceable_bio(pool)) {
+		cell_error(pool, cell);
+		return;
+	}
+
 	bio_list_init(&bios);
 	cell_release(pool, cell, &bios);
 
-- 
1.8.3.1