[dm-devel] dm thin: do not queue freed thin mapping for next stage processing
Mike Snitzer
snitzer at redhat.com
Mon Jun 26 19:06:38 UTC 2017
On Fri, Jun 23 2017 at 2:53pm -0400,
Vallish Vaidyeshwara <vallish at amazon.com> wrote:
> process_prepared_discard_passdown_pt1() should clean up the
> dm_thin_new_mapping in error cases.
>
> dm_pool_inc_data_range() can fail trying to get a block reference:
>
> metadata operation 'dm_pool_inc_data_range' failed: error = -61
>
> When dm_pool_inc_data_range() fails, dm thin aborts the current metadata
> transaction and marks the pool as PM_READ_ONLY. The memory for the thin
> mapping is released as well. However, the current thin mapping has
> already been queued for the next stage via queue_passdown_pt2() or
> passdown_endio(). When this dangling thin mapping is later processed
> and accessed in the next stage, device mapper crashes.
>
> Code flow without fix:
> -> process_prepared_discard_passdown_pt1(m)
> -> dm_thin_remove_range()
> -> discard passdown
> --> passdown_endio(m) queues m onto next stage
> -> dm_pool_inc_data_range() fails, frees memory m
> but does not remove it from next stage queue
>
> -> process_prepared_discard_passdown_pt2(m)
> -> processes freed memory m and crashes
>
> One such stack:
>
> Call Trace:
> [<ffffffffa037a46f>] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
> [<ffffffffa039b6dc>] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
> [<ffffffffa039b88b>] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
> [<ffffffffa0399611>] process_prepared+0x81/0xa0 [dm_thin_pool]
> [<ffffffffa039e735>] do_worker+0xc5/0x820 [dm_thin_pool]
> [<ffffffff8152bf54>] ? __schedule+0x244/0x680
> [<ffffffff81087e72>] ? pwq_activate_delayed_work+0x42/0xb0
> [<ffffffff81089f53>] process_one_work+0x153/0x3f0
> [<ffffffff8108a71b>] worker_thread+0x12b/0x4b0
> [<ffffffff8108a5f0>] ? rescuer_thread+0x350/0x350
> [<ffffffff8108fd6a>] kthread+0xca/0xe0
> [<ffffffff8108fca0>] ? kthread_park+0x60/0x60
> [<ffffffff81530b45>] ret_from_fork+0x25/0x30
>
> The fix is to first take the block ref count for the discarded block and
> only then issue the passdown discard of that block. If taking the block
> ref count fails, bail out: abort the current metadata transaction, mark
> the pool as PM_READ_ONLY, and free the current thin mapping memory (the
> existing error handling code) without queueing that thin mapping for the
> next stage of processing. If taking the block ref count succeeds, the
> passdown discard of the block is issued, and its completion callback,
> passdown_endio(), queues the thin mapping for the next stage.
>
> Code flow with fix:
> -> process_prepared_discard_passdown_pt1(m)
> -> dm_thin_remove_range()
> -> dm_pool_inc_data_range()
> --> if fails, free memory m and bail out
> -> discard passdown
> --> passdown_endio(m) queues m onto next stage
>
> Cc: stable <stable at vger.kernel.org> # v4.9+
> Reviewed-by: Eduardo Valentin <eduval at amazon.com>
> Reviewed-by: Cristian Gafton <gafton at amazon.com>
> Reviewed-by: Anchal Agarwal <anchalag at amazon.com>
> Signed-off-by: Vallish Vaidyeshwara <vallish at amazon.com>
> ---
> drivers/md/dm-thin.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 17ad50d..28808e5 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1094,6 +1094,19 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
> return;
> }
>
> + /*
> + * Increment the unmapped blocks. This prevents a race between the
> + * passdown io and reallocation of freed blocks.
> + */
> + r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
> + if (r) {
> + metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
> + bio_io_error(m->bio);
> + cell_defer_no_holder(tc, m->cell);
> + mempool_free(m, pool->mapping_pool);
> + return;
> + }
> +
> discard_parent = bio_alloc(GFP_NOIO, 1);
> if (!discard_parent) {
> DMWARN("%s: unable to allocate top level discard bio for passdown. Skipping passdown.",
> @@ -1114,19 +1127,6 @@ static void process_prepared_discard_passdown_pt1(struct dm_thin_new_mapping *m)
> end_discard(&op, r);
> }
> }
> -
> - /*
> - * Increment the unmapped blocks. This prevents a race between the
> - * passdown io and reallocation of freed blocks.
> - */
> - r = dm_pool_inc_data_range(pool->pmd, m->data_block, data_end);
> - if (r) {
> - metadata_operation_failed(pool, "dm_pool_inc_data_range", r);
> - bio_io_error(m->bio);
> - cell_defer_no_holder(tc, m->cell);
> - mempool_free(m, pool->mapping_pool);
> - return;
> - }
> }
>
> static void process_prepared_discard_passdown_pt2(struct dm_thin_new_mapping *m)
This looks like a good fix.
But I'll wait for Joe's confirmation before staging for 4.12 final
inclusion later this week.
Thanks,
Mike