[dm-devel] dm: Fix UAF in run_timer_softirq()
Mike Snitzer
snitzer at redhat.com
Mon Oct 17 19:38:32 UTC 2022
On Mon, Oct 10 2022 at 10:39P -0400,
Luo Meng <luomeng at huaweicloud.com> wrote:
> From: Luo Meng <luomeng12 at huawei.com>
>
> When dm_resume() and dm_destroy() run concurrently, a use-after-free
> (UAF) can occur.
>
> One such concurrent UAF is shown below:
>
> use                                    free
> do_resume                            |
>   __find_device_hash_cell            |
>     dm_get                           |
>       atomic_inc(&md->holders)       |
>                                      | dm_destroy
>                                      |   __dm_destroy
>                                      |     if (!dm_suspended_md(md))
>                                      |     atomic_read(&md->holders)
>                                      |     msleep(1)
>   dm_resume                          |
>     __dm_resume                      |
>       dm_table_resume_targets        |
>         pool_resume                  |
>           do_waker # add delay work  |
>                                      | dm_table_destroy
>                                      |   pool_dtr
>                                      |     __pool_dec
>                                      |       __pool_destroy
>                                      |         destroy_workqueue
>                                      |         kfree(pool) # free pool
> time out
> __do_softirq
>   run_timer_softirq # pool has already been freed
>
> This can be easily reproduced using:
> 1. create thin-pool
> 2. dmsetup suspend pool
> 3. dmsetup resume pool
> 4. dmsetup remove_all # Concurrent with 3
>
> The root cause of the UAF is that dm_resume() arms the pool's timer
> after dm_destroy() has skipped canceling it because the device was
> still in the suspended state. When the timer later expires,
> run_timer_softirq() runs the callback, but the pool has already been
> freed, so the use-after-free occurs.
>
> Therefore, cancel the timer only after md->holders has dropped to zero.
>
> Signed-off-by: Luo Meng <luomeng12 at huawei.com>
> ---
> drivers/md/dm.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 60549b65c799..379525313628 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2420,6 +2420,19 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>
> blk_mark_disk_dead(md->disk);
>
> + /*
> + * Rare, but there may be I/O requests still going to complete,
> + * for example. Wait for all references to disappear.
> + * No one should increment the reference count of the mapped_device,
> + * after the mapped_device state becomes DMF_FREEING.
> + */
> + if (wait)
> + while (atomic_read(&md->holders))
> + msleep(1);
> + else if (atomic_read(&md->holders))
> + DMWARN("%s: Forcibly removing mapped_device still in use! (%d users)",
> + dm_device_name(md), atomic_read(&md->holders));
> +
> /*
> * Take suspend_lock so that presuspend and postsuspend methods
> * do not race with internal suspend.
> @@ -2436,19 +2449,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
> dm_put_live_table(md, srcu_idx);
> mutex_unlock(&md->suspend_lock);
>
> - /*
> - * Rare, but there may be I/O requests still going to complete,
> - * for example. Wait for all references to disappear.
> - * No one should increment the reference count of the mapped_device,
> - * after the mapped_device state becomes DMF_FREEING.
> - */
> - if (wait)
> - while (atomic_read(&md->holders))
> - msleep(1);
> - else if (atomic_read(&md->holders))
> - DMWARN("%s: Forcibly removing mapped_device still in use! (%d users)",
> - dm_device_name(md), atomic_read(&md->holders));
> -
> dm_table_destroy(__unbind(md));
> free_dev(md);
> }
> --
> 2.31.1
>
Thanks for the report, but your fix seems wrong. A thin-pool-specific
fix seems much more appropriate. Does this fix the issue?
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index e76c96c760a9..dc271c107fb5 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2889,6 +2889,8 @@ static void __pool_destroy(struct pool *pool)
dm_bio_prison_destroy(pool->prison);
dm_kcopyd_client_destroy(pool->copier);
+ cancel_delayed_work_sync(&pool->waker);
+ cancel_delayed_work_sync(&pool->no_space_timeout);
if (pool->wq)
destroy_workqueue(pool->wq);