[dm-devel] dm: Fix UAF in run_timer_softirq()
Luo Meng
luomeng at huaweicloud.com
Tue Oct 25 14:00:40 UTC 2022
On 2022/10/20 3:40, Mike Snitzer wrote:
> On Tue, Oct 18 2022 at 4:17P -0400,
> Luo Meng <luomeng at huaweicloud.com> wrote:
>
>>
>>
>> On 2022/10/18 3:38, Mike Snitzer wrote:
>>> On Mon, Oct 10 2022 at 10:39P -0400,
>>> Luo Meng <luomeng at huaweicloud.com> wrote:
>>>
>>>> From: Luo Meng <luomeng12 at huawei.com>
>>>>
>>>> When dm_resume() and dm_destroy() run concurrently, a
>>>> use-after-free (UAF) can occur.
>>>>
>>>> One such concurrent UAF sequence is shown below:
>>>>
>>>>              use                                  free
>>>> do_resume                               |
>>>>  __find_device_hash_cell                |
>>>>   dm_get                                |
>>>>    atomic_inc(&md->holders)             |
>>>>                                         | dm_destroy
>>>>                                         |  __dm_destroy
>>>>                                         |   if (!dm_suspended_md(md))
>>>>                                         |    atomic_read(&md->holders)
>>>>                                         |     msleep(1)
>>>>  dm_resume                              |
>>>>   __dm_resume                           |
>>>>    dm_table_resume_targets              |
>>>>     pool_resume                         |
>>>>      do_waker  # add delayed work       |
>>>>                                         | dm_table_destroy
>>>>                                         |  pool_dtr
>>>>                                         |   __pool_dec
>>>>                                         |    __pool_destroy
>>>>                                         |     destroy_workqueue
>>>>                                         |     kfree(pool)  # free pool
>>>> time out
>>>> __do_softirq
>>>>  run_timer_softirq  # pool has already been freed
>>>>
>>>> This can be easily reproduced using the steps below (an example
>>>> command sequence follows the list):
>>>> 1. create thin-pool
>>>> 2. dmsetup suspend pool
>>>> 3. dmsetup resume pool
>>>> 4. dmsetup remove_all # Concurrent with 3
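>>>>
>>>> For example (illustrative commands only; device names and sizes are
>>>> arbitrary, and the metadata device must start out zeroed):
>>>>
>>>> fallocate -l 64M /tmp/meta.img && fallocate -l 1G /tmp/data.img
>>>> META=$(losetup -f --show /tmp/meta.img)
>>>> DATA=$(losetup -f --show /tmp/data.img)
>>>> dmsetup create pool --table \
>>>>     "0 2097152 thin-pool $META $DATA 128 0"  # 1. create thin-pool
>>>> dmsetup suspend pool                         # 2. suspend
>>>> dmsetup resume pool & dmsetup remove_all     # 3. and 4. concurrently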
>>>>
>>>> The root cause of the UAF is that dm_resume() arms the timer after
>>>> dm_destroy() has skipped cancelling it because the device was in the
>>>> suspended state. When the timer later expires, run_timer_softirq()
>>>> runs the callback, but the pool has already been freed, so the
>>>> concurrent UAF occurs.
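>>>>
>>>> For reference, the pool's waker re-arms its own timer every time it
>>>> runs (paraphrased from drivers/md/dm-thin.c), so resuming the pool
>>>> always leaves a timer pending:
>>>>
>>>> static void do_waker(struct work_struct *ws)
>>>> {
>>>> 	struct pool *pool = container_of(to_delayed_work(ws),
>>>> 					 struct pool, waker);
>>>>
>>>> 	wake_worker(pool);
>>>> 	/* re-arms the timer; it can outlive the pool if never cancelled */
>>>> 	queue_delayed_work(pool->wq, &pool->waker, COMMIT_PERIOD);
>>>> }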
>>>>
>>>> Therefore, move the wait for md->holders to reach zero ahead of the
>>>> target teardown, so the timer is cancelled only after any concurrent
>>>> dm_resume() has completed.
>>>>
>>>> Signed-off-by: Luo Meng <luomeng12 at huawei.com>
>>>> ---
>>>> drivers/md/dm.c | 26 +++++++++++++-------------
>>>> 1 file changed, 13 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>>> index 60549b65c799..379525313628 100644
>>>> --- a/drivers/md/dm.c
>>>> +++ b/drivers/md/dm.c
>>>> @@ -2420,6 +2420,19 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>>>>  
>>>>  	blk_mark_disk_dead(md->disk);
>>>>  
>>>> +	/*
>>>> +	 * Rare, but there may be I/O requests still going to complete,
>>>> +	 * for example. Wait for all references to disappear.
>>>> +	 * No one should increment the reference count of the mapped_device,
>>>> +	 * after the mapped_device state becomes DMF_FREEING.
>>>> +	 */
>>>> +	if (wait)
>>>> +		while (atomic_read(&md->holders))
>>>> +			msleep(1);
>>>> +	else if (atomic_read(&md->holders))
>>>> +		DMWARN("%s: Forcibly removing mapped_device still in use! (%d users)",
>>>> +		       dm_device_name(md), atomic_read(&md->holders));
>>>> +
>>>>  	/*
>>>>  	 * Take suspend_lock so that presuspend and postsuspend methods
>>>>  	 * do not race with internal suspend.
>>>> @@ -2436,19 +2449,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>>>>  	dm_put_live_table(md, srcu_idx);
>>>>  	mutex_unlock(&md->suspend_lock);
>>>>  
>>>> -	/*
>>>> -	 * Rare, but there may be I/O requests still going to complete,
>>>> -	 * for example. Wait for all references to disappear.
>>>> -	 * No one should increment the reference count of the mapped_device,
>>>> -	 * after the mapped_device state becomes DMF_FREEING.
>>>> -	 */
>>>> -	if (wait)
>>>> -		while (atomic_read(&md->holders))
>>>> -			msleep(1);
>>>> -	else if (atomic_read(&md->holders))
>>>> -		DMWARN("%s: Forcibly removing mapped_device still in use! (%d users)",
>>>> -		       dm_device_name(md), atomic_read(&md->holders));
>>>> -
>>>>  	dm_table_destroy(__unbind(md));
>>>>  	free_dev(md);
>>>>  }
>>>> --
>>>> 2.31.1
>>>>
>>>
>>> Thanks for the report but your fix seems wrong. A thin-pool specific
>>> fix seems much more appropriate. Does this fix the issue?
>>>
>>> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
>>> index e76c96c760a9..dc271c107fb5 100644
>>> --- a/drivers/md/dm-thin.c
>>> +++ b/drivers/md/dm-thin.c
>>> @@ -2889,6 +2889,8 @@ static void __pool_destroy(struct pool *pool)
>>>  	dm_bio_prison_destroy(pool->prison);
>>>  	dm_kcopyd_client_destroy(pool->copier);
>>>  
>>> +	cancel_delayed_work_sync(&pool->waker);
>>> +	cancel_delayed_work_sync(&pool->no_space_timeout);
>>>  	if (pool->wq)
>>>  		destroy_workqueue(pool->wq);
>>>  
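>>>
>>> The ordering matters here: a delayed_work's timer fires outside the
>>> workqueue, so destroy_workqueue() alone does not stop a timer that
>>> is about to queue new work. As a generic sketch of the teardown
>>> pattern (illustrative code with a hypothetical "obj", not the
>>> actual dm-thin change):
>>>
>>> 	/*
>>> 	 * cancel_delayed_work_sync() deactivates the pending timer,
>>> 	 * waits for a running callback to finish, and blocks attempts
>>> 	 * to re-queue while cancelling, so the work cannot re-arm.
>>> 	 */
>>> 	cancel_delayed_work_sync(&obj->dwork);
>>> 	destroy_workqueue(obj->wq);	/* safe: nothing queued or pending */
>>> 	kfree(obj);			/* safe: no timer left to fire */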
>>>
>> Thanks for your reply.
>>
>> However, this issue exists not only in thin-pool; the cache target
>> has the same issue. A generic fix may be more appropriate.
>
> No, you've pointed out a problem with dm-thinp not properly tearing
> down its resources. Same for dm-cache. So a generic fix doesn't make
> sense. And your particular generic fix merely papers over the
> resource leak, while also causing DM core to no longer properly wait
> for outstanding IO to complete.
>
>> After adding cancel_delayed_work_sync() in __pool_destroy(), it
>> becomes possible for cancel_delayed_work_sync(&pool->waker) to be
>> called twice when the device is not suspended.
>
> That's not a problem, that's the cost of properly accounting for
> resources where we must.
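>
> cancel_delayed_work_sync() returns true if the work was pending and
> false if it was already idle, so a redundant second call is a
> harmless no-op, e.g.:
>
> 	cancel_delayed_work_sync(&pool->waker);	/* cancels if armed */
> 	cancel_delayed_work_sync(&pool->waker);	/* already idle: no-op */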
>
Thanks.
I will send v2 soon.