[dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held
Guoqing Jiang
guoqing.jiang at linux.dev
Tue Dec 14 10:03:43 UTC 2021
On 12/14/21 5:31 PM, Donald Buczek wrote:
>>>> -void md_reap_sync_thread(struct mddev *mddev)
>>>> +void md_reap_sync_thread(struct mddev *mddev, bool
>>>> reconfig_mutex_held)
>>>> {
>>>> struct md_rdev *rdev;
>>>> sector_t old_dev_sectors = mddev->dev_sectors;
>>>> bool is_reshaped = false;
>>>> /* resync has finished, collect result */
>>>> + if (reconfig_mutex_held)
>>>> + mddev_unlock(mddev);
>>>
>>>
>>> If one thread got here, e.g. via action_store( /* "idle" */ ), now
>>> that the mutex is unlocked, is there anything which would prevent
>>> another thread getting here as well, e.g. via the same path?
>>>
>>> If not, they both might call
>>>
>>>> md_unregister_thread(&mddev->sync_thread);
>>>
>>> Which is not reentrant:
>>>
>>> void md_unregister_thread(struct md_thread **threadp)
>>> {
>>> struct md_thread *thread = *threadp;
>>> if (!thread)
>>> return;
>>> pr_debug("interrupting MD-thread pid %d\n",
>>> task_pid_nr(thread->tsk));
>>> /* Locking ensures that mddev_unlock does not wake_up a
>>> * non-existent thread
>>> */
>>> spin_lock(&pers_lock);
>>> *threadp = NULL;
>>> spin_unlock(&pers_lock);
>>>
>>> kthread_stop(thread->tsk);
>>> kfree(thread);
>>> }
>>>
>>> This might be a preexisting problem, because the call site in
>>> dm-raid.c, which you updated to `md_reap_sync_thread(mddev,
>>> false);`, didn't hold the mutex anyway.
>>>
>>> Am I missing something? Probably, I do.
>>>
>>> Otherwise: Move the deref of threadp in md_unregister_thread() into
>>> the spinlock scope?
>>
>> Good point, I think you are right.
>>
>> And actually pers_lock does extra service to protect accesses to
>> mddev->thread (I think it
>> also suitable for mddev->sync_thread ) when the mutex can't be held.
>> Care to send a patch
>> for it?
>
> I'm really sorry, but it's one thing to point to a possible problem
> and another thing to come up with a correct solution.
Yes, it is often the reality of life, and we can always correct
ourselves if there is problem 😎.
> While I think it would be easy to avoid the double free with the
> spinlock (or maybe atomic RMW) , we surely don't want to hold the
> spinlock while we are sleeping in kthread_stop(). If we don't hold
> some kind of lock, what are the side effects of another sync thread
> being started or any other reconfiguration? Are the existing flags
> enough to protect us from this? If we do want to hold the lock while
> waiting for the thread to terminate, should it be made into a mutex?
> If so, it probably shouldn't be static but moved into the mddev
> structure. I'd need weeks if not month to figure that out and to feel
> bold enough to post it.
Thanks for deep thinking about it, I think we can avoid to call
kthread_stop with spinlock
held. Maybe something like this, but just my raw idea, please have a
thorough review.
void md_unregister_thread(struct md_thread **threadp)
{
- struct md_thread *thread = *threadp;
- if (!thread)
+ struct md_thread *thread = READ_ONCE(*threadp);
+
+ spin_lock(&pers_lock);
+ if (!thread) {
+ spin_unlock(&pers_lock);
return;
+ }
pr_debug("interrupting MD-thread pid %d\n",
task_pid_nr(thread->tsk));
/* Locking ensures that mddev_unlock does not wake_up a
* non-existent thread
*/
- spin_lock(&pers_lock);
*threadp = NULL;
spin_unlock(&pers_lock);
- kthread_stop(thread->tsk);
+ if (IS_ERR_OR_NULL(thread->tsk)) {
+ kthread_stop(thread->tsk);
+ thread->tsk = NULL;
+ }
kfree(thread);
}
EXPORT_SYMBOL(md_unregister_thread);
> I don't want to push work to others, but my own my understanding of md
> is to narrow.
Me either 😉
Thanks,
Guoqing
More information about the dm-devel
mailing list