[dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held

Guoqing Jiang guoqing.jiang at linux.dev
Tue Dec 14 10:03:43 UTC 2021



On 12/14/21 5:31 PM, Donald Buczek wrote:

>>>>   -void md_reap_sync_thread(struct mddev *mddev)
>>>> +void md_reap_sync_thread(struct mddev *mddev, bool 
>>>> reconfig_mutex_held)
>>>>   {
>>>>       struct md_rdev *rdev;
>>>>       sector_t old_dev_sectors = mddev->dev_sectors;
>>>>       bool is_reshaped = false;
>>>>         /* resync has finished, collect result */
>>>> +    if (reconfig_mutex_held)
>>>> +        mddev_unlock(mddev);
>>>
>>>
>>> If one thread got here, e.g. via action_store( /* "idle" */ ), now 
>>> that the mutex is unlocked, is there anything which would prevent 
>>> another thread getting  here as well, e.g. via the same path?
>>>
>>> If not, they both might call
>>>
>>>> md_unregister_thread(&mddev->sync_thread);
>>>
>>> Which is not reentrant:
>>>
>>> void md_unregister_thread(struct md_thread **threadp)
>>> {
>>>     struct md_thread *thread = *threadp;
>>>     if (!thread)
>>>         return;
>>>     pr_debug("interrupting MD-thread pid %d\n", 
>>> task_pid_nr(thread->tsk));
>>>     /* Locking ensures that mddev_unlock does not wake_up a
>>>      * non-existent thread
>>>      */
>>>     spin_lock(&pers_lock);
>>>     *threadp = NULL;
>>>     spin_unlock(&pers_lock);
>>>
>>>     kthread_stop(thread->tsk);
>>>     kfree(thread);
>>> }
>>>
>>> This might be a preexisting problem, because the call site in 
>>> dm-raid.c, which you updated to `md_reap_sync_thread(mddev, 
>>> false);`, didn't hold the mutex anyway.
>>>
>>> Am I missing something? Probably, I do.
>>>
>>> Otherwise: Move the deref of threadp in md_unregister_thread() into 
>>> the spinlock scope?
>>
>> Good point, I think you are right.
>>
>> And actually pers_lock does extra service to protect accesses to 
>> mddev->thread (I think it
>> also suitable for mddev->sync_thread ) when the mutex can't be held. 
>> Care to send a patch
>> for it?
>
> I'm really sorry, but it's one thing to point to a possible problem 
> and another thing to come up with a correct solution.

Yes, it is often the reality of life, and we can always correct 
ourselves if there is problem 😎.

> While I think it would be easy to avoid the double free with the 
> spinlock (or maybe atomic RMW) , we surely don't want to hold the 
> spinlock while we are sleeping in kthread_stop(). If we don't hold 
> some kind of lock, what are the side effects of another sync thread 
> being started or any other reconfiguration? Are the existing flags 
> enough to protect us from this? If we do want to hold the lock while 
> waiting for the thread to terminate, should it be made into a mutex? 
> If so, it probably shouldn't be static but moved into the mddev 
> structure. I'd need weeks if not month to figure that out and to feel 
> bold enough to post it.

Thanks for deep thinking about it, I think we can avoid to call 
kthread_stop with spinlock
held. Maybe something like this, but just my raw idea, please have a 
thorough review.

  void md_unregister_thread(struct md_thread **threadp)
  {
-       struct md_thread *thread = *threadp;
-       if (!thread)
+       struct md_thread *thread = READ_ONCE(*threadp);
+
+       spin_lock(&pers_lock);
+       if (!thread) {
+               spin_unlock(&pers_lock);
                 return;
+       }
         pr_debug("interrupting MD-thread pid %d\n", 
task_pid_nr(thread->tsk));
         /* Locking ensures that mddev_unlock does not wake_up a
          * non-existent thread
          */
-       spin_lock(&pers_lock);
         *threadp = NULL;
         spin_unlock(&pers_lock);

-       kthread_stop(thread->tsk);
+       if (IS_ERR_OR_NULL(thread->tsk)) {
+               kthread_stop(thread->tsk);
+               thread->tsk = NULL;
+       }
         kfree(thread);
  }
  EXPORT_SYMBOL(md_unregister_thread);

> I don't want to push work to others, but my own my understanding of md 
> is to narrow.

Me either 😉

Thanks,
Guoqing





More information about the dm-devel mailing list