[dm-devel] [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held

Mikko Rantalainen mikko.rantalainen at peda.net
Tue Jan 11 12:25:04 UTC 2022


Guoqing Jiang (2021-02-13 02:49 Europe/Helsinki):
> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> doesn't reconfigure array.
> 
> And it could cause deadlock problem for raid5 as follows:
> 
> 1. process A tried to reap sync thread with reconfig_mutex held after echo
>    idle to sync_action.
> 2. raid5 sync thread was blocked if there were too many active stripes.
> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
>    which causes the number of active stripes can't be decreased.
> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
>    to hold reconfig_mutex.
> 
> More details in the link:
> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t

I don't understand md well enough to suggest a patch but isn't this
logically a classic two lock deadlock problem where

thread 1:
- lock reconfig_mutex
- blocked for sync that requires SB_CHANGE_PENDING

thread 2:
- (logically) acquire lock SB_CHANGE_PENDING
- blocked for reconfig_mutex before SB_CHANGE_PENDING can be released

?

The classic fix for this kind of deadlock is to require these locks to
be always acquired in constant order and released in reverse order.

If you had a rule that SB_CHANGE_PENDING cannot be set or cleared
without already having reconfig_mutex, wouldn't that prevent this
deadlock? (If I understood the issue correctly, it's currently possible
to set but not clear the SB_CHANGE_PENDING without having reconfig_mutex.)

Another possibility is to expect SB_CHANGE_PENDING to be set as part of
sync process required change to "idle" (write to sync_action). In that
case the logic would be you cannot have reconfig_mutex already locked
before setting (logically acquiring lock) SB_CHANGE_PENDING. So the
transfer from active to idle would require first setting
SB_CHANGE_PENDING, doing the required processing (getting and freeing
reconfig_mutex in process) and then clearing SB_CHANGE_PENDING.
Basically the rule would be you must lock SB_CHANGE_PENDING before you
can lock reconfig_mutex.

If I've understood correctly SB_CHANGE_PENDING is not technically a lock
but it's logically used like it were a lock.

Would either of these make sense for the overall design?

Obviously, if it doesn't hurt the performance too much, using a single
lock for everything that needs to be serialized would be much easier.

-- 
Mikko




More information about the dm-devel mailing list