[dm-devel] [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted
Kiyoshi Ueda
k-ueda at ct.jp.nec.com
Wed Jan 20 02:47:56 UTC 2010
Hi Yasui-san,
On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
> Hi,
>
> This is a patch set to fix a deadlock when suspending a mirror device.
>
>
> ISSUE
> =====
>
> The suspend procedure on a dm-mirror device can deadlock on the recovery_count
> semaphore.
>
> When mirror_presuspend is called, the recovery_count semaphore is acquired in
> dm_rh_stop_recovery() to stop the recovery routine. But when a signal is caught
> in dm_wait_for_completion() or an error occurs in dm_suspend(), the suspend
> process is interrupted without releasing the recovery_count semaphore of the
> mirror device. When another suspend is then executed, the suspend process
> gets stuck at dm_rh_stop_recovery().
>
> When the suspend procedure is interrupted, the device should keep working
> properly, since the status of the device is not "suspended".
>
>
> SOLUTION
> ========
>
> Introduce a target handler, cancel_presuspend, to cancel status changes
> done by a target specific presuspend handler.
How about using ->resume as the cancelling method?
Though you would have to audit the existing targets' ->resume handlers,
I think it's a better idea than adding another target handler
just for this purpose.
Also, your dm-raid1 patch misses cancelling the log's presuspend, which is
used by dm-log-userspace.
So it seems that dm-raid1 can use ->resume to cancel presuspend.
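
To illustrate the idea, here is a minimal userspace sketch, not kernel code.
It assumes a simplified model in which recovery_count is a single counter
and presuspend takes it once. All names in it (mirror_model, model_presuspend,
model_resume, model_suspend) are hypothetical and exist only for this
illustration. The point is that if the interrupted suspend path calls the
resume handler, the semaphore is released again, and that the resume handler
must tolerate being called when no presuspend is outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the suspend path; not kernel code. */
enum suspend_result { SUSPEND_OK, SUSPEND_INTERRUPTED, SUSPEND_STUCK };

struct mirror_model {
    int recovery_count;  /* models the recovery_count semaphore; 1 = free */
    bool presuspended;   /* true while presuspend holds the semaphore */
};

static void model_init(struct mirror_model *m)
{
    m->recovery_count = 1;
    m->presuspended = false;
}

/*
 * Models presuspend taking the semaphore. Returns false where the real
 * code would block forever in down().
 */
static bool model_presuspend(struct mirror_model *m)
{
    if (m->recovery_count == 0)
        return false;
    m->recovery_count--;
    m->presuspended = true;
    return true;
}

/*
 * Models ->resume doubling as the cancel path. It is a no-op when no
 * presuspend is outstanding, so calling it twice is harmless.
 */
static void model_resume(struct mirror_model *m)
{
    if (!m->presuspended)
        return;
    m->recovery_count++;
    assert(m->recovery_count == 1);
    m->presuspended = false;
}

/*
 * Models one suspend attempt. 'interrupted' stands for a signal or error
 * after presuspend; 'cancel_via_resume' selects the proposed behaviour.
 */
static enum suspend_result model_suspend(struct mirror_model *m,
                                         bool interrupted,
                                         bool cancel_via_resume)
{
    if (!model_presuspend(m))
        return SUSPEND_STUCK;
    if (interrupted) {
        if (cancel_via_resume)
            model_resume(m);
        return SUSPEND_INTERRUPTED;
    }
    return SUSPEND_OK;
}
```

In this model, an interrupted suspend without the cancel step leaves
recovery_count at 0, so the next suspend attempt reports SUSPEND_STUCK.
With the cancel step, the next attempt succeeds.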
Thanks,
Kiyoshi Ueda