[dm-devel] [PATCH] dm-writecache: fix a crash when unloading

Mikulas Patocka mpatocka at redhat.com
Wed Feb 12 18:44:07 UTC 2020



On Wed, 12 Feb 2020, John Dorminy wrote:

> > Also, the test "!dm_suspended(wc->ti)" in writecache_writeback is not
> > sufficient, because dm_suspended returns zero while writecache_suspend is
> > in progress. We add a variable wc->suspending and set it in
> > writecache_suspend. Without this variable, drain_workqueue would spit
> > warnings:
> > workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries
> > workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries
> > workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries
> > workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries
> > workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries
> 
> This I don't understand.
> 
> writecache_suspend is a postsuspend function.
> 
> Here's a partial call chain representing suspending a table:
> dm_suspend()
>   __dm_suspend(...,DMF_SUSPENDED)
>     ...do suspend stuff...
>     set_bit(dmf_suspended_flags, ...) // DMF_SUSPENDED
>   dm_table_postsuspend_targets()
>     // for each segment, call that segments' postsuspend function
> 
> And dm_suspended() calls dm_suspended_md() which checks whether
> DMF_SUSPENDED is set.
> 
> So, by the time the targets' postsuspend function gets called,
> dm_suspended() should be returning 1, and the existing conditional
> should be preventing requeuing. So I worry there's something deeper
> going on here...

When the device is being deleted, the postsuspend function is called 
without the DMF_SUSPENDED flag - see the function __dm_destroy. We could 
test for DMF_DELETING or DMF_FREEING, but there is no exported function 
that checks them.

Should we set DMF_SUSPENDED when deleting the device? Perhaps yes, but it 
could change behavior of other targets.

> Additionally, I don't think wc->suspending is multithread safe -- it's
> getting set on one thread and getting checked on another thread,
> right? So the CPU running the workqueue thread is free to read
> wc->suspending and then later on write it to the value it read, even
> if the other thread has attempted to write a different value in
> between.

In all three cases where wc->suspending is accessed, we hold the 
writecache lock.

Mikulas




More information about the dm-devel mailing list