[dm-devel] corruption causing crash in __queue_work

Mike Snitzer snitzer at redhat.com
Mon Dec 14 15:31:47 UTC 2015


On Mon, Dec 14 2015 at  3:41P -0500,
Nikolay Borisov <kernel at kyup.com> wrote:
 
> Had another poke at the backtrace that is produced and here is what the
> delayed_work looks like:
> 
> crash> struct delayed_work ffff88036772c8c0
> struct delayed_work {
>   work = {
>     data = {
>       counter = 1537
>     },
>     entry = {
>       next = 0xffff88036772c8c8,
>       prev = 0xffff88036772c8c8
>     },
>     func = 0xffffffffa0211a30 <do_waker>
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       prev = 0xdead000000200200
>     },
>     expires = 4349463655,
>     base = 0xffff88047fd2d602,
>     function = 0xffffffff8106da40 <delayed_work_timer_fn>,
>     data = 18446612146934696128,
>     slack = -1,
>     start_pid = -1,
>     start_site = 0x0,
>     start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
>   },
>   wq = 0xffff88030cf65400,
>   cpu = 21
> }
> 
> From this it seems that the timer is also cancelled/expired judging by
> the values in timer -> entry. But then again in dm-thin the pool is
> first suspended, which implies the following functions were called:
> 
> cancel_delayed_work(&pool->waker);
> cancel_delayed_work(&pool->no_space_timeout);
> flush_workqueue(pool->wq);
> 
> so at that point dm-thin's workqueue should be empty and it shouldn't be
> possible to queue any more delayed work. But the crashdump clearly shows
> that the opposite is happening. So far all of this points to a race
> condition: inserting some sleeps after umount and after vgchange -Kan
> (the command that disables the volume group and suspends, so that
> cancel_delayed_work is invoked) seems to reduce the frequency of
> crashes, though it doesn't eliminate them.
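
For reference, the fields in the quoted dump can be cross-checked with a
small userspace sketch. The constants at the top are copied from the
crash output; the ASSUMED_* values (the x86_64 LIST_POISON2 pattern,
PENDING being bit 0 of work.data, and work.entry sitting 8 bytes into
the work_struct) are assumptions about a kernel of this vintage, and the
helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted crash output. */
#define DWORK_ADDR       UINT64_C(0xffff88036772c8c0) /* address passed to crash */
#define WORK_DATA        UINT64_C(1537)               /* work.data.counter */
#define WORK_ENTRY_NEXT  UINT64_C(0xffff88036772c8c8)
#define WORK_ENTRY_PREV  UINT64_C(0xffff88036772c8c8)
#define TIMER_ENTRY_NEXT UINT64_C(0x0)
#define TIMER_ENTRY_PREV UINT64_C(0xdead000000200200)
#define TIMER_DATA       UINT64_C(18446612146934696128)

/* ASSUMPTIONS (x86_64, list_head-based timers), not read from the dump. */
#define ASSUMED_LIST_POISON2 UINT64_C(0xdead000000200200)
#define ASSUMED_PENDING_MASK UINT64_C(1) /* WORK_STRUCT_PENDING taken as bit 0 */
#define ASSUMED_ENTRY_OFFSET UINT64_C(8) /* work.entry follows 8-byte work.data */

/* A detached timer has entry.next == NULL and a poisoned entry.prev. */
bool timer_is_detached(uint64_t next, uint64_t prev)
{
	return next == 0 && prev == ASSUMED_LIST_POISON2;
}

/* An empty list_head points back at itself in both directions. */
bool list_is_self_linked(uint64_t self, uint64_t next, uint64_t prev)
{
	return next == self && prev == self;
}

/* True if the PENDING flag is set in work.data. */
bool work_is_pending(uint64_t data)
{
	return (data & ASSUMED_PENDING_MASK) != 0;
}

/* Cross-check the quoted values; aborts if any expectation fails. */
void check_quoted_dump(void)
{
	/* timer.data (printed in decimal) is the delayed_work address itself */
	assert(TIMER_DATA == DWORK_ADDR);
	/* the timer is no longer on any timer list */
	assert(timer_is_detached(TIMER_ENTRY_NEXT, TIMER_ENTRY_PREV));
	/* work.entry is an empty list, so the work is not on a worklist */
	assert(list_is_self_linked(DWORK_ADDR + ASSUMED_ENTRY_OFFSET,
				   WORK_ENTRY_NEXT, WORK_ENTRY_PREV));
	/* yet the work is still marked pending */
	assert(work_is_pending(WORK_DATA));
}
```

Calling check_quoted_dump() from a main() returns silently when all four
checks hold. Read together, a detached timer with PENDING still set and
the work not yet on any worklist would be consistent with the timer
having fired and the crash hitting inside __queue_work, but that is an
inference, not something the dump proves.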

'vgchange -Kan' doesn't suspend the pool before it destroys the device.
So the cancel_delayed_work()s you referenced aren't applicable.

Can you try this patch?

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 63903a5..b201d887 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2750,8 +2750,11 @@ static void __pool_destroy(struct pool *pool)
 	dm_bio_prison_destroy(pool->prison);
 	dm_kcopyd_client_destroy(pool->copier);
 
-	if (pool->wq)
+	if (pool->wq) {
+		cancel_delayed_work(&pool->waker);
+		cancel_delayed_work(&pool->no_space_timeout);
 		destroy_workqueue(pool->wq);
+	}
 
 	if (pool->next_mapping)
 		mempool_free(pool->next_mapping, pool->mapping_pool);
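
A rough userspace model of the ordering the patch is after, under the
assumption that a delayed work whose timer is still armed is not yet on
the workqueue and so is not seen when the queue is torn down. Every name
here (toy_wq, toy_delayed_work, toy_teardown, ...) is hypothetical and
none of it is kernel API; the "destroyed" flag stands in for freed
memory so the model stays well-defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects. */
struct toy_wq {
	bool destroyed; /* set instead of freeing the queue */
	int queued;     /* work items currently on the queue */
};

struct toy_delayed_work {
	struct toy_wq *wq; /* queue the timer will hand the work to */
	bool timer_armed;  /* true while the delay is still running */
};

enum fire_result { FIRE_NOTHING, FIRE_QUEUED, FIRE_USE_AFTER_DESTROY };

/* Arm the timer; the work is not on the queue until the timer fires. */
void toy_queue_delayed(struct toy_wq *wq, struct toy_delayed_work *dw)
{
	dw->wq = wq;
	dw->timer_armed = true;
}

/* Disarm the timer; report whether it had been armed. */
bool toy_cancel_delayed(struct toy_delayed_work *dw)
{
	bool was_armed = dw->timer_armed;

	dw->timer_armed = false;
	return was_armed;
}

/* Drain what is queued and mark the queue gone; armed timers are invisible here. */
void toy_destroy_wq(struct toy_wq *wq)
{
	wq->queued = 0;
	wq->destroyed = true;
}

/* The timer expires and tries to queue the work on its workqueue. */
enum fire_result toy_timer_fire(struct toy_delayed_work *dw)
{
	if (!dw->timer_armed)
		return FIRE_NOTHING;
	dw->timer_armed = false;
	if (dw->wq->destroyed)
		return FIRE_USE_AFTER_DESTROY;
	dw->wq->queued++;
	return FIRE_QUEUED;
}

/* Tear down with or without cancelling first, then let the timer expire. */
enum fire_result toy_teardown(bool cancel_first)
{
	struct toy_wq wq = { .destroyed = false, .queued = 0 };
	struct toy_delayed_work waker = { .wq = NULL, .timer_armed = false };

	toy_queue_delayed(&wq, &waker);
	if (cancel_first)
		toy_cancel_delayed(&waker);
	toy_destroy_wq(&wq);
	return toy_timer_fire(&waker);
}
```

In this model toy_teardown(false) ends in FIRE_USE_AFTER_DESTROY, while
toy_teardown(true) ends in FIRE_NOTHING. It is single-threaded, so it
says nothing about a work function that is running and re-arms itself
while teardown is in progress.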



