[dm-devel] dm thin-volume hung as swap: bug or as-design ?

Coly Li colyli at suse.de
Fri Jan 29 10:40:06 UTC 2021


Hi folks,

Recently I received a report that a whole system hung and became
unresponsive after a while under I/O load. The special configuration is
that a dm thin-pool volume is used as the swap partition of the system.

From the crash dump, I found one suspicious task, which looks as follows,

PID: 462    TASK: ffff93033d74a680  CPU: 7   COMMAND: "kworker/u256:1"
 #0 [ffffb24b4d9c3710] __schedule at ffffffff9e29dc3d
 #1 [ffffb24b4d9c37a0] schedule at ffffffff9e29e0bf
 #2 [ffffb24b4d9c37b0] schedule_timeout at ffffffff9e2a179d
 #3 [ffffb24b4d9c3828] wait_for_completion at ffffffff9e29eaaa
 #4 [ffffb24b4d9c3878] __flush_work at ffffffff9dabb277
 #5 [ffffb24b4d9c38f0] drain_all_pages at ffffffff9dc74e05
 #6 [ffffb24b4d9c3920] __alloc_pages_slowpath at ffffffff9dc77279
 #7 [ffffb24b4d9c3a20] __alloc_pages_nodemask at ffffffff9dc77e41
 #8 [ffffb24b4d9c3a80] new_slab at ffffffff9dc99c1a
 #9 [ffffb24b4d9c3ae8] ___slab_alloc at ffffffff9dc9c6d9
#10 [ffffb24b4d9c3b40] exit_shadow_spine at ffffffffc08ef8cf [dm_persistent_data]
#11 [ffffb24b4d9c3b50] insert at ffffffffc08edfcc [dm_persistent_data]
#12 [ffffb24b4d9c3c30] sm_ll_mutate at ffffffffc08ea20e [dm_persistent_data]
#13 [ffffb24b4d9c3cd8] dm_kcopyd_zero at ffffffffc03f7a39 [dm_mod]
#14 [ffffb24b4d9c3ce8] schedule_zero at ffffffffc093d181 [dm_thin_pool]
#15 [ffffb24b4d9c3d40] process_cell at ffffffffc093d78c [dm_thin_pool]
#16 [ffffb24b4d9c3dc8] do_worker at ffffffffc093dce6 [dm_thin_pool]
#17 [ffffb24b4d9c3e98] process_one_work at ffffffff9daba4d4
#18 [ffffb24b4d9c3ed8] worker_thread at ffffffff9daba6ed
#19 [ffffb24b4d9c3f10] kthread at ffffffff9dac0a2d
#20 [ffffb24b4d9c3f50] ret_from_fork at ffffffff9e400202

This task is writing to a thin-pool volume which is mounted as the swap
partition of the system. This is very suspicious, because from my
reading of the dm-thin code, every memory allocation inside dm-thin uses
an explicit GFP_NOIO/GFP_NOFS or an implicit memalloc_noio_save(), in
order to avoid deadlock in the recursive memory reclaim code path.
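For reference, the pattern I mean looks roughly like the sketch below.
This is a minimal illustration of the memalloc_noio_save() API, not code
taken from dm-thin; do_metadata_update() is a hypothetical helper name.

```c
/* Sketch of the memalloc_noio_save() pattern that prevents reclaim
 * from recursing into the block layer. do_metadata_update() is a
 * hypothetical helper, not actual dm-thin code. */
#include <linux/sched/mm.h>
#include <linux/slab.h>

static int do_metadata_update(void)
{
	unsigned int noio_flag;
	void *buf;
	int ret = 0;

	/* From here on, every allocation in this task behaves as if
	 * GFP_NOIO were passed, so direct reclaim cannot submit I/O
	 * back to the very device we are servicing. */
	noio_flag = memalloc_noio_save();

	buf = kmalloc(4096, GFP_KERNEL);	/* effectively GFP_NOIO */
	if (!buf)
		ret = -ENOMEM;
	else
		kfree(buf);

	memalloc_noio_restore(noio_flag);
	return ret;
}
```

If any allocation on the metadata update path escapes this scope (or
uses plain GFP_KERNEL without the NOIO context), reclaim may wait on
I/O that can only complete through the same worker, which matches the
drain_all_pages() wait seen in the backtrace above.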

I ran many tests and confirmed that this issue can be reproduced on the
latest upstream Linux v5.11-rc5+ kernel. If I create two thin-pool
volumes, mount one as swap and put the other under heavy write I/O
pressure, then once anonymous pages start swapping to the first
thin-pool volume while I/O is hitting the second one, after around 3
minutes the whole system hangs, with no response and no kernel messages
for over an hour before I reset the machine.
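For completeness, the setup can be sketched roughly as below. The VG
name vg0, the volume names, the sizes, and the fio parameters are
placeholders, not my exact reproducer.

```shell
# Sketch of the reproduction setup; vg0, names and sizes are placeholders.
lvcreate -L 10G -T vg0/pool0              # first thin-pool, for swap
lvcreate -V 4G  -T vg0/pool0 -n swapvol   # thin volume on it
mkswap /dev/vg0/swapvol
swapon /dev/vg0/swapvol

lvcreate -L 20G -T vg0/pool1              # second thin-pool, for I/O load
lvcreate -V 16G -T vg0/pool1 -n iovol

# Heavy write pressure on the second thin volume; once memory pressure
# forces anonymous pages out to swap on pool0, the hang shows up after
# a few minutes.
fio --name=load --filename=/dev/vg0/iovol --rw=randwrite \
    --bs=4k --iodepth=32 --ioengine=libaio --time_based --runtime=600
```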

My questions are,
- Can a thin-pool volume be used as a swap device?
- Is the behavior described above a bug, or an already-known issue
which should be avoided?

Thanks in advance.

Coly Li
