[Cluster-devel] [PATCH 2/2] GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads

Mon Oct 8 20:10:01 UTC 2018

----- Original Message -----
> From: Tim Smith <tim.smith at citrix.com>
> 
> Flushing the workqueue can cause operations to happen which might
> call gfs2_log_reserve(), or get stuck waiting for locks taken by such
> operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
> will never wake because the only thing which can wake it is gfs2_logd()
> which was already stopped.
> 
> This causes umount of a gfs2 filesystem to wedge permanently if, for
> example, the umount immediately follows a large delete operation.
> 
> When this occurred, the following stack trace was obtained from the
> umount command:
> 
> [<ffffffff81087968>] flush_workqueue+0x1c8/0x520
> [<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
> [<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
> [<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
> [<ffffffff811b7ff7>] kill_block_super+0x27/0x70
> [<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
> [<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
> [<ffffffff811b79b9>] deactivate_super+0x59/0x60
> [<ffffffff811d2998>] cleanup_mnt+0x58/0x80
> [<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
> [<ffffffff8108c87d>] task_work_run+0x7d/0xa0
> [<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
> [<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
> [<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Signed-off-by: Tim Smith <tim.smith at citrix.com>
> Signed-off-by: Mark Syms <mark.syms at citrix.com>
> ---

Hi Mark, Tim, and all,

I pushed patch 2/2 upstream, to the for-next branch of the linux-gfs2 tree:
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=b7f5a2cd27b76e96fdc6d77b060dfdd877c9d0a9

For now I'll hold off on 1/2 but keep it in my queue, pending our
investigation.

Regards,

Bob Peterson
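
The ordering problem described in the quoted commit message can be
illustrated with a small userspace toy model. This is only a sketch and is
not the GFS2 code: every name below (struct toy_fs, toy_log_reserve(),
toy_flush_delete_work() and so on) is hypothetical, and "would sleep
forever" is modelled as a false return value rather than a real blocked
task. It assumes only what the commit message states: flushing the delete
work may need to reserve log space, and a waiter for log space can only be
woken by the log daemon.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_fs {
    bool logd_running;    /* stands in for the gfs2_logd kernel thread */
    int  pending_deletes; /* stands in for queued delete work items */
    int  free_log_blocks; /* log space available without logd's help */
};

/*
 * Model of a log reservation. It succeeds at once if space is free.
 * Otherwise only a running logd can free space and wake the waiter;
 * returning false models a task that would sleep forever.
 */
bool toy_log_reserve(struct toy_fs *fs)
{
    if (fs->free_log_blocks > 0) {
        fs->free_log_blocks--;
        return true;
    }
    return fs->logd_running;
}

/* Run every pending delete; false means some item would never finish. */
bool toy_flush_delete_work(struct toy_fs *fs)
{
    assert(fs->pending_deletes >= 0);
    while (fs->pending_deletes > 0) {
        if (!toy_log_reserve(fs))
            return false;
        fs->pending_deletes--;
    }
    return true;
}

void toy_stop_logd(struct toy_fs *fs)
{
    fs->logd_running = false;
}

/* Ordering that wedges: the waker is gone before the flush runs. */
bool toy_umount_stop_then_flush(struct toy_fs *fs)
{
    toy_stop_logd(fs);
    return toy_flush_delete_work(fs);
}

/* Ordering named in the patch subject: flush first, then stop threads. */
bool toy_umount_flush_then_stop(struct toy_fs *fs)
{
    bool ok = toy_flush_delete_work(fs);
    toy_stop_logd(fs);
    return ok;
}
```

In this model, starting from three pending deletes and one free log block,
toy_umount_stop_then_flush() returns false after the first delete because
nothing is left to wake the second reservation, while
toy_umount_flush_then_stop() drains all three deletes and returns true.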