[Cluster-devel] [GFS2 PATCH 0/2 v3] Fix infinite loop in ail1 flush with jdata

Bob Peterson rpeterso at redhat.com
Mon Jan 13 14:04:19 UTC 2020


Hi. This patch set fixes a problem in which gfs2 can become deadlocked
while doing normal IO on jdata files. The problem is best observed by
repeatedly running xfstests generic/269 with jdata files.
The specifics of the hang are best described in the second patch.

The first patch reverts commit e955537e3262de8e56f070b13817f525f472fa00
("gfs2: eliminate tr_num_revoke_rm"). The defective patch sometimes drove
tr->tr_num_revoke below zero, since you can remove more revokes than you
add. Because tr_num_revoke is declared unsigned, the value wrapped around
to a very large number instead, which triggered this assert in
gfs2_trans_end:

	if (gfs2_assert_withdraw(sdp, (nbuf <= tr->tr_blocks) &&
			       (tr->tr_num_revoke <= tr->tr_revokes)))
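
To illustrate the wraparound, here is a minimal userspace sketch, not the
gfs2 code itself. struct demo_trans and the demo_* helpers are hypothetical
names made up for this example; only the field names and the revoke half of
the condition are taken from the assert above. It also assumes tr_revokes
holds the number of revokes reserved for the transaction.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the two transaction fields used in the check. */
struct demo_trans {
	unsigned int tr_revokes;	/* assumed: revokes reserved up front */
	unsigned int tr_num_revoke;	/* revokes counted so far */
};

/* Revoke half of the asserted condition; returns 1 when it holds. */
static int revoke_check_ok(const struct demo_trans *tr)
{
	return tr->tr_num_revoke <= tr->tr_revokes;
}

/* Models removing n revokes by decrementing the unsigned counter. */
static void demo_remove_revokes(struct demo_trans *tr, unsigned int n)
{
	tr->tr_num_revoke -= n;	/* wraps modulo UINT_MAX + 1 if n is larger */
}

/* Returns 1 when removing more revokes than were added breaks the check. */
static int demo_underflow_fails_check(void)
{
	struct demo_trans tr = { .tr_revokes = 4, .tr_num_revoke = 1 };

	assert(revoke_check_ok(&tr));	/* 1 <= 4 holds before the removal */
	demo_remove_revokes(&tr, 2);	/* 1 - 2 wraps to UINT_MAX */
	return tr.tr_num_revoke == UINT_MAX && !revoke_check_ok(&tr);
}
```

Calling demo_underflow_fails_check() from a small main() shows the point:
the counter never becomes negative, it becomes UINT_MAX, so the comparison
against tr_revokes can no longer succeed.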

The management of revokes is not very good since we moved them from a
private list to a global list hung off the superblock pointer, sdp.
So we will probably want to revisit this and rework how revokes are
handled. In the meantime, it is safest to just revert the patch until
we can fix it properly.

The second patch fixes an infinite loop deadlock while flushing the
ail1 list for jdata pages. The patch comments describe the problem
and circumstances fairly well.

Bob Peterson (2):
  Revert "gfs2: eliminate tr_num_revoke_rm"
  gfs2: keep a redirty list for jdata pages that are PageChecked in ail1

 fs/gfs2/incore.h |  2 ++
 fs/gfs2/log.c    | 30 +++++++++++++++++++++++++++++-
 fs/gfs2/trans.c  |  7 ++++---
 3 files changed, 35 insertions(+), 4 deletions(-)

-- 
2.24.1



