[Cluster-devel] [GFS2 PATCH 09/15] gfs2: fix deadlock in gfs2_ail1_empty withdraw

Bob Peterson rpeterso at redhat.com
Tue Jul 27 17:37:03 UTC 2021

Before this patch, function gfs2_ail1_empty could issue a file system
withdraw when IO errors were discovered. However, there are several
callers, including gfs2_flush_revokes() which holds the gfs2_log_lock
before calling gfs2_ail1_empty. If gfs2_ail1_empty needed to withdraw
it would leave the gfs2_log_lock held, which resulted in a deadlock
due to other processes that needed the log_lock.

Another problem discovered by Christoph Helwig is that we cannot
withdraw from the log_flush process because it may be called from
the glock workqueue, and the withdraw process waits for that very
workqueue to be flushed. So the withdraw must be ignored until it may
be handled by a more appropriate context like the gfs2_logd daemon.

This patch moves the withdraw out of function gfs2_ail1_empty and
makes each of the callers check for a withdraw by calling new function
check_ail1_withdraw. Function gfs2_flush_revokes now does this check
after releasing the gfs2_log_lock to avoid the deadlock.

Signed-off-by: Bob Peterson <rpeterso at redhat.com>
 fs/gfs2/log.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index f0ee3ff6f9a8..c687a8c4e044 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -364,11 +364,6 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp, int max_revokes)
 	ret = list_empty(&sdp->sd_ail1_list);
-	if (test_bit(SDF_WITHDRAWING, &sdp->sd_flags)) {
-		gfs2_lm(sdp, "fatal: I/O error(s)\n");
-		gfs2_withdraw(sdp);
-	}
 	return ret;
@@ -786,6 +781,15 @@ void gfs2_glock_remove_revoke(struct gfs2_glock *gl)
+static void check_ail1_withdraw(struct gfs2_sbd *sdp)
+	if (!test_bit(SDF_WITHDRAWING, &sdp->sd_flags))
+		return;
+	gfs2_lm(sdp, "fatal: I/O error(s)\n");
+	gfs2_withdraw(sdp);
  * gfs2_flush_revokes - Add as many revokes to the system transaction as we can
  * @sdp: The GFS2 superblock
@@ -1317,6 +1321,7 @@ int gfs2_logd(void *data)
 		if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
 			gfs2_ail1_empty(sdp, 0);
+			check_ail1_withdraw(sdp);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
@@ -1325,6 +1330,7 @@ int gfs2_logd(void *data)
 			gfs2_ail1_empty(sdp, 0);
+			check_ail1_withdraw(sdp);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |

More information about the Cluster-devel mailing list