[Cluster-devel] [PATCH 10/19] gfs2: Check for log write errors before telling dlm to unlock

Wed Mar 27 12:35:23 UTC 2019

Before this patch, function do_xmote assumed that all writes
submitted to the journal had completed successfully, and it
called the go_unlock function to release the dlm lock. But if a
write failed and a revoke never made it to the journal, a journal
replay on another node will cause corruption if we let the
go_inval function continue and tell dlm to release the glock to
another node. This patch adds gfs2_assert_withdraw checks in
do_xmote after the calls to go_sync and go_inval: both check that
sd_log_errors is zero, and the one after go_inval also checks
that gl_ail_count is zero. The asserts should cause another node
to replay the journal before continuing, thus protecting rgrp and
dinode glocks and maintaining the integrity of the metadata.

Signed-off-by: Bob Peterson <rpeterso at redhat.com>
---
 fs/gfs2/glock.c | 4 ++++
 1 file changed, 4 insertions(+)
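
Note for reviewers (not part of the commit): the ordering of the new
checks can be sketched in userspace C as below. This is only a model
under stated assumptions, not kernel code. The model_* names and the
"withdrawn" flag are hypothetical stand-ins, and the helper assumes
that gfs2_assert_withdraw evaluates to 0 when its condition holds,
which is what the "if (!gfs2_assert_withdraw(...))" test in the hunk
relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the gfs2 structures; not the kernel types. */
struct model_sbd {
	int log_errors;   /* models atomic_read(&sdp->sd_log_errors) */
	bool withdrawn;   /* set once any modelled assertion fails */
};

struct model_glock {
	int ail_count;    /* models atomic_read(&gl->gl_ail_count) */
};

/*
 * Assumed convention: return 0 when the condition holds, nonzero
 * (after marking the filesystem withdrawn) when it does not.
 */
static int model_assert_withdraw(struct model_sbd *sdp, bool cond)
{
	if (cond)
		return 0;
	sdp->withdrawn = true;
	return -1;
}

/*
 * Models only the checks added by the patch, in the same order.
 * Returns true when no modelled assertion fired.
 */
static bool model_xmote_checks(struct model_sbd *sdp, struct model_glock *gl)
{
	assert(sdp != NULL && gl != NULL);

	/* After go_sync: every journal write must have succeeded. */
	model_assert_withdraw(sdp, sdp->log_errors == 0);

	/*
	 * After go_inval: re-check the log, and only if it is still
	 * clean verify that nothing remains on the glock's AIL list.
	 */
	if (!model_assert_withdraw(sdp, sdp->log_errors == 0))
		model_assert_withdraw(sdp, gl->ail_count == 0);

	return !sdp->withdrawn;
}
```

In this model a nonzero log_errors or a nonzero ail_count both mark
the filesystem as withdrawn; the ail_count check is skipped when the
log check has already failed.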

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4996ab06e721..72a7b19c3aef 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -566,8 +566,12 @@ __acquires(&gl->gl_lockref.lock)
 	spin_unlock(&gl->gl_lockref.lock);
 	if (glops->go_sync)
 		glops->go_sync(gl);
+	gfs2_assert_withdraw(sdp, atomic_read(&sdp->sd_log_errors) == 0);
 	if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
 		glops->go_inval(gl, target == LM_ST_DEFERRED ? 0 : DIO_METADATA);
+
+	if (!gfs2_assert_withdraw(sdp, atomic_read(&sdp->sd_log_errors) == 0))
+		gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
 	clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
 
 	gfs2_glock_hold(gl);
-- 
2.20.1