[Cluster-devel] [GFS2 PATCH 9/9] dlm: recover slot regardless of whether we still have a connection

Bob Peterson rpeterso at redhat.com
Wed Feb 13 15:21:30 UTC 2019


Before this patch, dlm skipped the recover_slot phase of recovery
if it still had a valid comms connection to the failed node. However,
gfs2 still needs to replay that node's journal. Otherwise, the
journal replay that happens at reboot time risks overwriting
metadata that has been modified since the locks were released.

Signed-off-by: Bob Peterson <rpeterso at redhat.com>
---
 fs/dlm/member.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/dlm/member.c b/fs/dlm/member.c
index 0bc43b35d2c5..155bd52eb018 100644
--- a/fs/dlm/member.c
+++ b/fs/dlm/member.c
@@ -463,17 +463,12 @@ static void dlm_lsop_recover_slot(struct dlm_ls *ls, struct dlm_member *memb)
 	if (!ls->ls_ops || !ls->ls_ops->recover_slot)
 		return;
 
-	/* if there is no comms connection with this node
-	   or the present comms connection is newer
-	   than the one when this member was added, then
-	   we consider the node to have failed (versus
-	   being removed due to dlm_release_lockspace) */
+	/* Recover the slot regardless of whether we have a valid connection.
+	 * The node may have simply withdrawn, but still needs its journal
+	 * replayed. */
 
 	error = dlm_comm_seq(memb->nodeid, &seq);
 
-	if (!error && seq == memb->comm_seq)
-		return;
-
 	slot.nodeid = memb->nodeid;
 	slot.slot = memb->slot;
 
-- 
2.20.1



