[Cluster-devel] [PATCH] gfs2: conversion deadlock do_promote bypass

Bob Peterson rpeterso at redhat.com
Wed Jul 26 17:01:08 UTC 2023


I know the description is vague or hard to grasp, but it's hard to be
succinct for this problem.

In this case the failing scenario is this:
1. We have a glock in the SH state
2. A process requests an asychronous lock of the glock in EX mode.
   (rename)
3. Before the lock is granted, more processes (read / ls) request the
   glock in SH again.
4. gfs2 sends a request to DLM for the lock in EX because that holder is
   at the top of the queue.
5. Somehow the dlm request gets canceled, so dlm sends us back a
   response with state==SH and LM_OUT_CANCELED.
6. finish_xmote gets called to process the response from dlm. It detects
   the glock is not in the requested mode, and demote is not in progress
   so it goes through this codepath:
	if (unlikely(state != gl->gl_target)) {
		if (gh && !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags)) {
			if (ret & LM_OUT_CANCELED) {
   At this point, finish_xmote cannot grant the canceled EX holder, but
   the glock is still in SH mode.
7. Before this patch, finish_xmote moves the holder to the end of the
   queue and finds the next holder, which is for SH. Then it does:
				gl->gl_target = gh->gh_state;
				goto retry;
The retry calls do_xmote, which detects the requested state (SH) is equal
to the current state, and does:
	GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);
To do_xmote, it is invalid to transition a glock to the existing state.

This patch adds a check for the next holder wanting the state the glock
is already in, and if that's the case, it doesn't need to call do_xmote
at all. It can simply call do_promote and promote the holders needing
the glock in the existing state. The patch adds a goto promote which
jumps to the end of finish_xmote where it calls do_promote.

Signed-off-by: Bob Peterson <rpeterso at redhat.com>
---
 fs/gfs2/glock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 1438e7465e30..d1e1fd786417 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -595,6 +595,8 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
 					list_move_tail(&gh->gh_list, &gl->gl_holders);
 				gh = find_first_waiter(gl);
 				gl->gl_target = gh->gh_state;
+				if (gl->gl_state == gl->gl_target)
+					goto promote;
 				goto retry;
 			}
 			/* Some error or failed "try lock" - report it */
@@ -640,6 +642,7 @@ static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
 				goto out;
 			}
 		}
+promote:
 		do_promote(gl);
 	}
 out:
-- 
2.41.0



More information about the Cluster-devel mailing list