[Cluster-devel] [PATCH v6.4-rc2 5/5] fs: dlm: avoid F_SETLKW plock op lookup collisions

Alexander Aring aahringo at redhat.com
Fri May 19 15:21:28 UTC 2023


This patch fixes possible plock op collisions when F_SETLKW lock requests
are used and fsid, number and owner are not enough to identify the result
for a pending request. The ltp testcases [0] and [1] are examples where
this is not enough: classic posix locks used from multiple threads and
open file description (OFD) posix locks.

The idea of this fix is to split recv_list, which contains plock ops that
expect a result from user space, into recv_setlkw_list for F_SETLKW ops
and recv_list for all other commands. Because of the DLM user space
behavior (e.g. dlm_controld), reading a request and writing its result
back can only happen in an ordered way. That means no lookup and no
iteration over recv_list is required; the result always belongs to its
first entry. To place the right plock op as the first entry of recv_list,
list_move() is changed to list_move_tail().
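
To illustrate the ordering argument, here is a minimal user-space sketch
(illustration only, not kernel code; the fixed-size queue below is just a
stand-in for recv_list): when requests are read and results are written
back strictly in order, the op a result belongs to is always the head of
the queue, so no field comparison is needed for the non-F_SETLKW case.

/* illustration only: ordered round trips make head matching sufficient */
#include <assert.h>
#include <stdio.h>

#define QSZ 8

struct op { int id; };

static struct op *pending[QSZ];		/* stand-in for recv_list (FIFO) */
static unsigned int head, tail;

/* dev_read() side: append at the tail, as list_move_tail() now does */
static void enqueue(struct op *op)
{
	pending[tail % QSZ] = op;
	tail++;
}

/* dev_write() side: the incoming result always completes the head entry */
static struct op *dequeue(void)
{
	struct op *op;

	if (head == tail)
		return NULL;
	op = pending[head % QSZ];
	head++;
	return op;
}

int main(void)
{
	struct op a = { .id = 1 }, b = { .id = 2 };

	enqueue(&a);		/* first request handed to user space */
	enqueue(&b);		/* second request handed to user space */

	/* user space answers in the order it read the requests */
	assert(dequeue() == &a);
	assert(dequeue() == &b);
	printf("results matched by position, no lookup needed\n");
	return 0;
}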

This ordered matching is not possible for F_SETLKW, as multiple waiters
can get a result back in random order. To avoid collisions in cases like
[0] or [1], this patch compares more fields of the plock operation to
check whether the lock request is the same. NFS does the same to find the
result for an asynchronous F_SETLKW lock request [2][3]. We still cannot
find the exact lock request for a specific result if the lock requests
are identical, but in that case we don't care in which order the
identical lock requests get their results back to grant the lock.

[0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
[1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
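
For illustration, a minimal user-space sketch of the comparison the
F_SETLKW path now does (illustration only, not kernel code; struct
plock_info below is a hypothetical, reduced stand-in for struct
dlm_plock_info, limited to the compared fields). Two identical pending
requests still match the same result, which is fine because either
waiter may be granted first.

/* illustration only: the match rule applied to F_SETLKW results */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct plock_info {
	uint32_t fsid;
	uint64_t number;	/* inode number */
	uint64_t owner;
	uint32_t pid;
	uint64_t start;
	uint64_t end;
};

static bool setlkw_result_matches(const struct plock_info *pending,
				  const struct plock_info *result)
{
	return pending->fsid == result->fsid &&
	       pending->number == result->number &&
	       pending->owner == result->owner &&
	       pending->pid == result->pid &&
	       pending->start == result->start &&
	       pending->end == result->end;
}

int main(void)
{
	struct plock_info a = { .fsid = 1, .number = 42, .owner = 0xabc,
				.pid = 1000, .start = 0, .end = 99 };
	struct plock_info b = a;	/* identical request, e.g. second waiter */
	struct plock_info result = a;	/* result written back by user space */

	/* both pending ops match; granting either one first is equivalent */
	printf("a matches: %d, b matches: %d\n",
	       setlkw_result_matches(&a, &result),
	       setlkw_result_matches(&b, &result));
	return 0;
}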

Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 47 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index c9e1d5f54194..540a30a342f0 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -17,6 +17,7 @@
 static DEFINE_SPINLOCK(ops_lock);
 static LIST_HEAD(send_list);
 static LIST_HEAD(recv_list);
+static LIST_HEAD(recv_setlkw_list);
 static DECLARE_WAIT_QUEUE_HEAD(send_wq);
 static DECLARE_WAIT_QUEUE_HEAD(recv_wq);
 
@@ -392,10 +393,14 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 	spin_lock(&ops_lock);
 	if (!list_empty(&send_list)) {
 		op = list_first_entry(&send_list, struct plock_op, list);
-		if (op->info.flags & DLM_PLOCK_FL_CLOSE)
+		if (op->info.flags & DLM_PLOCK_FL_CLOSE) {
 			list_del(&op->list);
-		else
-			list_move(&op->list, &recv_list);
+		} else {
+			if (op->info.wait)
+				list_move(&op->list, &recv_setlkw_list);
+			else
+				list_move_tail(&op->list, &recv_list);
+		}
 		memcpy(&info, &op->info, sizeof(info));
 	}
 	spin_unlock(&ops_lock);
@@ -434,18 +439,34 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		return -EINVAL;
 
 	spin_lock(&ops_lock);
-	list_for_each_entry(iter, &recv_list, list) {
-		if (iter->info.fsid == info.fsid &&
-		    iter->info.number == info.number &&
-		    iter->info.owner == info.owner) {
-			list_del_init(&iter->list);
-			memcpy(&iter->info, &info, sizeof(info));
-			if (iter->data)
+	if (info.wait) {
+		list_for_each_entry(iter, &recv_setlkw_list, list) {
+			if (iter->info.fsid == info.fsid &&
+			    iter->info.number == info.number &&
+			    iter->info.owner == info.owner &&
+			    iter->info.pid == info.pid &&
+			    iter->info.start == info.start &&
+			    iter->info.end == info.end) {
+				list_del_init(&iter->list);
+				memcpy(&iter->info, &info, sizeof(info));
+				if (iter->data)
+					do_callback = 1;
+				else
+					iter->done = 1;
+				op = iter;
+				break;
+			}
+		}
+	} else {
+		op = list_first_entry_or_null(&recv_list, struct plock_op,
+					      list);
+		if (op) {
+			list_del_init(&op->list);
+			memcpy(&op->info, &info, sizeof(info));
+			if (op->data)
 				do_callback = 1;
 			else
-				iter->done = 1;
-			op = iter;
-			break;
+				op->done = 1;
 		}
 	}
 	spin_unlock(&ops_lock);
-- 
2.31.1


