[Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions
Andreas Gruenbacher
agruenba at redhat.com
Thu May 25 15:02:09 UTC 2023
On Wed, May 24, 2023 at 6:02 PM Alexander Aring <aahringo at redhat.com> wrote:
> This patch fixes a possible plock op collisions when using F_SETLKW lock
> requests and fsid, number and owner are not enough to identify a result
> for a pending request. The ltp testcases [0] and [1] are examples when
> this is not enough in case of using classic posix locks with threads and
> open filedescriptor posix locks.
>
> The idea to fix the issue here is to place all lock request in order. In
> case of non F_SETLKW lock request (indicated if wait is set or not) the
> lock requests are ordered inside the recv_list. If a result comes back
> the right plock op can be found by the first plock_op in recv_list which
> has not info.wait set. This can be done only by non F_SETLKW plock ops as
> dlm_controld always reads a specific plock op (list_move_tail() from
> send_list to recv_mlist) and write the result immediately back.
>
> This behaviour is for F_SETLKW not possible as multiple waiters can be
> get a result back in an random order. To avoid a collisions in cases
> like [0] or [1] this patch adds more fields to compare the plock
> operations as the lock request is the same. This is also being made in
> NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
> still can't find the exact lock request for a specific result if the
> lock request is the same, but if this is the case we don't care the
> order how the identical lock requests get their result back to grant the
> lock.
When the recv_list contains multiple indistinguishable requests, this
can only be because they originated from multiple threads of the same
process. In that case, I agree that it doesn't matter which of those
requests we "complete" in dev_write() as long as we only complete one
request. We do need to compare the additional request fields in
dev_write() to find a suitable request, so that makes sense as well.
We need to compare all of the fields that identify a request (optype,
ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
"right" request (or in case there is more than one identical request,
a "suitable" request).
The above patch description doesn't match the code anymore, and the
code doesn't fully revert the recv_list splitting of the previous
version.
> [0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
>
> Cc: stable at vger.kernel.org
> Signed-off-by: Alexander Aring <aahringo at redhat.com>
> ---
> change since v2:
> - don't split recv_list into recv_setlkw_list
>
> fs/dlm/plock.c | 43 ++++++++++++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> index 31bc601ee3d8..53d17dbbb716 100644
> --- a/fs/dlm/plock.c
> +++ b/fs/dlm/plock.c
> @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
> if (op->info.flags & DLM_PLOCK_FL_CLOSE)
> list_del(&op->list);
> else
> - list_move(&op->list, &recv_list);
> + list_move_tail(&op->list, &recv_list);
^ This should be obsolete, but it won't hurt, either.
> memcpy(&info, &op->info, sizeof(info));
> }
> spin_unlock(&ops_lock);
> @@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
> return -EINVAL;
>
> spin_lock(&ops_lock);
> - list_for_each_entry(iter, &recv_list, list) {
> - if (iter->info.fsid == info.fsid &&
> - iter->info.number == info.number &&
> - iter->info.owner == info.owner) {
> - list_del_init(&iter->list);
> - memcpy(&iter->info, &info, sizeof(info));
> - if (iter->data)
> - do_callback = 1;
> - else
> - iter->done = 1;
> - op = iter;
> - break;
> + if (info.wait) {
We should be able to use the same list_for_each_entry() loop for
F_SETLKW requests (which have info.wait set) as for all other requests
as far as I can see.
> + list_for_each_entry(iter, &recv_list, list) {
> + if (iter->info.fsid == info.fsid &&
> + iter->info.number == info.number &&
> + iter->info.owner == info.owner &&
> + iter->info.pid == info.pid &&
> + iter->info.start == info.start &&
> + iter->info.end == info.end &&
> + iter->info.ex == info.ex &&
> + iter->info.wait) {
> + op = iter;
> + break;
> + }
> }
> + } else {
> + list_for_each_entry(iter, &recv_list, list) {
> + if (!iter->info.wait) {
> + op = iter;
> + break;
> + }
> + }
> + }
> +
> + if (op) {
> + list_del_init(&op->list);
> + memcpy(&op->info, &info, sizeof(info));
> + if (op->data)
> + do_callback = 1;
> + else
> + op->done = 1;
> }
Can't this code just remain in the list_for_each_entry() loop?
> spin_unlock(&ops_lock);
>
> --
> 2.31.1
>
Andreas
More information about the Cluster-devel
mailing list