[Cluster-devel] [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

Andreas Gruenbacher agruenba at redhat.com
Thu May 25 15:02:09 UTC 2023


On Wed, May 24, 2023 at 6:02 PM Alexander Aring <aahringo at redhat.com> wrote:
> This patch fixes possible plock op collisions when using F_SETLKW lock
> requests, where fsid, number and owner are not enough to identify the
> result for a pending request. The ltp testcases [0] and [1] are examples
> where this is not enough when using classic posix locks with threads and
> open file descriptor posix locks.
>
> The idea to fix the issue here is to keep all lock requests in order. In
> case of a non-F_SETLKW lock request (indicated by whether wait is set),
> the lock requests are ordered inside the recv_list. When a result comes
> back, the right plock op can be found as the first plock_op in recv_list
> which does not have info.wait set. This works only for non-F_SETLKW plock
> ops because dlm_controld always reads a specific plock op
> (list_move_tail() from send_list to recv_list) and writes the result back
> immediately.
>
> This behaviour is not possible for F_SETLKW, as multiple waiters can get
> a result back in a random order. To avoid collisions in cases like [0]
> or [1], this patch compares more fields of the plock operations when the
> lock requests are otherwise the same. The same is done in NFS to find the
> result for an asynchronous F_SETLKW lock request [2][3]. We still can't
> find the exact lock request for a specific result if the lock requests
> are identical, but in that case we don't care in which order the
> identical lock requests get their results back to grant the lock.

When the recv_list contains multiple indistinguishable requests, this
can only be because they originated from multiple threads of the same
process. In that case, I agree that it doesn't matter which of those
requests we "complete" in dev_write() as long as we only complete one
request. We do need to compare the additional request fields in
dev_write() to find a suitable request, so that makes sense as well.
We need to compare all of the fields that identify a request (optype,
ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
"right" request (or in case there is more than one identical request,
a "suitable" request).

The above patch description doesn't match the code anymore, and the
code doesn't fully revert the recv_list splitting of the previous
version.

> [0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
>
> Cc: stable at vger.kernel.org
> Signed-off-by: Alexander Aring <aahringo at redhat.com>
> ---
> change since v2:
>  - don't split recv_list into recv_setlkw_list
>
>  fs/dlm/plock.c | 43 ++++++++++++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> index 31bc601ee3d8..53d17dbbb716 100644
> --- a/fs/dlm/plock.c
> +++ b/fs/dlm/plock.c
> @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
>                 if (op->info.flags & DLM_PLOCK_FL_CLOSE)
>                         list_del(&op->list);
>                 else
> -                       list_move(&op->list, &recv_list);
> +                       list_move_tail(&op->list, &recv_list);

^ This should be obsolete, but it won't hurt, either.

>                 memcpy(&info, &op->info, sizeof(info));
>         }
>         spin_unlock(&ops_lock);
> @@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
>                 return -EINVAL;
>
>         spin_lock(&ops_lock);
> -       list_for_each_entry(iter, &recv_list, list) {
> -               if (iter->info.fsid == info.fsid &&
> -                   iter->info.number == info.number &&
> -                   iter->info.owner == info.owner) {
> -                       list_del_init(&iter->list);
> -                       memcpy(&iter->info, &info, sizeof(info));
> -                       if (iter->data)
> -                               do_callback = 1;
> -                       else
> -                               iter->done = 1;
> -                       op = iter;
> -                       break;
> +       if (info.wait) {

We should be able to use the same list_for_each_entry() loop for
F_SETLKW requests (which have info.wait set) as for all other requests,
as far as I can see.

> +               list_for_each_entry(iter, &recv_list, list) {
> +                       if (iter->info.fsid == info.fsid &&
> +                           iter->info.number == info.number &&
> +                           iter->info.owner == info.owner &&
> +                           iter->info.pid == info.pid &&
> +                           iter->info.start == info.start &&
> +                           iter->info.end == info.end &&
> +                           iter->info.ex == info.ex &&
> +                           iter->info.wait) {
> +                               op = iter;
> +                               break;
> +                       }
>                 }
> +       } else {
> +               list_for_each_entry(iter, &recv_list, list) {
> +                       if (!iter->info.wait) {
> +                               op = iter;
> +                               break;
> +                       }
> +               }
> +       }
> +
> +       if (op) {
> +               list_del_init(&op->list);
> +               memcpy(&op->info, &info, sizeof(info));
> +               if (op->data)
> +                       do_callback = 1;
> +               else
> +                       op->done = 1;
>         }

Can't this code just remain in the list_for_each_entry() loop?
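
For illustration, an untested sketch of what I mean, reusing the
made-up plock_op_matches() helper from above:

	spin_lock(&ops_lock);
	list_for_each_entry(iter, &recv_list, list) {
		/* one loop for F_SETLKW and all other requests */
		if (plock_op_matches(&iter->info, &info)) {
			list_del_init(&iter->list);
			memcpy(&iter->info, &info, sizeof(info));
			if (iter->data)
				do_callback = 1;
			else
				iter->done = 1;
			op = iter;
			break;
		}
	}
	spin_unlock(&ops_lock);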

>         spin_unlock(&ops_lock);
>
> --
> 2.31.1
>

Andreas


