[Cluster-devel] [RFC v6.5-rc2 2/3] fs: lockd: fix race in async lock request handling
Alexander Aring
aahringo at redhat.com
Thu Aug 10 20:37:47 UTC 2023
Hi,
On Fri, Jul 21, 2023 at 11:45 AM Jeff Layton <jlayton at kernel.org> wrote:
>
> On Thu, 2023-07-20 at 08:58 -0400, Alexander Aring wrote:
> > This patch fixes a race in async lock request handling between adding
> > the relevant struct nlm_block to the nlm_blocked list after the request
> > was sent by vfs_lock_file(), and nlmsvc_grant_deferred() doing a lookup
> > of the nlm_block in the nlm_blocked list. The async request could
> > complete before the nlm_block was added to the list, which would result
> > in -ENOENT and the kernel log message "lockd: grant for unknown
> > block".
> >
> > To solve this issue we add the nlm_block before the vfs_lock_file() call
> > to be sure it has been added when a possible nlmsvc_grant_deferred() is
> > called. If vfs_lock_file() returns a result for which the block does
> > not belong on the nlm_blocked list, the nlm_block struct is removed
> > from the list again.
> >
> > Signed-off-by: Alexander Aring <aahringo at redhat.com>
> > ---
> > fs/lockd/svclock.c | 80 +++++++++++++++++++++++++++----------
> > include/linux/lockd/lockd.h | 1 +
> > 2 files changed, 60 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > index 28abec5c451d..62ef27a69a9e 100644
> > --- a/fs/lockd/svclock.c
> > +++ b/fs/lockd/svclock.c
> > @@ -297,6 +297,8 @@ static void nlmsvc_free_block(struct kref *kref)
> >
> > dprintk("lockd: freeing block %p...\n", block);
> >
> > + WARN_ON_ONCE(block->b_flags & B_PENDING_CALLBACK);
> > +
> > /* Remove block from file's list of blocks */
> > list_del_init(&block->b_flist);
> > mutex_unlock(&file->f_mutex);
> > @@ -543,6 +545,12 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> > goto out;
> > }
> >
> > + if (block->b_flags & B_PENDING_CALLBACK)
> > + goto pending_request;
> > +
> > + /* Append to list of blocked */
> > + nlmsvc_insert_block(block, NLM_NEVER);
> > +
> > if (!wait)
> > lock->fl.fl_flags &= ~FL_SLEEP;
> > mode = lock_to_openmode(&lock->fl);
> > @@ -552,9 +560,13 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> > dprintk("lockd: vfs_lock_file returned %d\n", error);
> > switch (error) {
> > case 0:
> > + nlmsvc_remove_block(block);
> > ret = nlm_granted;
> > goto out;
> > case -EAGAIN:
> > + if (!wait)
> > + nlmsvc_remove_block(block);
> > +pending_request:
> > /*
> > * If this is a blocking request for an
> > * already pending lock request then we need
> > @@ -565,6 +577,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> > ret = async_block ? nlm_lck_blocked : nlm_lck_denied;
> > goto out;
> > case FILE_LOCK_DEFERRED:
> > + block->b_flags |= B_PENDING_CALLBACK;
> > +
> > if (wait)
> > break;
> > /* Filesystem lock operation is in progress
> > @@ -572,17 +586,16 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> > ret = nlmsvc_defer_lock_rqst(rqstp, block);
>
> When the above function is called, it's going to end up reinserting the
> block into the list. I think you probably also need to remove the call
> to nlmsvc_insert_block from nlmsvc_defer_lock_rqst since it could have
> been granted before that occurs.
>
It cannot be granted during this time because f_mutex is held. We
insert it in the first place so that the block lookup works when
lm_grant() fires very quickly: lm_grant() looks up the block and then
has to acquire f_mutex, so it only runs when nobody is inside this
critical section (on a per-nlm_file basis).
There is a difference between the NLM_NEVER and NLM_TIMEOUT calls in
nlmsvc_defer_lock_rqst(): when nlmsvc_defer_lock_rqst() calls
nlmsvc_insert_block(), it will just update the timeout value. I am not
sure about the consequences of doing an nlmsvc_insert_block() with
NLM_NEVER instead of NLM_TIMEOUT. But as I said, it should not be
possible to grant the block while f_mutex is held.
> > goto out;
> > case -EDEADLK:
> > + nlmsvc_remove_block(block);
> > ret = nlm_deadlock;
> > goto out;
> > default: /* includes ENOLCK */
> > + nlmsvc_remove_block(block);
> > ret = nlm_lck_denied_nolocks;
> > goto out;
> > }
> >
> > ret = nlm_lck_blocked;
> > -
> > - /* Append to list of blocked */
> > - nlmsvc_insert_block(block, NLM_NEVER);
> > out:
> > mutex_unlock(&file->f_mutex);
> > nlmsvc_release_block(block);
> > @@ -739,34 +752,59 @@ nlmsvc_update_deferred_block(struct nlm_block *block, int result)
> > block->b_flags |= B_TIMED_OUT;
> > }
> >
> > +static int __nlmsvc_grant_deferred(struct nlm_block *block,
> > + struct file_lock *fl,
> > + int result)
> > +{
> > + int rc = 0;
> > +
> > + dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> > + block, block->b_flags);
> > + if (block->b_flags & B_QUEUED) {
> > + if (block->b_flags & B_TIMED_OUT) {
> > + rc = -ENOLCK;
> > + goto out;
> > + }
> > + nlmsvc_update_deferred_block(block, result);
> > + } else if (result == 0)
> > + block->b_granted = 1;
> > +
> > + nlmsvc_insert_block_locked(block, 0);
> > + svc_wake_up(block->b_daemon);
> > +out:
> > + return rc;
> > +}
> > +
> > static int nlmsvc_grant_deferred(struct file_lock *fl, int result)
> > {
> > - struct nlm_block *block;
> > - int rc = -ENOENT;
> > + struct nlm_block *block = NULL;
> > + int rc;
> >
> > spin_lock(&nlm_blocked_lock);
> > list_for_each_entry(block, &nlm_blocked, b_list) {
> > if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
> > - dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> > - block, block->b_flags);
> > - if (block->b_flags & B_QUEUED) {
> > - if (block->b_flags & B_TIMED_OUT) {
> > - rc = -ENOLCK;
> > - break;
> > - }
> > - nlmsvc_update_deferred_block(block, result);
> > - } else if (result == 0)
> > - block->b_granted = 1;
> > -
> > - nlmsvc_insert_block_locked(block, 0);
> > - svc_wake_up(block->b_daemon);
> > - rc = 0;
> > + kref_get(&block->b_count);
> > break;
> > }
> > }
> > spin_unlock(&nlm_blocked_lock);
> > - if (rc == -ENOENT)
> > - printk(KERN_WARNING "lockd: grant for unknown block\n");
> > +
> > + if (!block) {
> > + pr_warn("lockd: grant for unknown pending block\n");
> > + return -ENOENT;
> > + }
> > +
> > + /* don't interfere with nlmsvc_lock() */
> > + mutex_lock(&block->b_file->f_mutex);
>
>
> This is called from lm_grant, and Documentation/filesystems/locking.rst
> says that lm_grant is not allowed to block. The only caller though is
> dlm_plock_callback, and I don't see anything that would prevent
> blocking.
>
> Do we need to fix the documentation there?
>
You are right, and I think it should not call any sleepable API.
However, DLM is the only upstream user, and I have no other idea how
to improve the current situation.
We should update the documentation, but stay open to making it
non-sleepable again in the future?
>
> > + block->b_flags &= ~B_PENDING_CALLBACK;
> > +
>
> You're adding this new flag when the lock is deferred and then clearing
> it when the lock is granted. What about when the lock request is
> cancelled (e.g. by signal)? It seems like you also need to clear it then
> too, correct?
>
Correct. I will add code to clear it when the block is removed from
nlm_blocked in nlmsvc_remove_block().
> > + spin_lock(&nlm_blocked_lock);
> > + WARN_ON_ONCE(list_empty(&block->b_list));
> > + rc = __nlmsvc_grant_deferred(block, fl, result);
> > + spin_unlock(&nlm_blocked_lock);
> > + mutex_unlock(&block->b_file->f_mutex);
> > +
> > + nlmsvc_release_block(block);
> > return rc;
> > }
> >
> > diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> > index f42594a9efe0..a977be8bcc2c 100644
> > --- a/include/linux/lockd/lockd.h
> > +++ b/include/linux/lockd/lockd.h
> > @@ -189,6 +189,7 @@ struct nlm_block {
> > #define B_QUEUED 1 /* lock queued */
> > #define B_GOT_CALLBACK 2 /* got lock or conflicting lock */
> > #define B_TIMED_OUT 4 /* filesystem too slow to respond */
> > +#define B_PENDING_CALLBACK 8 /* pending callback for lock request */
> > };
> >
> > /*
>
> --
> Jeff Layton <jlayton at kernel.org>
>