[Cluster-devel] [PATCHv4 v6.5-rc2 3/3] fs: dlm: fix F_CANCELLK to cancel pending request

Alexander Aring aahringo at redhat.com
Mon Jul 24 20:42:07 UTC 2023


Hi,

On Mon, Jul 24, 2023 at 3:03 PM Alexander Aring <aahringo at redhat.com> wrote:
>
> Hi,
>
> On Mon, Jul 24, 2023 at 10:40 AM Andreas Gruenbacher
> <agruenba at redhat.com> wrote:
> >
> > On Fri, Jul 21, 2023 at 8:55 PM Alexander Aring <aahringo at redhat.com> wrote:
> > > Hi,
> > >
> > > On Fri, Jul 21, 2023 at 12:25 PM Andreas Gruenbacher
> > > <agruenba at redhat.com> wrote:
> > > >
> > > > On Thu, Jul 20, 2023 at 2:22 PM Alexander Aring <aahringo at redhat.com> wrote:
> > > > > This patch fixes the current handling of F_CANCELLK by not just doing an
> > > > > unlock, as we need to try to cancel the lock request first. An unlock makes
> > > > > sense for a non-blocking lock request, but for a blocking lock request we
> > > > > need to cancel the request as long as it has not been granted yet. This
> > > > > patch fixes this behaviour by first trying to cancel the lock request and,
> > > > > if that fails, unlocking the lock which appears to have been granted.
> > > >
> > > > I don't like how this is implemented, but the problem already starts
> > > > in commit 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from
> > > > userspace"). That commit relies on how dlm_controld is implemented
> > > > internally; it requires dlm_controld to keep all replies to
> > > > non-blocking requests in perfect order. The correctness of that
> > > > approach is more difficult to argue for than it should be, the code
> > > > might break in non-obvious ways, and it limits the kinds of things
> > > > dlm_controld will be able to do in the future. And this change relies
> > > > on the same flawed design.
> > > >
> > >
> > > I agree it is not the best design and I mentioned it in my previous
> > > patch series versions [0]:
> > >
> > > TODO: we should move to an instance reference shared between request and
> > > reply instead of going over lock states. Going over lock states to find
> > > the instance does not seem to cause any problems (at least no issues have
> > > been found yet), but it would be much cleaner not to have to think about
> > > all the possible special cases, since lock states no longer have a 1:1
> > > mapping to instances.
> >
> > I guess by /go over lock states/, you mean what dev_write() does --
> > compare the request and response fields to match requests with
> > responses as well as possible, based on the information that's in
> > struct dlm_plock_info today.
> >
>
> yes, that's what we are doing when wait is true.
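
For reference, the field-based matching amounts to something like the
following (just a sketch, function name made up, not the exact code in
fs/dlm/plock.c):

/* Sketch only: match a reply from dlm_controld against a waiting request
 * purely by comparing the lock fields carried in struct dlm_plock_info,
 * since there is no unique request id today.
 */
static struct plock_op *match_waiter(const struct dlm_plock_info *info)
{
	struct plock_op *iter;

	list_for_each_entry(iter, &recv_list, list) {
		if (iter->info.fsid == info->fsid &&
		    iter->info.number == info->number &&
		    iter->info.owner == info->owner &&
		    iter->info.pid == info->pid &&
		    iter->info.start == info->start &&
		    iter->info.end == info->end &&
		    iter->info.wait)
			return iter;
	}

	return NULL;
}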
>
> > > > Instead, when noticing that we don't have a way to uniquely identify
> > > > requests, such a way should just have been introduced. The CANCEL
> > > > request would then carry the same unique identifier as the original
> > > > locking request, dlm_controld would reply either to the original
> > > > locking request or to the cancel request, and the kernel would have no
> > > > problem matching the reply to the request.
> > > >
> > > > But without unique request identifiers, we now end up with those
> > > > redundant -ENOENT replies to CANCEL requests and with those
> > > > essentially duplicate entries for the same request on recv_list. Not
> > > > great.
> > > >
> > >
> > > There is no reference to lock request instances shared between kernel
> > > and user space yet. Matching is done by checking whether it is the same
> > > lock request. The whole mechanism works like this, which is one reason
> > > why we recently had problems with it. Lock requests count as "the same"
> > > in the sense that they are being granted at the same time. So far we have
> > > not experienced any problems with that... but as mentioned in [0], you
> > > need to think about all potential cases.
> >
> > I have no idea what you're talking about.
> >
>
> I mean that currently there are no issues I am aware of, and there exists
> no reference between a lock request in the kernel and a lock request in
> user space. This all goes over the lock states, which are treated as
> identical in the sense that "they are getting granted at the same time".
>
> > Let me point out one thing though: when dlm_controld replies to
> > multiple pending locking requests "in one go", without having to wait
> > on any unlocks to happen, this doesn't mean that those requests are
> > all "the same" at all. The requests may be for the same "actor" on the
> > kernel side, but they don't have to be. As such, it does make sense to
> > treat each of those requests independently.
> >
>
> Now, I have no idea what you are talking about.
>
> > > NOTE: even F_CANCELLK does not give you a unique reference to the
> > > original locking request; it works by matching the fields of struct
> > > file_lock... The same problem already exists there. The struct file_lock
> > > pointer could be used, but struct file_lock does not represent a lock
> > > request; such a thing does not exist in the VFS posix lock API.
> >
> > It seems to me that struct file_lock objects persist across the entire
> > locking request, both for the synchronous locking requests of gfs2 and
> > for the asynchronous locking requests of lockd. In other words, when
> > lockd cancels a locking request, it seems to use the same struct
> > file_lock object it used for making the original request (see where
> > lockd calls vfs_cancel_lock()). This means that the address of that
> > object would be a suitable locking request identifier. And while we
> > don't have a huge amount of space left in struct dlm_plock_info, we do
> > have two 32-bit fields in the version[] array that we could use for
> > communicating that identifier. It would of course be better if we
> > didn't need that much space, but I don't see a realistic alternative
> > if we want to cleanly fix this whole mess.
>
> Please point me to the API documentation saying that you can use the
> struct file_lock pointer to look up the pending lock. This is
> implementation specific; the next person hacking on lockd will not know
> that.
> I will try to come up with something. Because 32 bits are limited we could
> run into collisions, but it should be enough for normal usage.
>

A draft can be found under [0].
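
To give an idea of the direction (this is not necessarily what the draft in
[0] does; reusing a version[] slot as the carrier field is an assumption made
for illustration), a per-request id could be carried in struct dlm_plock_info
so a cancel and its reply can be matched without comparing lock fields:

static atomic_t plock_req_id;

/* Hypothetical: tag each pending plock request with a 32-bit id. */
static void plock_set_request_id(struct dlm_plock_info *info)
{
	info->version[1] = (__u32)atomic_inc_return(&plock_req_id);
}

/* Hypothetical: a reply matches a request iff the ids are equal. */
static bool plock_id_matches(const struct dlm_plock_info *req,
			     const struct dlm_plock_info *reply)
{
	return req->version[1] == reply->version[1];
}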

> But I will not use the file_lock * as the lookup key; it will work over
> fields, which ends up in the same "mess" behaviour. There could be
> follow-up patches to fix the VFS layer to use unique "lock request"
> identifiers, but as far as I know VFS does not have any storage to keep
> track of pending lock requests.
>

This problem actually requires 2 ids: one to match the plock cancel
operation request itself, and another one for the plock request which
should be cancelled. I've only implemented the plock cancel operation
request match. For the other part, the way I see F_CANCELLK is supposed
to work is "match on struct file_lock fields". I am open to changing
this behaviour, but it requires more work on VFS.
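
Roughly, what I mean by the two ids (purely illustrative, field names made
up):

struct plock_cancel_ids {
	__u32 cancel_id;	/* matches the reply to the cancel op itself */
	__u32 target_id;	/* identifies the pending request to cancel */
};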

- Alex

[0] https://gitlab.com/netcoder/linux/-/commit/aa67d0ec656a0d060fe3d0c0d86264dea1b2cc7d


