[Cluster-devel] [PATCH] gfs2: fix lock cancelling

Thu Sep 20 15:48:23 UTC 2007

On Thu, Sep 20, 2007 at 10:31:54AM -0500, David Teigland wrote:
> If found on the recv_list, it means the op has been sent up to the lock
> manager in userspace and is still floating around up there.  If we remove
> the op from the recv_list, it means, as you say, that the lock manager
> could get an error back later when it does dev_write() to complete the op.
> (dev_write() just prints an error message currently, doesn't return an
> error to userspace.)
> 
> This assumes, of course, that seeing an error, the lock manager could do
> something sensible to bring itself back in sync with the application... as
> we've discussed before, that's a hard problem that we may never solve :-)

It's a hard problem, but it'll need to be solved some day.  And it can't
be solved as long as the kernel isn't even giving userspace the
information it would need to solve the problem.

For now, could you just generate an unlock request in the case where you
get an error on the write?  That's certainly not perfect, but it's no
worse than the current behavior.
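
To make that concrete, here is a rough userspace sketch of the fallback, not the actual gfs2 code.  The names struct op, find_op(), queue_op() and complete_op(), and the fixed-size arrays standing in for the send and receive lists, are all made up for illustration; locking is left out entirely.  The idea is only that a completion which finds no matching op queues a compensating unlock instead of silently dropping the result:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real plock structures. */
enum op_type { OP_LOCK, OP_UNLOCK };

struct op {
	enum op_type type;
	int owner;        /* who asked for the lock */
	long start, end;  /* byte range */
	int in_use;       /* slot occupied? */
};

#define MAX_OPS 8
struct op send_list[MAX_OPS];  /* not yet sent up to the lock manager */
struct op recv_list[MAX_OPS];  /* sent up, waiting for a reply */

struct op *find_op(struct op *list, int owner, long start, long end)
{
	for (size_t i = 0; i < MAX_OPS; i++) {
		if (list[i].in_use && list[i].owner == owner &&
		    list[i].start == start && list[i].end == end)
			return &list[i];
	}
	return NULL;
}

int queue_op(struct op *list, enum op_type type, int owner,
	     long start, long end)
{
	assert(start <= end);
	for (size_t i = 0; i < MAX_OPS; i++) {
		if (!list[i].in_use) {
			list[i] = (struct op){ type, owner, start, end, 1 };
			return 0;
		}
	}
	return -ENOMEM;
}

/* Called when the lock manager writes back the result of a lock op. */
int complete_op(int owner, long start, long end)
{
	struct op *op = find_op(recv_list, owner, start, end);

	if (op) {
		op->in_use = 0;  /* normal completion */
		return 0;
	}
	/*
	 * The op was cancelled while the manager still had it.  The
	 * manager now thinks the lock is held, so queue an unlock for
	 * the same range and report the error to the writer.
	 */
	if (queue_op(send_list, OP_UNLOCK, owner, start, end) < 0)
		return -ENOMEM;
	return -ENOENT;
}
```

Returning -ENOENT from the write as well would give the lock manager the information it currently doesn't get, even if it can't do much with it yet.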

> > +	list_for_each_entry(op, &send_list, list) {
> > +		xop = (struct plock_xop *)op;
> > +		if (!xop->callback)
> > +			continue;
> > +		if (xop->fl != fl)
> > +			continue;
> > +		list_del_init(&op->list);
> > +		goto found;
> > +	}
> 
> If found on the send_list, it means the op hasn't been sent up to the lock
> manager yet, so the cancel can be considered a success.
> 
> > +	spin_unlock(&ops_lock);
> > +	/* Too late; the lock's probably already been granted. */
> > +	return -ENOENT;
> 
> It's up to the caller to sort out what happens in this case.

Yup.
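
For anyone following along, the three outcomes discussed above can be summarized in a small standalone sketch.  This is an illustration only: cancel_op(), enum where and the flat array are hypothetical, and returning 0 for both the send_list and recv_list cases is my assumption here, not something the quoted hunk shows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical: which list an op currently sits on. */
enum where { ON_SEND_LIST, ON_RECV_LIST, ON_NO_LIST };

struct op {
	const void *fl;    /* identifies the lock request */
	int has_callback;  /* only async (callback) ops can be cancelled */
	enum where where;
};

/*
 * Try to cancel the op matching fl.  On success, *found_on says where
 * it was: ON_SEND_LIST means the lock manager never saw it, while
 * ON_RECV_LIST means the manager may still complete it later.
 */
int cancel_op(struct op *ops, size_t n, const void *fl,
	      enum where *found_on)
{
	assert(found_on != NULL);

	for (size_t i = 0; i < n; i++) {
		if (ops[i].where == ON_NO_LIST)
			continue;
		if (!ops[i].has_callback)
			continue;
		if (ops[i].fl != fl)
			continue;
		*found_on = ops[i].where;
		ops[i].where = ON_NO_LIST;  /* stands in for list_del_init() */
		return 0;
	}
	/* Too late; the lock's probably already been granted. */
	*found_on = ON_NO_LIST;
	return -ENOENT;
}
```

A caller that gets 0 with ON_RECV_LIST is in the awkward case from the first part of this mail; a caller that gets -ENOENT has to assume the lock may have been granted and clean up itself.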

--b.