[Cluster-devel] [GFS2 PATCH] gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS
Bob Peterson
rpeterso at redhat.com
Fri Nov 15 17:39:41 UTC 2019
----- Original Message -----
> On Fri, Nov 15, 2019 at 4:45 PM Bob Peterson <rpeterso at redhat.com> wrote:
> >
> > Hi,
> >
> > This patch closes a timing window in which two processes compete
> > and overlap in the execution of do_xmote for the same glock:
> >
> > Process A Process B
> > ------------------------------------ -----------------------------
> > 1. Grabs gl_lockref and calls do_xmote
> > 2. Grabs gl_lockref but is blocked
> > 3. Sets GLF_INVALIDATE_IN_PROGRESS
> > 4. Unlocks gl_lockref
> > 5. Calls do_xmote
> > 6. Call glops->go_sync
> > 7. test_and_clear_bit GLF_DIRTY
> > 8. Call gfs2_log_flush Call glops->go_sync
> > 9. (slow IO, so it blocks a long time) test_and_clear_bit GLF_DIRTY
> > It's not dirty (step 7) returns
> > 10. Tests GLF_INVALIDATE_IN_PROGRESS
> > 11. Calls go_inval (rgrp_go_inval)
> > 12. gfs2_rgrp_relse does brelse
> > 13. truncate_inode_pages_range
> > 14. Calls lm_lock UN
> >
> > In step 14 we've just told dlm to give the glock to another node
> > when, in fact, process A has not finished the IO and synced all
> > buffer_heads to disk and make sure their revokes are done.
> >
> > This patch fixes the problem by changing the GLF_INVALIDATE_IN_PROGRESS
> > to use test_and_set_bit, and if the bit is already set, process B just
> > ignores it and trusts that process A will do the do_xmote in the proper
> > order.
> >
> > Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> > ---
> > fs/gfs2/glock.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> > index faa88bd594e2..4a4a390ffd00 100644
> > --- a/fs/gfs2/glock.c
> > +++ b/fs/gfs2/glock.c
> > @@ -558,7 +558,19 @@ __acquires(&gl->gl_lockref.lock)
> > GLOCK_BUG_ON(gl, gl->gl_state == gl->gl_target);
> > if ((target == LM_ST_UNLOCKED || target == LM_ST_DEFERRED) &&
> > glops->go_inval) {
> > - set_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
> > + /*
> > + * If another process is already doing the invalidate we
> > should
> > + * not interfere. If we call go_sync and it finds the glock
> > is
> > + * not dirty, we might call go_inval prematurely before the
> > + * other go_sync has finished with its revokes. If we then
> > call
> > + * lm_lock prematurely, we've really screwed up: we cannot
> > tell
> > + * dlm to give the glock away until we're synced and
> > + * invalidated. Best thing is to return and trust the other
> > + * process will finish do_xmote tasks in their proper
> > order.
> > + */
>
> That's a bit too much information here. Can we please change that as follows?
>
> * If another process is already doing the invalidate, let
> that
> * finish first. The glock state machine will get back to
> this
> * holder again later.
>
> > + if (test_and_set_bit(GLF_INVALIDATE_IN_PROGRESS,
> > + &gl->gl_flags))
> > + return;
> > do_error(gl, 0); /* Fail queued try locks */
> > }
> > gl->gl_req = target;
> >
>
> Thanks,
> Andreas
>
>
Sure. Make it so.
Bob
More information about the Cluster-devel
mailing list