[Cluster-devel] [GFS2 patch] fix hangup when multiple processes are trying to write to the same file
Steven Whitehouse
swhiteho at redhat.com
Fri Feb 23 16:29:19 UTC 2007
Hi,
On Fri, 2007-02-23 at 09:05 -0500, Josef Whiter wrote:
> On Fri, Feb 23, 2007 at 08:38:32AM -0500, Josef Whiter wrote:
> > This fixes a problem I encountered while running bonnie++. When you have one
> > thread that opens a file and starts to write to it, and then another thread that
> > tries to open and write to the same file, the second thread will loop forever
> > trying to grab the inode lock for that inode. Basically we come in through
> > generic_buffered_file_write, which calls gfs2_prepare_write, which then attempts
> > to grab the glock. Because we don't own the lock, gfs2_prepare_write gets
> > GLR_TRYFAILED, which returns AOP_TRUNCATED_PAGE to generic_buffered_file_write.
> > At this point generic_buffered_file_write loops around again and immediately
> > retries the prepare_write. This means that the second process never gets off of
> > the processor in order to allow the process that holds the lock to finish its
> > work and let go of the lock. This patch makes gfs2_glock_nq schedule() if it
> > gets back a GLR_TRYFAILED, which resolves this problem. Please let me know if
> > this is in the wrong place, ie if it should go in gfs2_prepare_write instead of
> > gfs2_glock_nq, or if its completely wrong :).
> >
>
> Ok at the suggestion of Steven I have made it so that we just yield in the
> places where this would be a problem, so in gfs2_prepare_write and
> gfs2_readpage.
>
> Signed-off-by: Josef Whiter <jwhiter at redhat.com>
>
Thats looks good, but one small comment:
> --- linux-2.6/fs/gfs2/ops_address.c.josef 2007-02-23 09:51:40.000000000 -0500
> +++ linux-2.6/fs/gfs2/ops_address.c 2007-02-23 09:54:27.000000000 -0500
> @@ -266,8 +266,10 @@ skip_lock:
> out:
> return error;
> out_unlock:
> - if (error == GLR_TRYFAILED)
> + if (error == GLR_TRYFAILED) {
> error = AOP_TRUNCATED_PAGE;
> + yield();
> + }
> unlock_page(page);
Can you move the above unlock_page to directly after the out_unlock: ?
That way we won't schedule while holding the page lock. The other part
of the patch does that in the right order already and looks good,
Steve.
> if (do_unlock)
> gfs2_holder_uninit(&gh);
> @@ -364,6 +366,7 @@ static int gfs2_prepare_write(struct fil
> if (error == GLR_TRYFAILED) {
> unlock_page(page);
> error = AOP_TRUNCATED_PAGE;
> + yield();
> }
> goto out_uninit;
> }
>
More information about the Cluster-devel
mailing list