[Cluster-devel] [GFS2 PATCH] gfs2: Always update the in-core rgrp when the buffer_head is read

Thu Aug 16 09:16:04 UTC 2018

Hi,

On 15/08/18 18:04, Bob Peterson wrote:
> Hi,
>
> Before this patch, function gfs2_rgrp_bh_get would only update the
> rgrp info from the buffer if the GFS2_RDF_UPTODATE flag was clear.
> This went back to the days when gfs2 kept rgrp version numbers, and
> re-read the buffer_heads constantly, not just when needed.
>
> The problem is, RDF_UPTODATE is a local flag, but lvbs are changed
> dynamically throughout the cluster. This is a serious problem when
> using the rgrplvb mount option because of scenarios like this:
>
> 1. Node A mounts the file system, sets RDF_UPTODATE for rgrp X.
> 2. Node B mounts the file system, sets RDF_UPTODATE for rgrp X.
> 3. Node A deletes a large file, freeing up lots of blocks,
>     so the lvb gets updated.
At this point Node B must have invalidated it's copy of the rgrp, since 
Node A must have an exclusive lock. That should have cleared the 
GFS2_RDF_UPTODATE flag. So why did that not happen?
> 4. Node B now re-reads the rgrp, but because it's marked UPTODATE,
>     it decides not to update its in-core copy of rgrp X.
Why is it marked up to date when rgrp_go_inval() should have cleared it 
when the rgrp lock was demoted to allow Node A it's exclusive lock?

>
> At this point, Node B will have the wrong value for rgd->rd_free,
> the amount of free space in the rgrp.
>
> But there's no good reason not to grab the most recent values from
> the buffer: it only costs us a few cpu cycles to read them.
>
> This patch removes the UPTODATE check in favor of just always
> reading the rgrp values in from the buffer we just read.
>
> Signed-off-by: Bob Peterson <rpeterso at redhat.com>
> ---
>   fs/gfs2/rgrp.c | 17 ++++++++---------
>   1 file changed, 8 insertions(+), 9 deletions(-)
I think we should understand why the flag is not set correctly. If it is 
not working correctly for this case, then why can we still trust it for 
the other check in update_rgrp_lvb() ?

Steve.

> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index 1ad3256b9cbc..6eec634eae2d 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1177,15 +1177,14 @@ static int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd)
>   		}
>   	}
>   
> -	if (!(rgd->rd_flags & GFS2_RDF_UPTODATE)) {
> -		for (x = 0; x < length; x++)
> -			clear_bit(GBF_FULL, &rgd->rd_bits[x].bi_flags);
> -		gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
> -		rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
> -		rgd->rd_free_clone = rgd->rd_free;
> -		/* max out the rgrp allocation failure point */
> -		rgd->rd_extfail_pt = rgd->rd_free;
> -	}
> +	for (x = 0; x < length; x++)
> +		clear_bit(GBF_FULL, &rgd->rd_bits[x].bi_flags);
> +	gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
> +	rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
> +	rgd->rd_free_clone = rgd->rd_free;
> +	/* max out the rgrp allocation failure point */
> +	rgd->rd_extfail_pt = rgd->rd_free;
> +
>   	if (cpu_to_be32(GFS2_MAGIC) != rgd->rd_rgl->rl_magic) {
>   		rgd->rd_rgl->rl_unlinked = cpu_to_be32(count_unlinked(rgd));
>   		gfs2_rgrp_ondisk2lvb(rgd->rd_rgl,
>