[Cluster-devel] [PATCH] gfs2: Don't free rgrp clone bitmaps until go_inval

Bob Peterson rpeterso at redhat.com
Wed May 10 19:08:18 UTC 2023


Before this patch, every time an rgrp was synced (go_sync) the
clone bitmaps were freed. We do not need to free the bitmaps in many
common cases. For example when demoting the glock from EXCLUSIVE to
SHARED. This is especially wasteful in cases where we unlink lots of
files: the rgrps are transitioned to EX, then back to SH multiple
times as it looks at the dinode allocation states, then frees them,
but the clones prevent allocations until the files are evicted.
Subsequent uses often cause the rgrp glock to be transitioned from
SH to EX and back again in rapid succession.

In these cases it's proper to sync the rgrp bitmaps to the storage media
but wasteful to free the clones, because the very next unlink needs to
reallocate the clone bitmaps again. So in short, today we have:

1. SH->EX (for unlink or other)
2. Allocate (kmalloc) a clone bitmap.
3. Clear the bits in original bitmap.
4. Keep original state in the clone bitmap to prevent re-allocation
   until the last user closes the file.
5. EX->SH
6. Sync bitmap to storage media.
7. Free the clone bitmaps.
8. Go to 1.

This repeated kmalloc -> kfree -> kmalloc -> kfree is a waste of time:
We only need to free the clone bitmaps when the glock is invalidated
(i.e. when transitioning the glock to UN or DF so another node's view
is consistent.) However, we still need to re-sync the clones with the
real bitmap. This patch allows rgrp bitmaps to stick around until we
have an invalidate of the glock. So in short:

1. SH->EX (for unlink or other)
2. Only the first time, allocate (kmalloc) a clone bitmap.
3. Free the bits in original bitmap.
4. Keep original state in the clone bitmap to prevent re-allocation
   until the last user closes the file.
5. EX->SH
6. Sync bitmap to storage media.
7. Go to 1.

Other transitions, like EX->UN still sync and free the clone bitmaps.
And, of course, transition from SH->EX cannot have dirty buffers, so
will not have clone bitmaps.

Signed-off-by: Bob Peterson <rpeterso at redhat.com>
---
 fs/gfs2/glops.c |  4 +++-
 fs/gfs2/rgrp.c  | 13 +++++++++++++
 fs/gfs2/rgrp.h  |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 01d433ed6ce7..58cf2004548e 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -205,7 +205,8 @@ static int rgrp_go_sync(struct gfs2_glock *gl)
 	error = gfs2_rgrp_metasync(gl);
 	if (!error)
 		error = gfs2_ail_empty_gl(gl);
-	gfs2_free_clones(rgd);
+	if (!test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
+		gfs2_sync_clones(rgd);
 	return error;
 }
 
@@ -229,6 +230,7 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 
 	if (!rgd)
 		return;
+	gfs2_free_clones(rgd);
 	start = (rgd->rd_addr * bsize) & PAGE_MASK;
 	end = PAGE_ALIGN((rgd->rd_addr + rgd->rd_length) * bsize) - 1;
 	gfs2_rgrp_brelse(rgd);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 3b9b76e980ad..6e212e0eb74e 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -616,6 +616,19 @@ void gfs2_free_clones(struct gfs2_rgrpd *rgd)
 	}
 }
 
+void gfs2_sync_clones(struct gfs2_rgrpd *rgd)
+{
+	int x;
+
+	for (x = 0; x < rgd->rd_length; x++) {
+		struct gfs2_bitmap *bi = rgd->rd_bits + x;
+		if (bi->bi_clone)
+			memcpy(bi->bi_clone + bi->bi_offset,
+			       bi->bi_bh->b_data + bi->bi_offset,
+			       bi->bi_bytes);
+	}
+}
+
 static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs,
 		    const char *fs_id_buf)
 {
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 00b30cf893af..254188cf2d7b 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -32,6 +32,7 @@ extern void gfs2_clear_rgrpd(struct gfs2_sbd *sdp);
 extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
 extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
 extern int gfs2_rgrp_go_instantiate(struct gfs2_glock *gl);
+extern void gfs2_sync_clones(struct gfs2_rgrpd *rgd);
 extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
 
 extern struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip);
-- 
2.40.1



More information about the Cluster-devel mailing list