[Cluster-devel] [GFS2 PATCH v2] gfs2: fast dealloc for exhash directories

Tue Apr 6 15:12:00 UTC 2021

On Mon, Mar 22, 2021 at 3:15 PM Bob Peterson <rpeterso at redhat.com> wrote:
> Before this patch, whenever a directory was deleted, it called function
> __gfs2_dir_exhash_dealloc to deallocate the directory's leaf blocks.
> But __gfs2_dir_exhash_dealloc never knew if any given leaf block had
> leaf continuation aka "next" blocks, so it read every single leaf block
> in, only to determine in 99% of the cases that there was none. Still,
> this reading in of all the leaf blocks was very slow.
>
> This patch adds a new disk flag that indicates whether a directory is
> clean of any "next leaf" blocks. If so, it takes an optimized path that
> just deletes the leaf blocks and zeroes out the hash table.

The algorithm description in dir.c suggests that lf_next cannot be set
as long as i_depth < GFS2_DIR_MAX_DEPTH. I didn't see where that is
being checked in the code, but I may have missed it. If that check is
indeed missing, adding it would save a lot of time in most cases. That
should be paired with asserts that prevent lf_next from being set
unless i_depth == GFS2_DIR_MAX_DEPTH.
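
To make the suggestion concrete, here is a rough user-space sketch of
the check and the matching guard. It is not code from dir.c: the
struct and function names (model_dir, model_leaf,
dir_cannot_have_next_leaf, model_set_lf_next) are hypothetical, and the
value 17 for GFS2_DIR_MAX_DEPTH is an assumption about the on-disk
header. It also assumes my reading of the algorithm description is
right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value; check the on-disk format header before relying on it. */
#define GFS2_DIR_MAX_DEPTH 17

/* Hypothetical, simplified stand-ins for the inode and the leaf header. */
struct model_dir {
	unsigned int i_depth;	/* depth of the directory hash table */
};

struct model_leaf {
	uint64_t lf_next;	/* block number of the next leaf, 0 if none */
};

/*
 * If the hash table can still be doubled, no leaf should ever have been
 * chained, so dealloc would not need to read any leaf to look for lf_next.
 */
static bool dir_cannot_have_next_leaf(const struct model_dir *dip)
{
	return dip->i_depth < GFS2_DIR_MAX_DEPTH;
}

/*
 * Guard on the write side: refuse to chain a leaf unless the directory is
 * already at maximum depth. Returns 0 on success and -1 on refusal; in the
 * kernel this condition would be an assert rather than a return code.
 */
static int model_set_lf_next(const struct model_dir *dip,
			     struct model_leaf *leaf, uint64_t next_blk)
{
	if (dip->i_depth != GFS2_DIR_MAX_DEPTH)
		return -1;
	leaf->lf_next = next_blk;
	return 0;
}
```

The two functions are meant to be used together: the read-side shortcut
is only safe if the write-side guard holds for every path that sets
lf_next.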

Beyond that, this patch adds a single per-inode GFS2_DIF_NO_NEXT_LEAF
flag, so as soon as a single leaf block overflows, we'll end up
reading all leaf blocks anyway. That means the patch only helps
performance in a very narrow window. To really make a difference, we'd
need such a flag per index entry, but the index uses physical block
numbers instead of logical block numbers, so we don't have any bits
left there.
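
As a toy illustration of the difference, the sketch below counts how
many leaf blocks dealloc would have to read under each scheme. It is a
simplified model, not gfs2 code: the function names are made up, each
array element stands for one distinct leaf, and the per-entry flag is
purely hypothetical since the index has no spare bits for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * One flag per directory: if any leaf has a continuation block, the flag
 * is lost and every leaf has to be read again.
 */
static size_t reads_with_inode_flag(const bool *has_next, size_t nleaves)
{
	for (size_t i = 0; i < nleaves; i++)
		if (has_next[i])
			return nleaves;
	return 0;
}

/*
 * Hypothetical flag per index entry: only the leaves that actually have
 * a continuation block need to be read.
 */
static size_t reads_with_entry_flag(const bool *has_next, size_t nleaves)
{
	size_t reads = 0;

	for (size_t i = 0; i < nleaves; i++)
		if (has_next[i])
			reads++;
	return reads;
}
```

With 8 leaves and a single overflowed leaf, the first function returns
8 and the second returns 1.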

> It would seem to make more sense to have the new bit indicate when a
> directory contains "next leaf" blocks rather than the inverse, but we
> need to treat file systems created by older versions of gfs2 as if
> they have "next leaf" blocks.

Andreas