[Cluster-devel] [GFS2 PATCH] GFS2: Eliminate bitmap clones

Bob Peterson rpeterso at redhat.com
Tue Jul 3 13:28:46 UTC 2018


Hi Steve,

----- Original Message -----
> > Do we really still need "clone bitmaps" in gfs2? If so, why?
> > I think maybe we can get rid of them. Can someone (Steve Whitehouse
> > perhaps?) think of a scenario in which they're still needed? If so,
> > please elaborate and give an example.
(snip)
> You need to ensure that the blocks cannot be reused in the same
> transaction (that's true of all metadata blocks, not just inodes) in
> order that recovery will work correctly. You cannot just eliminate the
> bitmaps without adding a mechanism to prevent this reuse.
> 
> Steve.
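
A toy model may make the reuse concern concrete. The Python below is a hypothetical sketch, not GFS2 code: ClonedBitmap, alloc, free and log_flush are invented names, and it assumes only what the thread describes. When a block is freed, the real bitmap is cleared but the clone keeps the old "used" bit until the transaction reaches the journal. The allocator searches the clone, so a block freed in the current transaction cannot be handed out again in that same transaction.

```python
FREE, USED = 0, 1


class ClonedBitmap:
    """Toy resource-group bitmap with an optional clone (hypothetical model)."""

    def __init__(self, nblocks):
        self.real = [FREE] * nblocks  # bitmap as modified by the current transaction
        self.clone = None             # snapshot taken before the first free, if any

    def alloc(self):
        # Search the clone when it exists: a block freed in this transaction
        # is FREE in `real` but still USED in `clone`, so it is skipped.
        view = self.clone if self.clone is not None else self.real
        for blk, state in enumerate(view):
            if state == FREE:
                self.real[blk] = USED
                if self.clone is not None:
                    self.clone[blk] = USED  # allocations update both copies
                return blk
        return None

    def free(self, blk):
        if self.clone is None:
            self.clone = list(self.real)  # snapshot before the first free
        self.real[blk] = FREE             # only the real bitmap changes

    def log_flush(self):
        # Once the transaction is safely in the journal, freed blocks may be
        # reused; this sketch simply drops the clone.
        self.clone = None


bm = ClonedBitmap(4)
first, second = bm.alloc(), bm.alloc()  # blocks 0 and 1
bm.free(first)                          # freed in the current transaction
third = bm.alloc()                      # block 0 is skipped; block 2 is returned
bm.log_flush()
fourth = bm.alloc()                     # after the flush, block 0 can be reused
```

Without the clone, the third allocation would return block 0, the block freed earlier in the same transaction, which is exactly the reuse that has to be prevented for recovery to work.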

I don't see how it's possible for a transaction to reuse the same blocks,
even when transactions are combined.

As you know, GFS2 (unlike GFS1) marks only one type of metadata in its
bitmaps: dinode blocks. Any other metadata associated with a dinode is
marked as data in the bitmap, and stays marked as such until it is freed.
So if a process truncates a file, for example, transitioning its blocks
from data to free, and then searches for, finds, and reallocates those
same blocks as data, there would still be only one copy of the bitmap
buffer data in the ail lists, right? And that copy should always reflect
the most recent state of those bits, which is data, right? So a journal
replay would still replay the latest known version of those bitmaps.
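
The argument in the previous paragraph can be modeled roughly as follows. This is a hypothetical Python sketch, not GFS2 code: Transaction, log_buffer and replay are invented names, buffer number 7 is arbitrary, and it assumes, as the paragraph does, that a transaction holds at most one copy of a given bitmap buffer and that replay writes that copy back. It models only the bitmap bits, not the ail lists, revokes, or the contents of the reused blocks.

```python
FREE, USED = 0, 1


class Transaction:
    """Toy transaction keeping one copy of each modified bitmap buffer (hypothetical)."""

    def __init__(self):
        self.buffers = {}  # buffer number -> latest bitmap contents

    def log_buffer(self, bufno, bitmap):
        # Re-logging the same buffer replaces the earlier copy; no second copy is kept.
        self.buffers[bufno] = list(bitmap)


def replay(journal, disk):
    """Write back each logged buffer; only its most recent version exists."""
    for bufno, contents in journal.buffers.items():
        disk[bufno] = list(contents)


bitmap = [USED, USED, FREE, FREE]  # blocks 0-1 hold file data
tr = Transaction()

bitmap[0] = bitmap[1] = FREE       # truncate: data -> free
tr.log_buffer(7, bitmap)
bitmap[0] = bitmap[1] = USED       # reallocate the same blocks as data
tr.log_buffer(7, bitmap)

disk = {7: [FREE] * 4}
replay(tr, disk)                   # disk[7] ends up as data, data, free, free
```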

If a dinode references indirect blocks (marked as data) and the file is
then truncated to 0, the indirect blocks still remain, because the
metadata for indirect blocks is never shrunk.

If the dinode is unlinked rather than deleted, its indirect blocks and
data blocks will all remain "data" until the inode is actually evicted.
When the inode is evicted and those blocks are finally freed, that's all
done in separate transactions, as per Andreas's "shrinker" patches, and
we know those transactions don't search for free blocks to assign.

If a dinode is unlinked and someone searches for free blocks, they won't
find those blocks anyway, because the blocks aren't "free" until the
inode is evicted. And, of course, the only code paths that search the
bitmaps for unlinked blocks are the eviction process itself (which
actually does something with them) and inplace_reserve, which just tries
to kick off a potential eviction (but never performs an eviction itself).
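
For readers less familiar with the on-disk format: GFS2 bitmaps store two bits per block, with four states (free, used/data, unlinked, dinode). The Python sketch below uses the same numeric values as the GFS2_BLKST_* constants, but the helper names are hypothetical, and it assumes the low-order bits of each byte hold the first block. It illustrates why an unlinked inode's blocks are invisible to a free-block search: the dinode block is in the unlinked state, its data and indirect blocks stay in the used state, and only blocks in the free state match.

```python
# Two bits per block, four blocks per byte (values match GFS2_BLKST_*).
BLKST_FREE, BLKST_USED, BLKST_UNLINKED, BLKST_DINODE = 0, 1, 2, 3
BLOCKS_PER_BYTE = 4


def get_state(bitmap: bytes, blk: int) -> int:
    """Return the 2-bit state of block `blk`."""
    byte = bitmap[blk // BLOCKS_PER_BYTE]
    shift = (blk % BLOCKS_PER_BYTE) * 2
    return (byte >> shift) & 0x3


def set_state(bitmap: bytearray, blk: int, state: int) -> None:
    """Overwrite the 2-bit state of block `blk`."""
    idx = blk // BLOCKS_PER_BYTE
    shift = (blk % BLOCKS_PER_BYTE) * 2
    bitmap[idx] = (bitmap[idx] & ~(0x3 << shift)) | (state << shift)


def find_first(bitmap: bytes, state: int):
    """Return the first block in `state`, or None (hypothetical search helper)."""
    for blk in range(len(bitmap) * BLOCKS_PER_BYTE):
        if get_state(bitmap, blk) == state:
            return blk
    return None


rgbm = bytearray(2)                     # 8 blocks, all free
set_state(rgbm, 0, BLKST_UNLINKED)      # dinode unlinked but not yet evicted
set_state(rgbm, 1, BLKST_USED)          # its data blocks stay "used"
set_state(rgbm, 2, BLKST_USED)
free_blk = find_first(rgbm, BLKST_FREE)          # skips blocks 0-2
unlinked_blk = find_first(rgbm, BLKST_UNLINKED)  # what an eviction scan would find
```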

It's a little different with directories, because the hash table is
partly data and partly metadata. Even so, we never shrink the directory
hash tables, nor free leaf blocks or leaf continuation blocks, as per
bz#223783 (which suggests we might want to in the future). Today the
clones cost us file fragmentation, file system fragmentation, the
overhead of the kmalloc/kfree calls, and twice as much work setting and
clearing bits. So I question whether the savings from shrinking hash
tables or freeing unused continuation leaves would outweigh the potential
savings we might get by eliminating the bitmap clones.

Again, I don't see a scenario that can get us into trouble, even with
journal replay.

Perhaps I should be worried about extended attributes that are freed
and reused? I'll look into that.

Bob
