[Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation

Bob Peterson rpeterso at redhat.com
Fri Jun 12 19:50:34 UTC 2015


----- Original Message -----
> Hi,
> 
> 
> On 09/06/15 15:45, Bob Peterson wrote:
> > ----- Original Message -----
> >> Hi,
> >>
> >>
> >> On 05/06/15 15:49, Bob Peterson wrote:
> >>> Hi,
> >>>
> >>> This patch allows the block allocation code to retain the buffers
> >>> for the resource groups so they don't need to be re-read from the
> >>> buffer cache with every request. This is a performance improvement
> >>> that's especially noticeable when resource groups are very large.
> >>> For example, with 2GB resource groups and 4K blocks, there can be
> >>> 33 blocks for every resource group (2GB / 4KB = 524,288 blocks to
> >>> track, at two bits per block in the bitmaps that's 128KB of bitmap
> >>> data, i.e. 32 bitmap blocks plus the rgrp header). This patch
> >>> allows those 33 buffers to be kept around and not read in and
> >>> thrown away with every operation. The buffers are released when
> >>> the resource group is either synced or invalidated.
> >> The blocks should be cached between operations, so this should only
> >> skip the lookup of the cached block, with no change to the actual
> >> I/O. Does that mean that grab_cache_page() is slow, I wonder? Or is
> >> this an issue of going around the retry loop due to lack of memory
> >> at some stage?
> >>
> >> How does this interact with the rgrplvb support? I'd guess that with
> >> that turned on, this is no longer an issue, because we'd only read in
> >> the blocks for the rgrps that we are actually going to use?
> >>
> >>
> >>
> >> Steve.
> > Hi,
> >
> > If you compare the two vmstat outputs in Bugzilla #1154782, you'll
> > see no significant difference in either memory or CPU usage. So I
> > assume the page lookup is the "slow" part; not because it's such a
> > slow operation, but because it's done 33 times per
> > read-reference-invalidate cycle (33 pages to look up per rgrp).
> >
> > Regards,
> >
> > Bob Peterson
> > Red Hat File Systems
> 
> That's true. However, as I understand the problem here, the issue is
> not reading in the blocks for the rgrp that is eventually selected,
> but reading in the blocks for the rgrps that we reject, for whatever
> reason (full, congested, or whatever). So with rgrplvb enabled, we
> don't read those rgrps in from disk at all in most cases - so I was
> wondering whether that solves the problem without needing this change?
> 
> Ideally I'd like to make the rgrplvb setting the default, since it is
> much more efficient. The question is how we can do that and still remain
> backward compatible? Not an easy one to answer :(
> 
> Also, if the page lookup is the slow part, then we should look at using
> pagevec_lookup() to get the pages in chunks rather than looking them up
> individually (and indeed, multiple times per page, in the case of a
> block size smaller than the page size). We know that the blocks will
> always be contiguous on disk, so we should be able to send down large
> I/Os rather than relying on the block layer to merge them as we do at
> the moment, which should be a further improvement too.
> 
> Steve.

Hi,

The rgrplvb mount option only helps if the file system is using lock_dlm.
For lock_nolock, it's still just as slow because lock_nolock has no
knowledge of LVBs (lock value blocks). Granted, that's an unusual case,
because GFS2 is normally used with lock_dlm.
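
To illustrate why the LVB path avoids the extra reads: with rgrplvb, the
rgrp glock's lock value block carries a copy of the rgrp's summary
counters, so a node can reject a full rgrp without reading any of its
bitmap blocks. A rough sketch (the struct layout is as I recall it from
gfs2_ondisk.h, and the helper is hypothetical - the real check lives in
the gfs2_inplace_reserve() path):

/* Sketch of the rgrp LVB layout; it mirrors the rgrp's summary
 * counters so other nodes can read them without any bitmap I/O. */
struct gfs2_rgrp_lvb {
        __be32 rl_magic;        /* identifies a valid LVB */
        __be32 rl_flags;
        __be32 rl_free;         /* free blocks in this rgrp */
        __be32 rl_dinodes;      /* dinodes in use */
        __be32 rl_unlinked;     /* unlinked but not deallocated */
        __be32 rl_igeneration;
};

/* Hypothetical helper: decide from the LVB alone whether this rgrp
 * is worth trying; rejected rgrps cost no bitmap reads at all. */
static bool rgrp_worth_trying(const struct gfs2_rgrp_lvb *rgl,
                              u32 blocks_wanted)
{
        return be32_to_cpu(rgl->rl_free) >= blocks_wanted;
}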

I like the idea of making rgrplvb the default mount option, and I don't
see a problem doing that.
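
If we did make it the default, the code change itself would presumably
be trivial - something like the one-liner below, in the spot where the
default mount arguments get set up (a sketch; I'm assuming the
ar_rgrplvb field from struct gfs2_args). The real work is the backward
compatibility question, since older nodes never update the LVBs.

/* Sketch: turn rgrplvb on by default where the mount args are
 * initialized; an explicit norgrplvb option could still turn
 * it off for mixed clusters. */
args->ar_rgrplvb = 1;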

I think the rgrplvb option should be compatible with this patch, but
I'll set up a test environment to verify that the two work together.
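
For reference, the shape of the change itself is small: the rgrp's
bitmap buffers are no longer brelse'd on every unlock; instead a helper
drops them from the sync and invalidate paths. Roughly (a sketch
following the naming in fs/gfs2/rgrp.c, not the literal diff):

/* Drop the rgrp's bitmap buffer_heads. With the patch this runs when
 * the rgrp glock is synced or invalidated, instead of on every
 * unlock, so the 33 buffers stay referenced across allocations. */
static void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd)
{
        int x, length = rgd->rd_length;

        for (x = 0; x < length; x++) {
                struct gfs2_bitmap *bi = rgd->rd_bits + x;

                if (bi->bi_bh) {
                        brelse(bi->bi_bh);
                        bi->bi_bh = NULL;
                }
        }
}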

I also like the idea of using a pagevec to look up multiple pages for
the rgrps at once, but that's another improvement for another day; a
rough sketch follows. If there isn't a Bugzilla record open for that,
perhaps we should open one.
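
Something along these lines, perhaps (a sketch only; rgrp_lookup_pages()
is a hypothetical helper, and pagevec_lookup() only finds pages already
in the cache, so any missing pages would still need to be read):

/* Batch the page cache lookups for an rgrp's bitmap blocks instead
 * of calling grab_cache_page() once per block. */
static void rgrp_lookup_pages(struct address_space *mapping,
                              pgoff_t first, pgoff_t last)
{
        struct pagevec pvec;
        pgoff_t index = first;
        unsigned int i, nr;

        pagevec_init(&pvec, 0);
        while (index <= last) {
                nr = pagevec_lookup(&pvec, mapping, index,
                                    min_t(unsigned int,
                                          last - index + 1,
                                          PAGEVEC_SIZE));
                if (!nr)
                        break;
                for (i = 0; i < nr; i++) {
                        struct page *page = pvec.pages[i];

                        /* collect the buffer_heads for each block in
                         * this page here - one lookup per page rather
                         * than one per block */
                        index = page->index + 1;
                }
                pagevec_release(&pvec);
        }
}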

Regards,

Bob Peterson
Red Hat File Systems



