[Cluster-devel] [GFS2 PATCH] GFS2: Don't brelse rgrp buffer_heads every allocation
Bob Peterson
rpeterso at redhat.com
Fri Jun 12 19:50:34 UTC 2015
----- Original Message -----
> Hi,
>
>
> On 09/06/15 15:45, Bob Peterson wrote:
> > ----- Original Message -----
> >> Hi,
> >>
> >>
> >> On 05/06/15 15:49, Bob Peterson wrote:
> >>> Hi,
> >>>
> >>> This patch allows the block allocation code to retain the buffers
> >>> for the resource groups so they don't need to be re-read from buffer
> >>> cache with every request. This is a performance improvement that's
> >>> especially noticeable when resource groups are very large. For
> >>> example, with 2GB resource groups and 4K blocks, there can be 33
> >>> blocks for every resource group. This patch allows those 33 buffers
> >>> to be kept around and not read in and thrown away with every
> >>> operation. The buffers are released when the resource group is
> >>> either synced or invalidated.
> >> The blocks should be cached between operations, so this should only be
> >> resulting in a skip of the look up of the cached block, and no changes
> >> to the actual I/O. Does that mean that grab_cache_page() is slow I
> >> wonder? Or is this an issue of going around the retry loop due to lack
> >> of memory at some stage?
> >>
> >> How does this interact with the rgrplvb support? I'd guess that with
> >> that turned on, this is no longer an issue, because we'd only read in
> >> the blocks for the rgrps that we are actually going to use?
> >>
> >>
> >>
> >> Steve.
> > Hi,
> >
> > If you compare the two vmstat outputs in bugzilla #1154782, you'll
> > see no significant difference in memory or CPU usage. So I assume
> > the page lookup is the "slow" part; not because it's such a slow
> > operation, but because it's done 33 times per read-reference-invalidate
> > (33 pages to look up per rgrp).
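The figure of 33 buffers per rgrp follows from GFS2 using 2 bits of bitmap per filesystem block, plus one resource group header block. A back-of-envelope check (the function name is illustrative; this ignores the bitmap bytes that share the header block, which does not change the total here):

```c
/* Metadata buffers per resource group: one header block plus enough
 * bitmap blocks to describe every data block at 2 bits each. */
unsigned long rgrp_metadata_blocks(unsigned long rgrp_bytes,
                                   unsigned long block_size)
{
	unsigned long blocks = rgrp_bytes / block_size;   /* blocks in rgrp */
	unsigned long bitmap_bytes = blocks / 4;          /* 2 bits per block */
	unsigned long bitmap_blocks =
		(bitmap_bytes + block_size - 1) / block_size; /* round up */
	return 1 + bitmap_blocks;                         /* + header block */
}
```

For a 2GB rgrp with 4K blocks: 524288 blocks, 131072 bytes of bitmap, 32 bitmap blocks, 33 buffers in total.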
> >
> > Regards,
> >
> > Bob Peterson
> > Red Hat File Systems
>
> That's true; however, as I understand the problem here, the issue is not
> reading in the blocks for the rgrp that is eventually selected for use,
> but the reading in of those blocks for the rgrps that we reject, for
> whatever reason (full, or congested, or whatever). So with rgrplvb
> enabled, we don't then read those rgrps in off disk at all in most cases
> - so I was wondering whether that solves the problem without needing
> this change?
>
> Ideally I'd like to make the rgrplvb setting the default, since it is
> much more efficient. The question is how we can do that and still remain
> backward compatible? Not an easy one to answer :(
>
> Also, if the page lookup is the slow part, then we should look at using
> pagevec_lookup() to get the pages in chunks rather than doing it
> individually (and indeed, multiple times per page when the block size
> is less than the page size). We know that the blocks will always be
> contiguous on disk, so we should be able to send down large I/Os rather
> than relying on the block stack to merge them as we do at the moment,
> which should be a further improvement too.
>
> Steve.
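The batching that pagevec_lookup() provides in the kernel can be illustrated with a small user-space stand-in (everything below is a made-up model, not the real page cache API): one call returns a run of consecutive cached pages, so a 33-page rgrp costs a handful of probes instead of 33.

```c
/* Illustrative stand-in for batched page lookup: instead of one cache
 * probe per page, fetch up to 'max' consecutive pages per call, the
 * way pagevec_lookup() batches its radix-tree walk in the kernel. */

#define NPAGES 33                   /* pages backing one 2GB rgrp */

struct page { unsigned long index; };

static struct page pages[NPAGES];   /* pretend page cache, fully populated */

/* Fill 'out' with up to 'max' pages starting at 'start'; return count. */
static int lookup_batch(unsigned long start, int max, struct page **out)
{
	int n = 0;
	while (n < max && start + n < NPAGES) {
		out[n] = &pages[start + n];
		n++;
	}
	return n;
}
```

With a batch size of 16, walking all 33 pages takes 3 calls rather than 33 individual lookups; and because the underlying blocks are contiguous on disk, each batch could also be submitted as one large I/O.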
Hi,
The rgrplvb mount option only helps if the file system is using lock_dlm.
For lock_nolock, it's still just as slow, because lock_nolock has no knowledge
of LVBs. Granted, that's an unusual case, because GFS2 is normally used
with lock_dlm.
I like the idea of making rgrplvb the default mount option, and I don't
see a problem doing that.
I think the rgrplvb option should be compatible with this patch, but
I'll set up a test environment in order to test that they work together
harmoniously.
I also like the idea of using a pagevec for reading in multiple pages for
the rgrps, but that's another improvement for another day. If there's
not a bugzilla record open for that, perhaps we should open one.
Regards,
Bob Peterson
Red Hat File Systems