[Cluster-devel] gfs2 iomap deadlock, IOMAP_F_UNBALANCED
Jan Kara
jack at suse.cz
Mon Apr 8 13:44:05 UTC 2019
On Mon 08-04-19 10:53:34, Andreas Gruenbacher wrote:
> On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig <hch at lst.de> wrote:
> >
> > [adding Jan and linux-mm]
> >
> > On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > > But what is the requirement to do this in writeback context? Can't
> > > > we move it out into another context instead?
> > >
> > > Indeed, this isn't for data integrity in this case but because the
> > > dirty limit is exceeded. What other context would you suggest to move
> > > this to?
> > >
> > > (The iomap flag I've proposed would save us from getting into this
> > > situation in the first place.)
> >
> > Your patch does two things:
> >
> > - it only calls balance_dirty_pages_ratelimited once per write
> > operation instead of once per page. In the past btrfs did
> > hacks like that, but IIRC they caused VM balancing issues.
> > That is why everyone now calls balance_dirty_pages_ratelimited
> > once per page. If calling it at a coarse granularity would
> > be fine, we should do it everywhere instead of just in gfs2
> > in journaled mode.
> > - it artificially reduces the size of writes to a low value,
> > which I suspect is going to break real-life applications.
>
> Not quite, balance_dirty_pages_ratelimited is called from iomap_end,
> so once per iomap mapping returned, not per write. (The first version
> of this patch got that wrong by accident, but not the second.)
>
> We can limit the size of the mappings returned just in that case. I'm
> aware that there is a risk of balancing problems; I just don't have
> any better ideas.
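A rough userspace model of the two throttling granularities being debated. It assumes 4 KiB pages and a hypothetical per-mapping cap `max_map` (the kind of limit proposed for journaled mode), and it treats every mapping as exactly `max_map` bytes except the last, ignoring alignment. The function names are illustrative only, not kernel or iomap APIs. The model just counts how often a throttle such as balance_dirty_pages_ratelimited() would run for a single write.

```c
#include <assert.h>

/* Illustrative model only, not kernel code: 4 KiB pages assumed. */
#define MODEL_PAGE_SIZE 4096UL

/* Pages touched by a write of len bytes at file offset pos. */
static unsigned long pages_spanned(unsigned long pos, unsigned long len)
{
	if (len == 0)
		return 0;
	return (pos + len - 1) / MODEL_PAGE_SIZE - pos / MODEL_PAGE_SIZE + 1;
}

/* Per-page policy: the throttle runs once for every page copied. */
static unsigned long throttle_calls_per_page(unsigned long pos, unsigned long len)
{
	return pages_spanned(pos, len);
}

/* Per-mapping policy: the throttle runs once per mapping (e.g. from
 * ->iomap_end()), and the filesystem caps each mapping at max_map
 * bytes (hypothetical parameter). */
static unsigned long throttle_calls_per_mapping(unsigned long len, unsigned long max_map)
{
	assert(max_map > 0);
	return (len + max_map - 1) / max_map;	/* number of mappings */
}
```

With a 64 KiB cap, a 1 MiB write triggers 16 throttle checks instead of 256, so up to 16 pages can be dirtied between checks. That gap is the balancing risk Christoph raises.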
>
> This is a problem all filesystems with data-journaling will have with
> iomap; it's not that gfs2 is doing anything particularly stupid.
I agree that if ext4 were using iomap, it would have similar issues.
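A toy model of the deadlock, assuming (as described in this thread) that dirty journaled pages can only be cleaned by a log flush and that the log cannot be flushed while a transaction is open. All names are hypothetical; a `false` return marks the point where a real kernel would block indefinitely. Throttling inside the transaction gets stuck once the dirty limit is exceeded, while throttling after the transaction has ended (as when called from ->iomap_end()) succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a data-journaling filesystem (not kernel code). */
struct jfs_model {
	bool trans_active;	/* a transaction is currently open */
	unsigned dirty;		/* dirty journaled pages awaiting a log flush */
	unsigned limit;		/* dirty threshold enforced by the throttle */
};

/* Log flush: cleans all dirty journaled pages, but cannot run while a
 * transaction is open. */
static bool log_flush(struct jfs_model *fs)
{
	if (fs->trans_active)
		return false;
	fs->dirty = 0;
	return true;
}

/* Throttle: below the limit it returns at once; above it, it must wait
 * for a log flush. Returns false where a real kernel would block forever. */
static bool throttle(struct jfs_model *fs)
{
	if (fs->dirty <= fs->limit)
		return true;
	return log_flush(fs);
}

/* Dirty npages pages inside one transaction, throttling either after
 * every page (inside the transaction) or once after it has ended. */
static bool write_in_transaction(struct jfs_model *fs, unsigned npages,
				 bool throttle_inside)
{
	assert(!fs->trans_active);	/* transactions do not nest here */
	fs->trans_active = true;
	for (unsigned i = 0; i < npages; i++) {
		fs->dirty++;
		if (throttle_inside && !throttle(fs))
			return false;	/* stuck: the flush needs the transaction closed */
	}
	fs->trans_active = false;
	return throttle(fs);
}
```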
> > So I really think we need to fix this properly. And if that means
> > that you can't make use of the iomap batching for gfs2 in journaled
> > mode that is still a better option.
>
> That would mean using the old-style, page-size allocations, and a
> completely separate write path in that case. That would be quite a
> nightmare.
>
> > But I really think you need
> > to look into the scope of your flush_log and figure out a good way
> > to reduce it and solve the root cause.
>
> We won't be able to do a log flush while another transaction is
> active, but that's what's needed to clean dirty pages. iomap doesn't
> allow us to put the block allocation into a separate transaction from
> the page writes; for that, a counterpart to the page_done hook would
> probably be needed.
I agree that a ->page_prepare() hook would probably be the cleanest
solution for this.
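A sketch of how a ->page_prepare() counterpart to ->page_done() could let a data-journaling filesystem keep per-page throttling, assuming it opens a transaction per page in ->page_prepare() and closes it in ->page_done(), so the throttle always runs outside a transaction. The struct, hook signatures, and state below are hypothetical and are not the kernel's iomap interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filesystem state for the model (not kernel code). */
struct fs_state {
	bool trans_active;	/* a transaction is currently open */
	unsigned dirty;		/* dirty journaled pages awaiting a log flush */
	unsigned limit;		/* dirty threshold enforced by the throttle */
	unsigned trans_count;	/* number of transactions started */
};

/* Hypothetical per-page hooks modelled on ->page_prepare()/->page_done(). */
struct page_hooks {
	void (*page_prepare)(struct fs_state *fs);
	void (*page_done)(struct fs_state *fs);
};

static void jdata_page_prepare(struct fs_state *fs)
{
	assert(!fs->trans_active);	/* no transaction may span pages */
	fs->trans_active = true;	/* begin a transaction for this page */
	fs->trans_count++;
}

static void jdata_page_done(struct fs_state *fs)
{
	fs->trans_active = false;	/* end it before the throttle runs */
}

static const struct page_hooks jdata_hooks = {
	.page_prepare = jdata_page_prepare,
	.page_done = jdata_page_done,
};

/* Throttle: over the limit, pages can only be cleaned by a log flush,
 * which is impossible inside a transaction. Returns false where a real
 * kernel would block indefinitely. */
static bool throttle(struct fs_state *fs)
{
	if (fs->dirty <= fs->limit)
		return true;
	if (fs->trans_active)
		return false;
	fs->dirty = 0;			/* log flush cleans every journaled page */
	return true;
}

/* Generic write loop over one mapping with per-page throttling. If
 * mapping_trans is set, one transaction spans the whole mapping;
 * ops, if non-NULL, runs around each page. */
static bool write_mapping(struct fs_state *fs, const struct page_hooks *ops,
			  unsigned npages, bool mapping_trans)
{
	if (mapping_trans)
		fs->trans_active = true;
	for (unsigned i = 0; i < npages; i++) {
		if (ops && ops->page_prepare)
			ops->page_prepare(fs);
		fs->dirty++;		/* copy user data into the page */
		if (ops && ops->page_done)
			ops->page_done(fs);
		if (!throttle(fs))
			return false;	/* stuck: would deadlock */
	}
	if (mapping_trans)
		fs->trans_active = false;
	return true;
}
```

With a limit of 4, holding one transaction across the whole mapping stalls on the fifth page, while per-page transactions let the throttle flush the log between pages and keep the once-per-page balancing Christoph asks for.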
Honza
--
Jan Kara <jack at suse.com>
SUSE Labs, CR