[Cluster-devel] remove iomap_writepage v2

Thu Jul 28 23:26:20 UTC 2022

On Thu, Jul 28, 2022 at 3:48 PM Dave Chinner <david at fromorbit.com> wrote:
>
> On Thu, Jul 28, 2022 at 03:18:03PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 28, 2022 at 01:10:16PM +0200, Jan Kara wrote:
> > > Hi Christoph!
> > >
> > > On Tue 19-07-22 06:13:07, Christoph Hellwig wrote:
> > > > this series removes iomap_writepage and its callers, following what xfs
> > > > has been doing for a long time.
> > >
> > > So this effectively means "no writeback from page reclaim for these
> > > filesystems" AFAICT (page migration of dirty pages seems to be handled by
> > > iomap_migrate_page()) which is going to make life somewhat harder for
> > > memory reclaim when memory pressure is high enough that dirty pages are
> > > reaching the end of the LRU list. I don't expect this to be a problem on big
> > > machines but it could have some undesirable effects for small ones
> > > (embedded, small VMs). I agree per-page writeback has been a bad idea for
> > > efficiency reasons for at least the last 10-15 years and most filesystems
> > > stopped dealing with more complex situations (like block allocation) from
> > > ->writepage() already quite a few years ago without any bug reports AFAIK.
> > > So it all seems like a sensible idea from FS POV but are MM people on board
> > > or at least aware of this movement in the fs land?
> >
> > I mentioned it during my folio session at LSFMM, but didn't put a huge
> > emphasis on it.
> >
> > For XFS, writeback should already be in progress on other pages if
> > we're getting to the point of trying to call ->writepage() in vmscan.
> > Surely this is also true for other filesystems?
>
> Yes.
>
> It's definitely true for btrfs, too, because btrfs_writepage does:
>
> static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> {
>         struct inode *inode = page->mapping->host;
>         int ret;
>
>         if (current->flags & PF_MEMALLOC) {
>                 redirty_page_for_writepage(wbc, page);
>                 unlock_page(page);
>                 return 0;
>         }
> ....
>
> It also rejects all calls to write dirty pages from memory reclaim
> contexts.

Aha, it seems even kswapd (it has PF_MEMALLOC set) is rejected too.
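
To make that concrete, here is a minimal userspace sketch of the guard. It
is not kernel code: every model_* name is made up for illustration, the flag
value is arbitrary, and the two bool fields only stand in for what
redirty_page_for_writepage() and unlock_page() do to a real page.

```c
#include <stdbool.h>

/* Illustrative flag bit for this model only; stands in for PF_MEMALLOC. */
#define MODEL_PF_MEMALLOC 0x0800u

/* Hypothetical stand-in for the calling task (current). */
struct model_task {
	unsigned int flags;
};

/* Hypothetical stand-in for a page cache page. */
struct model_page {
	bool dirty;
	bool locked;
};

enum model_result {
	MODEL_REDIRTIED,	/* call rejected, page left dirty */
	MODEL_SUBMITTED		/* page handed to the (pretend) IO path */
};

/*
 * Models the early exit quoted above: any caller with the memalloc bit
 * set gets its page redirtied and unlocked, and no IO is issued.
 */
enum model_result model_writepage(const struct model_task *cur,
				  struct model_page *page)
{
	if (cur->flags & MODEL_PF_MEMALLOC) {
		page->dirty = true;	/* models redirty_page_for_writepage() */
		page->locked = false;	/* models unlock_page() */
		return MODEL_REDIRTIED;
	}

	/* Non-reclaim caller: pretend the page was written back. */
	page->dirty = false;
	page->locked = false;
	return MODEL_SUBMITTED;
}
```

In this model a task with MODEL_PF_MEMALLOC set (kswapd or direct reclaim)
always gets MODEL_REDIRTIED back, while a task with flags == 0 (say, the
flusher thread) gets MODEL_SUBMITTED. The check looks only at the flag, so
it cannot tell kswapd apart from direct reclaim.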

>
> ext4 will also reject writepage calls from memory allocation if
> block allocation is required (due to delayed allocation) or
> unwritten extents need converting to written. i.e. if it has to run
> blocking transactions.
>
> So all three major filesystems will either partially or wholly
> reject ->writepage calls from memory reclaim context.
>
> IOWs, if memory reclaim is depending on ->writepage() to make
> reclaim progress, it's not working as advertised on the vast
> majority of production Linux systems....
>
> The reality is that ->writepage is a relic of a bygone era of OS and
> filesystem design. It was useful in the days where writing a dirty
> page just involved looking up the bufferhead attached to the page to
> get the disk mapping and then submitting it for IO.
>
> Those days are long gone - filesystems have complex IO submission
> paths now that have to handle delayed allocation, copy-on-write,
> unwritten extents, have unbounded memory demand, etc. All the
> filesystems that support these 1990s era filesystem technologies
> simply turn off ->writepage in memory reclaim contexts.
>
> Hence for the vast majority of linux users (i.e. everyone using
> ext4, btrfs and XFS), ->writepage no longer plays any part in memory
> reclaim on their systems.
>
> So why should we try to maintain the fiction that ->writepage is
> required functionality in a filesystem when it clearly isn't?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>