[Linux-cachefs] Problems doing DIO to netfs cache on XFS from Ceph
David Howells
dhowells at redhat.com
Thu Dec 3 14:10:56 UTC 2020
Hi Christoph,
We're having a problem making the fscache/cachefiles rewrite work with XFS.
Could you have a look? Jeff Layton just tripped the attached warning from
this check in iomap_do_writepage():
	/*
	 * Given that we do not allow direct reclaim to call us, we should
	 * never be called in a recursive filesystem reclaim context.
	 */
	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
		goto redirty;
The chain of events is the following:
 (1) Ceph is asked to do an ordinary write by userspace.  It calls the fscache
     netfs_write_begin() helper to read the region it's going to modify so
     that the cache can be preloaded.
 (2) In this case, the cache already has it, so cachefiles_read() dispatches
     an async DIO read to the backing filesystem (in this case XFS).
 (3) iomap, on behalf of XFS, flushes the pagecache attached to the backing
     inode, which appears to be populated, causing do_writepages() to run.
 (4) The XFS write-out eventually wends its way to iomap_do_writepage(), which
     complains about NOFS being set and cancels the write.
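Steps (2) to (4) can be sketched as a small userspace model.  This is not
kernel code: the model_* names, the structs and the flag value are stand-ins
invented for illustration, and the flush is reduced to a loop over a dirty
page counter.  It only shows why a flush issued from inside a NOFS scope gets
every page redirtied instead of written.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the task flag; the value is an assumption. */
#define MODEL_PF_MEMALLOC_NOFS 0x00040000u

struct model_task  { unsigned int flags; };
struct model_inode { int dirty_pages; int redirtied; };

/* Step (4): the writepage check refuses to write in a NOFS context. */
static bool model_writepage(const struct model_task *t, struct model_inode *ino)
{
	assert(ino->dirty_pages > 0);
	if (t->flags & MODEL_PF_MEMALLOC_NOFS) {
		ino->redirtied++;	/* models "goto redirty" */
		return false;
	}
	ino->dirty_pages--;		/* page written back */
	return true;
}

/* Step (3): the DIO read first flushes dirty pagecache on the inode. */
static void model_dio_read(const struct model_task *t, struct model_inode *ino)
{
	int n = ino->dirty_pages;

	for (int i = 0; i < n; i++)
		model_writepage(t, ino);
}

/* Step (2): the cache read issues the DIO read, optionally under NOFS. */
static void model_cache_read(struct model_task *t, struct model_inode *ino,
			     bool set_nofs)
{
	unsigned int old = t->flags & MODEL_PF_MEMALLOC_NOFS;

	if (set_nofs)
		t->flags |= MODEL_PF_MEMALLOC_NOFS;
	model_dio_read(t, ino);
	t->flags = (t->flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}
```

Calling model_cache_read() with set_nofs true on an inode with three dirty
pages leaves all three dirty and counts three redirties; with set_nofs false
the same call writes all three back.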
Now, I'm doing:
	old_nofs = memalloc_nofs_save();
	ret = call_read_iter(file, &ki->iocb, iter);
	memalloc_nofs_restore(old_nofs);
in cachefiles_read() to prevent the cache from causing writeout in the netfs.
Setting NOFS here is possibly overkill and may only really be necessary in
cachefiles_write(), which can be called from netfs writeback.
cachefiles_read() should only be called from netfs ->readpage(), ->readahead()
and ->write_begin() and maybe a workqueue in the case that the cache returns a
short read.
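For reference, the save/restore pairing used in the snippet above can be
modelled in userspace as below.  The model_* helpers, the global flag word and
the flag value are assumptions made for illustration, as is the idea that the
writeback caller holds its own NOFS scope.  The sketch only shows that nested
pairs compose: the inner restore puts back whatever state the outer scope
had, so wrapping just the write path would not disturb a caller's scope.

```c
#include <assert.h>

/* Illustrative stand-in for the task flag; the value is an assumption. */
#define MODEL_PF_MEMALLOC_NOFS 0x00040000u

/* Stand-in for current->flags. */
static unsigned int model_task_flags;

/* Set the NOFS bit and hand back its previous state. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

	model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Put the NOFS bit back to the state returned by the matching save. */
static void model_nofs_restore(unsigned int old)
{
	assert((old & ~MODEL_PF_MEMALLOC_NOFS) == 0);
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Hypothetical cache write: returns 1 if NOFS was set during the "I/O". */
static int model_cache_write(void)
{
	unsigned int old = model_nofs_save();
	int in_nofs = (model_task_flags & MODEL_PF_MEMALLOC_NOFS) != 0;

	model_nofs_restore(old);
	return in_nofs;
}

/*
 * Hypothetical netfs writeback holding its own NOFS scope: returns 1 if
 * the outer scope is still in force after the nested cache write returns.
 */
static int model_netfs_writeback(void)
{
	unsigned int old = model_nofs_save();
	int still_nofs;

	model_cache_write();
	still_nofs = (model_task_flags & MODEL_PF_MEMALLOC_NOFS) != 0;
	model_nofs_restore(old);
	return still_nofs;
}
```

In this model both functions return 1 and leave model_task_flags at zero when
called from a context that had no NOFS scope to begin with.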
Note that I'm only doing async DIO reads and writes, so I was a bit surprised
that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
actually just a hint and the filesystem can turn it into buffered I/O if it
wants.
Thanks,
David
---
WARNING: CPU: 6 PID: 7412 at fs/iomap/buffered-io.c:1465 iomap_do_writepage+0x76a/0x8b0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:iomap_do_writepage+0x76a/0x8b0
Code: 89 f5 41 89 c7 48 83 7d 48 00 0f 85 6e fb ff ff 48 8b 44 24 48 48 8d 5c 24 48 48 39 d8 0f 84 5b fb ff ff 0f 0b e9 54 fb ff ff <0f> 0b e9 76 ff ff ff 0f 0b e9 64 fb ff ff 0f 0b e9 9a fb ff ff 0f
RSP: 0018:ffffb19b4155f6e0 EFLAGS: 00010206
RAX: 0000000000440100 RBX: ffffb19b4155f7a8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffb19b4155f940 RDI: ffffd1e484446740
RBP: ffffb19b4155f868 R08: ffffffffffffffff R09: 0000000000030360
R10: 0000000000000002 R11: 0000000000000006 R12: ffff8a5108ad4d30
R13: 0000000000002a9a R14: ffffb19b4155f7b0 R15: ffffd1e484446740
FS: 00007f6ff479d740(0000) GS:ffff8a542fb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000fc02a8 CR3: 000000013b218000 CR4: 00000000003506e0
Call Trace:
? page_referenced_one+0x150/0x150
? __mod_memcg_lruvec_state+0x21/0xe0
? clear_page_dirty_for_io+0xf1/0x240
write_cache_pages+0x186/0x3d0
? iomap_readahead+0x1b0/0x1b0
? blk_mq_submit_bio+0x2ee/0x4f0
? elv_rb_del+0x1f/0x30
? deadline_remove_request+0x55/0xb0
? dd_dispatch_request+0x151/0x210
iomap_writepages+0x1c/0x40
xfs_vm_writepages+0x56/0x70 [xfs]
do_writepages+0x28/0xa0
? xfs_iunlock+0xa3/0xe0 [xfs]
? wbc_attach_and_unlock_inode+0xb5/0x140
__filemap_fdatawrite_range+0xa7/0xe0
filemap_write_and_wait_range+0x3d/0x90
__iomap_dio_rw+0x149/0x490
iomap_dio_rw+0xe/0x30
xfs_file_dio_aio_read+0xb9/0x100 [xfs]
xfs_file_read_iter+0xba/0xd0 [xfs]
cachefiles_read+0x1ee/0x3f0 [cachefiles]
? netfs_subreq_terminated+0x240/0x240 [netfs]
netfs_read_from_cache+0x70/0x80 [netfs]
netfs_rreq_submit_slice+0x169/0x310 [netfs]
netfs_write_begin+0x4e4/0x6a0 [netfs]
? ceph_put_fmode+0x43/0xd0 [ceph]
ceph_write_begin+0x141/0x250 [ceph]
generic_perform_write+0xaf/0x190
ceph_write_iter+0xab6/0xc90 [ceph]
? _cond_resched+0x16/0x40
? __ceph_setattr+0x895/0x960 [ceph]
? new_sync_write+0x108/0x180
new_sync_write+0x108/0x180
vfs_write+0x1bc/0x270
ksys_write+0x4f/0xc0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9