[dm-devel] [PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
Vivek Goyal
vgoyal at redhat.com
Mon Dec 13 16:17:49 UTC 2021
On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal at redhat.com> wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > > These methods indirect the actual DAX read/write path. In the end pmem
> > > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > > while device mapper just redirects to the underlying device.
> > >
> > > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > > special variants, then use them everywhere as they fall back to the plain
> > > ones on s390 anyway and remove an indirect call from the read/write path
> > > as well as a lot of boilerplate code.
> > >
> > > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > > ---
> > > drivers/dax/super.c | 36 ++++++++++++++--
> > > drivers/md/dm-linear.c | 20 ---------
> > > drivers/md/dm-log-writes.c | 80 -----------------------------------
> > > drivers/md/dm-stripe.c | 20 ---------
> > > drivers/md/dm.c | 50 ----------------------
> > > drivers/nvdimm/pmem.c | 20 ---------
> > > drivers/s390/block/dcssblk.c | 14 ------
> > > fs/dax.c | 5 ---
> > > fs/fuse/virtio_fs.c | 19 +--------
> > > include/linux/dax.h | 9 ++--
> > > include/linux/device-mapper.h | 4 --
> > > 11 files changed, 37 insertions(+), 240 deletions(-)
> > >
> >
> > [..]
> > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > index 5c03a0364a9bb..754319ce2a29b 100644
> > > --- a/fs/fuse/virtio_fs.c
> > > +++ b/fs/fuse/virtio_fs.c
> > > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> > > return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> > > }
> > >
> > > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > > - pgoff_t pgoff, void *addr,
> > > - size_t bytes, struct iov_iter *i)
> > > -{
> > > - return copy_from_iter(addr, bytes, i);
> > > -}
> > > -
> > > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > > - pgoff_t pgoff, void *addr,
> > > - size_t bytes, struct iov_iter *i)
> > > -{
> > > - return copy_to_iter(addr, bytes, i);
> > > -}
> > > -
> > > static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > > pgoff_t pgoff, size_t nr_pages)
> > > {
> > > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >
> > > static const struct dax_operations virtio_fs_dax_ops = {
> > > .direct_access = virtio_fs_direct_access,
> > > - .copy_from_iter = virtio_fs_copy_from_iter,
> > > - .copy_to_iter = virtio_fs_copy_to_iter,
> > > .zero_page_range = virtio_fs_zero_page_range,
> > > };
> > >
> > > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > > fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> > > if (IS_ERR(fs->dax_dev))
> > > return PTR_ERR(fs->dax_dev);
> > > -
> > > + set_dax_cached(fs->dax_dev);
> >
> > Looks good to me from virtiofs point of view.
> >
> > Reviewed-by: Vivek Goyal <vgoyal at redhat.com>
> >
> > Going forward, I am wondering whether virtiofs should use the flushcache
> > version as well. What if the host filesystem is using DAX and mapping
> > persistent memory pfns directly into the qemu address space? I have never
> > tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
>
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver.
Agreed. Let me check the details of the virtio_pmem driver.
> However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.
So host dax functionality will already take care of that, IIUC, right?
I see a dax_flush() call in dax_writeback_one(). I am assuming that will
take care of flushing dirty pages when the guest issues fsync()/msync(),
so we probably don't have to do anything extra here.
I think qemu should map files using MAP_SYNC in this case, though.
Any reads/writes to virtiofs files will turn into load/store operations
on the host file. So flushcache in the guest makes more sense with
MAP_SYNC, which should make sure any filesystem metadata already
persists after fault completion. And later the guest can do writes
followed by a flush and ensure data persists too.
IOW, I probably only need to do the following:
- In the virtiofs virtual device, add a notion of the kind of dax window
  or memory it supports, maybe some kind of "writethrough" property of
  the virtiofs dax cache.
- Use this property in the virtiofs driver to decide whether to use
  plain copy_from_iter() or _copy_from_iter_flushcache().
- qemu should use mmap(MAP_SYNC) if the host filesystem is on persistent
  memory.
Thanks
Vivek