[Virtio-fs] [PATCH 6/9] virtio-fs: let dax style override directIO style when dax+cache=none

Vivek Goyal vgoyal at redhat.com
Mon Apr 22 18:14:36 UTC 2019


On Wed, Apr 17, 2019 at 10:25:53AM +0200, Miklos Szeredi wrote:
> On Tue, Apr 16, 2019 at 9:38 PM Vivek Goyal <vgoyal at redhat.com> wrote:
> >
> > On Wed, Apr 17, 2019 at 02:03:19AM +0800, Liu Bo wrote:
> > > In case of dax+cache=none, mmap uses dax style prior to directIO style,
> > > while read/write don't, but it seems that there is no reason not to do so.
> > >
> > > Signed-off-by: Liu Bo <bo.liu at linux.alibaba.com>
> > > Reviewed-by: Joseph Qi <joseph.qi at linux.alibaba.com>
> >
> > This is interesting. I was thinking about it today itself. I noticed
> > that ext4 and xfs also check for DAX inode first and use dax path
> > if dax is enabled.
> >
> > cache=never sets FOPEN_DIRECT_IO (even if application never asked for
> > direct IO). If dax is enabled, for data its equivalent to doing direct
> > IO. And for mmap() we are already checking for DAX first. So it makes
> > sense to do same thing for read/write path as well.
> >
> > CCing Miklos as well. He might have some thougts on this. I am curios
> > that initially whey did he make this change only for mmap() and not
> > for read/write paths.
> 
> AFAIR the main reason was that we had performance issues with size
> extending writes with dax.

Hi Miklos,

How about falling back to sending fuse WRITE message to fuse daemon
in case of extending writes (and not use DAX). That will solve the issue
of write and i_size modification not being atomic also will improve
performance of file extending writes. 

I wrote following hack patch and this seems to work.

Thanks
Vivek


Subject: fuse, dax: Do not use dax for file extnding writes

Right not we use dax for file extending writes and use fallocate() first
to make sure file on host is extended so that later write through mmap()
will not result in SIGBUS.

This approach has two problems. First of all write and i_size are not atomic.
And that means if guest crashes after fallocate(), it can expose trailing
zeros in file and give the apperance as if user data was lost. Secondly,
calling falloate() for every extending write is very slow.  Apart from
communication overhead, fallocate() on host is much slower than calling
write on host.

So do not use dax for file extending writes, instead just send WRITE message
to daemon (like we do for direct I/O path) and this should solve botht the
above problems.

Signed-off-by: Vivek Goyal <vgoyal at redhat.com>
---
 fs/fuse/file.c |   67 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 17 deletions(-)

Index: rhvgoyal-linux-fuse/fs/fuse/file.c
===================================================================
--- rhvgoyal-linux-fuse.orig/fs/fuse/file.c	2019-04-18 16:08:19.048407845 -0400
+++ rhvgoyal-linux-fuse/fs/fuse/file.c	2019-04-22 13:22:20.939679793 -0400
@@ -1809,6 +1809,13 @@ static int fuse_iomap_begin(struct inode
 	pr_debug("fuse_iomap_begin() called. pos=0x%llx length=0x%llx\n",
 			pos, length);
 
+	/*
+	 * Writes beyond end of file are not handled using dax path. Instead
+	 * a fuse write message is sent to daemon
+	 */
+	if (flags & IOMAP_WRITE && pos >= i_size_read(inode))
+		return -EIO;
+
 	iomap->offset = pos;
 	iomap->flags = 0;
 	iomap->bdev = NULL;
@@ -1929,11 +1936,34 @@ static ssize_t fuse_dax_read_iter(struct
 	return ret;
 }
 
-static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
+static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	return (iov_iter_rw(from) == WRITE &&
+	    	((iocb->ki_pos) >= i_size_read(inode)));
+}
+
+static ssize_t fuse_dax_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t ret;
 
+	ret = fuse_direct_io(&io, from, &iocb->ki_pos, FUSE_DIO_WRITE);
+	if (ret < 0)
+		return ret;
+
+	fuse_invalidate_attr(inode);
+	fuse_write_update_size(inode, iocb->ki_pos);
+	return ret;
+}
+
+static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	ssize_t ret, count;
+
 	if (iocb->ki_flags & IOCB_NOWAIT) {
 		if (!inode_trylock(inode))
 			return -EAGAIN;
@@ -1950,26 +1980,29 @@ static ssize_t fuse_dax_write_iter(struc
 		goto out;
 	/* TODO file_update_time() but we don't want metadata I/O */
 
-	/* TODO handle growing the file */
-	/* Grow file here if need be. iomap_begin() does not have access
-	 * to file pointer
-	 */
-	if (iov_iter_rw(from) == WRITE &&
-	    ((iocb->ki_pos + iov_iter_count(from)) > i_size_read(inode))) {
-		ret = __fuse_file_fallocate(iocb->ki_filp, 0, iocb->ki_pos,
-						iov_iter_count(from));
-		if (ret < 0) {
-			printk("fallocate(offset=0x%llx length=0x%zx)"
-			" failed. err=%zd\n", iocb->ki_pos,
-			iov_iter_count(from), ret);
-			goto out;
-		}
-		pr_debug("fallocate(offset=0x%llx length=0x%zx)"
-		" succeed. ret=%zd\n", iocb->ki_pos, iov_iter_count(from), ret);
+	/* Do not use dax for file extending writes as its an mmap and
+	 * trying to write beyong end of existing page will generate
+	 * SIGBUS.
+	 */
+	if (file_extending_write(iocb, from)) {
+		ret = fuse_dax_direct_write(iocb, from);
+		goto out;
 	}
 
 	ret = dax_iomap_rw(iocb, from, &fuse_iomap_ops);
+	if (ret < 0)
+		goto out;
 
+	/*
+	 * If part of the write was file extending, fuse dax path will not
+	 * take care of that. Do direct write instead.
+	 */
+	if (iov_iter_count(from) && file_extending_write(iocb, from)) {
+		count = fuse_dax_direct_write(iocb, from);
+		if (count < 0)
+			goto out;
+		ret += count;
+	}
 out:
 	inode_unlock(inode);
 




More information about the Virtio-fs mailing list