[Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
Dr. David Alan Gilbert
dgilbert at redhat.com
Tue Mar 24 20:09:40 UTC 2020
* Liu Bo (bo.liu at linux.alibaba.com) wrote:
> On Fri, Mar 20, 2020 at 08:16:15PM +0000, Dr. David Alan Gilbert wrote:
> > * Liu Bo (bo.liu at linux.alibaba.com) wrote:
> > > When a file size is not aligned to PAGE_SIZE, a mmap write on it may
> > > encounter -EIO (can be observed from virtiofsd's log) due to the difference
> > > between the buf size and the size recorded in struct fuse_write_in. The
> > > difference comes from the fact that for mmap, writeback IO is used and
> > > guest kernel sets fuse_write_in's size to inode size if EOF, while the buf
> > > len still remains PAGE_SIZE aligned.
> > >
> > > This handles the above special mmap case by truncating the last buf'size.
> >
> > Thanks,
> >
> > > Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> > > Reported-by: Yiqun Leng <yqleng at linux.alibaba.com>
> > > Signed-off-by: Liu Bo <bo.liu at linux.alibaba.com>
> > > ---
> > > tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
> > > 1 file changed, 17 insertions(+)
> > >
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > index ca2056f..4f8bfb6 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> > > * and the data in the rest, we need to skip that first element
> > > */
> > > ibufv->buf[0].size = 0;
> > > +
> > > + /*
> > > + * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> > > + * arg->size has been cropped by inode size inside guest. The
> > > + * diff can only be (0, PAGE_SIZE) because inode size must be
> > > + * overlapped with the last buf.
> > > + */
> > > + if (arg->write_flags & FUSE_WRITE_CACHE) {
> >
> > Does this need to only do it in the WRITE_CACHE case - or should we just
> > always truncate the write to arg->size?
> > Or is this just simpler?
>
> For non-mmap IO, AFAICS, it's all synchronous IO (not using
> writepages) where the data part's length should be equal to arg->size
> here. So I think it's no harm to do it for both.
OK, good that would simplify it.
> >
> > > + size_t total = fuse_buf_size(pbufv);
> > > + int last = ibufv->count - 1;
> > > +
> > > + if (total > arg->size) {
> > > + size_t diff = total - arg->size;
> > > + if (diff < ibufv->buf[last].size)
> > > + ibufv->buf[last].size -= diff;
> >
> > I think that needs to modify pbufv->buf[last].size not ibufv
> > because the two are only the same in some cases (although it's possible
> > in this case the guest we try at the moment always falls in this side).
> >
>
> OK.
>
> > We should also do something in the else case - probably fail?
> >
>
> If it fails, it then gets to the following check and report -EIO.
Yes, but that error doesn't tell us why; so we'll see that error and not
realise that it was because we failed to do this truncation.
> > > + }
> > > + }
> > > }
> > >
> > > if (fuse_buf_size(pbufv) != arg->size) {
> >
> > If we now know that pbufv is now always shrung to size,
> > then we only now need to check for the case where pbufv is too small.
> >
>
> From my understanding about both mmap IO and nonmmap IO, I think it's
> arg->size that is always <= pbufv size.
It should do, but this is a sanity check - remember we don't trust the
guest.
Dave
> thanks,
> -liubo
>
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK
More information about the Virtio-fs
mailing list