[Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
Liu Bo
bo.liu at linux.alibaba.com
Fri Mar 20 22:33:00 UTC 2020
On Fri, Mar 20, 2020 at 08:16:15PM +0000, Dr. David Alan Gilbert wrote:
> * Liu Bo (bo.liu at linux.alibaba.com) wrote:
> > When a file size is not aligned to PAGE_SIZE, a mmap write on it may
> > encounter -EIO (can be observed from virtiofsd's log) due to the difference
> > between the buf size and the size recorded in struct fuse_write_in. The
> > difference comes from the fact that for mmap, writeback IO is used and
> > guest kernel sets fuse_write_in's size to inode size if EOF, while the buf
> > len still remains PAGE_SIZE aligned.
> >
> > This handles the above special mmap case by truncating the last buf's size.
>
> Thanks,
>
> > Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> > Reported-by: Yiqun Leng <yqleng at linux.alibaba.com>
> > Signed-off-by: Liu Bo <bo.liu at linux.alibaba.com>
> > ---
> > tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index ca2056f..4f8bfb6 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> > * and the data in the rest, we need to skip that first element
> > */
> > ibufv->buf[0].size = 0;
> > +
> > + /*
> > + * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> > + * arg->size has been cropped by inode size inside guest. The
> > + * diff can only be (0, PAGE_SIZE) because inode size must be
> > + * overlapped with the last buf.
> > + */
> > + if (arg->write_flags & FUSE_WRITE_CACHE) {
>
> Does this need to only do it in the WRITE_CACHE case - or should we just
> always truncate the write to arg->size?
> Or is this just simpler?
For non-mmap IO, AFAICS, it's all synchronous IO (not using
writepages), where the length of the data part should be equal to
arg->size here. So I think there's no harm in doing it for both.
>
> > + size_t total = fuse_buf_size(pbufv);
> > + int last = ibufv->count - 1;
> > +
> > + if (total > arg->size) {
> > + size_t diff = total - arg->size;
> > + if (diff < ibufv->buf[last].size)
> > + ibufv->buf[last].size -= diff;
>
> I think that needs to modify pbufv->buf[last].size not ibufv
> because the two are only the same in some cases (although it's possible
> in this case the guest we try at the moment always falls in this side).
>
OK.
> We should also do something in the else case - probably fail?
>
In the else case nothing is truncated, so it falls through to the
following size check, which reports -EIO.
> > + }
> > + }
> > }
> >
> > if (fuse_buf_size(pbufv) != arg->size) {
>
> If we now know that pbufv is now always shrunk to size,
> then we only now need to check for the case where pbufv is too small.
>
From my understanding of both mmap IO and non-mmap IO, I think
arg->size is always <= the pbufv size.
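
To make the points above concrete, here is a rough standalone sketch of
the logic as discussed: truncate the last element of the payload bufvec
unconditionally, and let the existing size check report -EIO when the
sizes still disagree. This is not the actual patch; sketch_buf,
sketch_bufvec, sketch_buf_size() and sketch_trim_to_size() are
hypothetical simplified stand-ins for struct fuse_buf, struct
fuse_bufvec and fuse_buf_size(), and arg_size stands in for arg->size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct fuse_buf / fuse_bufvec. */
struct sketch_buf {
    size_t size;
};

struct sketch_bufvec {
    size_t count;
    struct sketch_buf buf[4];
};

/* Sum of all element sizes, playing the role of fuse_buf_size(). */
static size_t sketch_buf_size(const struct sketch_bufvec *bufv)
{
    size_t total = 0;

    for (size_t i = 0; i < bufv->count; i++) {
        total += bufv->buf[i].size;
    }
    return total;
}

/*
 * Shrink the last element so that the total matches arg_size.
 * Returns 0 if the sizes agree afterwards, -EIO otherwise.
 */
static int sketch_trim_to_size(struct sketch_bufvec *bufv, size_t arg_size)
{
    assert(bufv->count > 0);

    size_t total = sketch_buf_size(bufv);
    size_t last = bufv->count - 1;

    if (total > arg_size) {
        size_t diff = total - arg_size;

        /* Only trim when the excess fits inside the last element. */
        if (diff < bufv->buf[last].size) {
            bufv->buf[last].size -= diff;
        }
        /* Otherwise leave it alone; the check below rejects it. */
    }

    /* Same role as the existing fuse_buf_size(pbufv) != arg->size check. */
    return sketch_buf_size(bufv) == arg_size ? 0 : -EIO;
}
```

With two PAGE_SIZE (4096) elements and arg_size of 4096 + 100, the last
element is cut down to 100 bytes and the check passes; if the excess is
larger than the last element, or the bufvec is too small, nothing is
trimmed and the caller gets -EIO.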
thanks,
-liubo