[Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode

Dr. David Alan Gilbert dgilbert at redhat.com
Tue Mar 24 20:09:40 UTC 2020


* Liu Bo (bo.liu at linux.alibaba.com) wrote:
> On Fri, Mar 20, 2020 at 08:16:15PM +0000, Dr. David Alan Gilbert wrote:
> > * Liu Bo (bo.liu at linux.alibaba.com) wrote:
> > > When a file size is not aligned to PAGE_SIZE, a mmap write on it may
> > > encounter -EIO (can be observed from virtiofsd's log) due to the difference
> > > between the buf size and the size recorded in struct fuse_write_in.  The
> > > difference comes from the fact that for mmap, writeback IO is used and
> > > guest kernel sets fuse_write_in's size to inode size if EOF, while the buf
> > > len still remains PAGE_SIZE aligned.
> > > 
> > > This handles the above special mmap case by truncating the last buf'size.
> > 
> > Thanks,
> > 
> > > Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> > > Reported-by: Yiqun Leng <yqleng at linux.alibaba.com>
> > > Signed-off-by: Liu Bo <bo.liu at linux.alibaba.com>
> > > ---
> > >  tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > index ca2056f..4f8bfb6 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> > >           * and the data in the rest, we need to skip that first element
> > >           */
> > >          ibufv->buf[0].size = 0;
> > > +
> > > +        /*
> > > +         * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> > > +         * arg->size has been cropped by inode size inside guest.  The
> > > +         * diff can only be (0, PAGE_SIZE) because inode size must be
> > > +         * overlapped with the last buf.
> > > +         */
> > > +        if (arg->write_flags & FUSE_WRITE_CACHE) {
> > 
> > Does this need to only do it in the WRITE_CACHE case - or should we just
> > always truncate the write to arg->size?
> > Or is this just simpler?
> 
> For non-mmap IO, AFAICS, it's all synchronous IO (not using
> writepages) where the data part's length should be equal to arg->size
> here.  So I think it's no harm to do it for both.

OK, good that would simplify it.

> > 
> > > +                size_t total = fuse_buf_size(pbufv);
> > > +                int last = ibufv->count - 1;
> > > +
> > > +                if (total > arg->size) {
> > > +                        size_t diff = total - arg->size;
> > > +                        if (diff < ibufv->buf[last].size)
> > > +                                ibufv->buf[last].size -= diff;
> > 
> > I think that needs to modify pbufv->buf[last].size not ibufv
> > because the two are only the same in some cases (although it's possible
> > in this case the guest we try at the moment always falls in this side).
> >
> 
> OK.
> 
> > We should also do something in the else case - probably fail?
> >
> 
> If it fails, it then gets to the following check and report -EIO.

Yes, but that error doesn't tell us why; so we'll see that error and not
realise that it was because we failed to do this truncation.

> > > +                }
> > > +        }
> > >      }
> > >  
> > >      if (fuse_buf_size(pbufv) != arg->size) {
> > 
> > If we now know that pbufv is now always shrung to size,
> > then we only now need to check for the case where pbufv is too small.
> >
> 
> From my understanding about both mmap IO and nonmmap IO, I think it's
> arg->size that is always <= pbufv size.

It should do, but this is a sanity check - remember we don't trust the
guest.

Dave

> thanks,
> -liubo
> 
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK




More information about the Virtio-fs mailing list