[Virtio-fs] [PATCH 3/4] virtiofsd: use file-backend memory region for virtiofsd's cache area
Dr. David Alan Gilbert
dgilbert at redhat.com
Mon May 20 17:58:07 UTC 2019
* Liu Bo (bo.liu at linux.alibaba.com) wrote:
> On Fri, Apr 26, 2019 at 10:05:24AM +0100, Stefan Hajnoczi wrote:
> > On Thu, Apr 25, 2019 at 05:21:58PM -0400, Vivek Goyal wrote:
> > > On Thu, Apr 25, 2019 at 03:33:23PM +0100, Stefan Hajnoczi wrote:
> > > > On Tue, Apr 23, 2019 at 11:49:15AM -0700, Liu Bo wrote:
> > > > > On Tue, Apr 23, 2019 at 01:09:19PM +0100, Stefan Hajnoczi wrote:
> > > > > > On Wed, Apr 17, 2019 at 03:51:21PM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Liu Bo (bo.liu at linux.alibaba.com) wrote:
> > > > > > > > From: Xiaoguang Wang <xiaoguang.wang at linux.alibaba.com>
> > > > > > > >
> > > > > > > > > When running xfstests test case generic/413, we found the following issue:
> > > > > > > > 1, create a file in one virtiofsd mount point with dax enabled
> > > > > > > > 2, mmap this file, get virtual addr: A
> > > > > > > > 3, write(fd, A, len), here fd comes from another file in another
> > > > > > > > virtiofsd mount point without dax enabled, also note here write(2)
> > > > > > > > is direct io.
> > > > > > > > 4, this direct io will hang forever, because the virtiofsd has crashed.
> > > > > > > > Here is the stack:
> > > > > > > > [ 247.166276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > [ 247.167171] t_mmap_dio D 0 2335 2102 0x00000000
> > > > > > > > [ 247.168006] Call Trace:
> > > > > > > > [ 247.169067] ? __schedule+0x3d0/0x830
> > > > > > > > [ 247.170219] schedule+0x32/0x80
> > > > > > > > [ 247.171328] schedule_timeout+0x1e2/0x350
> > > > > > > > [ 247.172416] ? fuse_direct_io+0x2e5/0x6b0 [fuse]
> > > > > > > > [ 247.173516] wait_for_completion+0x123/0x190
> > > > > > > > [ 247.174593] ? wake_up_q+0x70/0x70
> > > > > > > > [ 247.175640] fuse_direct_IO+0x265/0x310 [fuse]
> > > > > > > > [ 247.176724] generic_file_read_iter+0xaa/0xd20
> > > > > > > > [ 247.177824] fuse_file_read_iter+0x81/0x130 [fuse]
> > > > > > > > [ 247.178938] ? fuse_simple_request+0x104/0x1b0 [fuse]
> > > > > > > > [ 247.180041] ? fuse_fsync_common+0xad/0x240 [fuse]
> > > > > > > > [ 247.181136] __vfs_read+0x108/0x190
> > > > > > > > [ 247.181930] vfs_read+0x91/0x130
> > > > > > > > [ 247.182671] ksys_read+0x52/0xc0
> > > > > > > > [ 247.183454] do_syscall_64+0x55/0x170
> > > > > > > > [ 247.184200] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > > >
> > > > > > > > > And virtiofsd crashed because vu_gpa_to_va() cannot translate this guest
> > > > > > > > > physical address. For a memory-mapped area in dax mode, the pages
> > > > > > > > > for this area point into virtiofsd's cache area, or rather the virtio pci
> > > > > > > > > device's cache bar. In qemu, this cache bar is currently implemented with
> > > > > > > > > anonymous memory, and qemu does not pass the cache bar's address info to
> > > > > > > > > the vhost-user backend, so vu_gpa_to_va() will fail.
> > > > > > > >
> > > > > > > > To fix this issue, we create this vhost cache area with a file backend
> > > > > > > > memory area.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > I know there was another case of the daemon trying to access the
> > > > > > > buffer that Stefan and Vivek hit, but fixed by persuading the kernel
> > > > > > > not to do it; Stefan/Vivek: What do you think?
> > > > > >
> > > > > > That case happened with cache=none and the dax mount option.
> > > > > >
> > > > > > The general problem is when FUSE_READ/FUSE_WRITE is sent and the buffer
> > > > > > is outside guest RAM.
> > >
> > > Stefan,
> > >
> > > Can this be emulated by sending a request to qemu? If virtiofsd can detect
> > > that source/destination of READ/WRITE is not guest RAM, can it forward
> > > message to qemu to do this operation (which has access to all the DAX
> > > windows)?
> > >
> > > This probably will mean introducing new messages like
> > > setupmapping/removemapping messages between virtiofsd/qemu.
> >
> > Yes, interesting idea!
> >
> > When virtiofsd is unable to map the virtqueue iovecs due to addresses
> > outside guest RAM, it could forward READ/WRITE requests to QEMU along
> > with the file descriptor. It would be slow but fixes the problem.
> >
>
> It is probably not easy to do.
>
> Imagine the following case,
> // foo1 is on a dax virtiofs, foo2 is on a nondax virtiofs
>
> p = mmap(foo1, ...);
> write(foo2, p, ...);
>
> The virtiofsd serving foo2 needs to interpret a gpa that belongs to the
> virtiofs where foo1 lives, and that requires foo1's fd, but a write
> fuse_req doesn't carry foo1's fd.
>
> And are you suggesting that qemu reads the data at the gpa and returns
> it via a vhost-user message? Or that this virtiofsd (foo2) does an mmap
> on foo1 directly?
I have a patchset I'm just tidying up that passes this case back to qemu
to handle. I intend to post it by the end of the week.
When virtiofsd receives a read/write targeting an area of memory that it
has no mapping for, it sends a new slave message back to qemu, together
with the fd, asking qemu to perform the read/write at the given GPA.
Then it's up to QEMU to deal with it.
That should work even if there are two separate daemons.
It's not a pretty solution; but I think it should work.
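
For illustration only, and not the patchset itself: a minimal C sketch of
the fallback described above. The daemon first tries to translate the GPA
through its own region table; on a miss it fills in a request to hand back
to qemu along with the fd. Every name here (mem_region, io_fallback_req,
lookup_gpa, prepare_io) is hypothetical and is not the libvhost-user API
or the real message layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the daemon's memory region table. */
struct mem_region {
    uint64_t gpa;   /* guest physical start address */
    uint64_t size;  /* length in bytes */
    uint8_t *va;    /* where the daemon has this region mapped */
};

/* Hypothetical request the daemon would hand back to qemu on a miss. */
struct io_fallback_req {
    int fd;               /* file the FUSE request operates on */
    uint64_t gpa;         /* guest buffer the daemon could not map */
    uint64_t len;         /* number of bytes to transfer */
    uint64_t file_offset; /* offset within fd */
    int is_write;         /* nonzero for FUSE_WRITE, zero for FUSE_READ */
};

/* Return the daemon VA for gpa, or NULL if no single region
 * covers the whole range [gpa, gpa + len). */
static void *lookup_gpa(const struct mem_region *regions, size_t n,
                        uint64_t gpa, uint64_t len)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct mem_region *r = &regions[i];
        if (gpa >= r->gpa && gpa - r->gpa < r->size &&
            len <= r->size - (gpa - r->gpa)) {
            return r->va + (gpa - r->gpa);
        }
    }
    return NULL;
}

/* Return 0 if the buffer is mapped locally (*va is set), or 1 if the
 * request has to be forwarded to qemu (*req is filled in, *va is NULL). */
static int prepare_io(const struct mem_region *regions, size_t n,
                      int fd, uint64_t gpa, uint64_t len,
                      uint64_t file_offset, int is_write,
                      void **va, struct io_fallback_req *req)
{
    *va = lookup_gpa(regions, n, gpa, len);
    if (*va != NULL) {
        return 0;
    }
    req->fd = fd;
    req->gpa = gpa;
    req->len = len;
    req->file_offset = file_offset;
    req->is_write = is_write;
    return 1;
}
```

In this sketch a buffer inside the DAX cache window simply misses the
table, because that window was never passed to the daemon, so the request
takes the slow path through qemu.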
Dave
> thanks,
> -liubo
>
> > Implementing this is a little tricky because the libvhost-user code
> > probably fails before fuse_lowlevel.c is able to parse the FUSE request
> > header. It will require reworking libvhost-user and fuse_virtio.c code,
> > I think.
> >
> > Stefan
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK