[Virtio-fs] [PATCH] Virtiofs: fix null pointer deference in directIO

Liu Bo bo.liu at linux.alibaba.com
Thu Jul 18 23:39:12 UTC 2019


On Thu, Jul 18, 2019 at 04:51:10PM -0400, Vivek Goyal wrote:
> On Thu, Jul 18, 2019 at 11:58:01AM -0700, Liu Bo wrote:
> > On Thu, Jul 18, 2019 at 02:18:39PM -0400, Vivek Goyal wrote:
> > > On Fri, Jul 12, 2019 at 06:35:42AM +0800, Liu Bo wrote:
> > > > Virtiofs has used memremap to create page structs for its managed memory
> > > > and set pagemap's type to MEMORY_DEVICE_FS_DAX.  Thus, the page from
> > > > virtiofs's managed memory is a ZONE_DEVICE page, if %devmap_managed_key
> > > > was enabled by pmem drivers, put_page() will run into
> > > > __put_devmap_managed_page().  However, as page_free ops is not defined, it
> > > > ends up with a NULL pointer dereference.
> > > > 
> > > > As virtiofs presents its managed memory range as a dax device, this patch
> > > > makes virtiofs mimic how nvdimm/pmem.c setups pagemap fsdax.
> > > > 
> > > > =================
> > > > 
> > > > [ 1223.732964] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> > > > [ 1223.733043] PGD 800000026bd91067 P4D 800000026bd91067 PUD 27631e067 PMD 0
> > > > [ 1223.733085] Oops: 0010 [#1] SMP PTI
> > > > [ 1223.733114] CPU: 1 PID: 16505 Comm: build Not tainted 4.19.48-dff9509 #1
> > > > [ 1223.733157] RIP: 0010:          (null)
> > > > [ 1223.733179] Code: Bad RIP value.
> > > > ..
> > > > [ 1223.733671] Call Trace:
> > > > [ 1223.733704]  fuse_release_user_pages+0xad/0xb0
> > > > [ 1223.733751]  fuse_direct_io+0x3ec/0x6a0
> > > > [ 1223.733823]  fuse_dax_direct_write+0x3c/0x70
> > > > [ 1223.733863]  fuse_file_write_iter+0x2ea/0x4c0
> > > > [ 1223.733978]  __vfs_write+0xde/0x160
> > > > [ 1223.734009]  vfs_write+0xab/0x190
> > > > [ 1223.734061]  ksys_write+0x45/0xc0
> > > > [ 1223.734094]  do_syscall_64+0x6b/0x330
> > > > [ 1223.734214]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > [ 1223.734261] RIP: 0033:0x15536bcddfd0
> > > > 
> > > > The reproducer could be as simply as,
> > > > 
> > > > addr = mmap(..., fileA, ...);
> > > > fdB = open(fileB, O_DIRECT);
> > > > write(fdB, addr);
> > > > 
> > > > Signed-off-by: Liu Bo <bo.liu at linux.alibaba.com>
> > > 
> > > I am looking at this now and trying to understand it better. One problem
> > > I see is with page pinning. If a page is pinned, how do we make sure it
> > > is not evicted by reclaim logic. IIUC, right now memory range relciam
> > > logic is not aware about page pinning.
> > > 
> > > For instance, in above example, write(fdB, addr) will pin pages in fileA.
> > > While write is going on we need to make sure dax memory ranges covering
> > > those pages are not reclaimed.
> > 
> > Good question, it is indeed a problem, i_mmap_sem is not held by page
> > pinning so that dax range reclaim is free to go, nor page pinning in
> > within iomap_begin() and iomap_end() pair.
> > 
> > While that is a problem, this patch is fixing a different one because
> > of the lack of ->page_free definition.
> 
> Sure, but in the definition of ->page_free, you are calling
> "wake_up_var(page->_refcount)". This will make sense only if somebody
> is waiting on this.
>

Right, looks like we also need to follow this patch[1], but looks like
the fix for the whole problem is still WIP[2].

> I am checking xfs code and it seems to be waiting for this. We probably
> will have to modify virtio-fs to fix get_user_pages()/DMA races.
> 
> So give me some time to go through and understand what xfs/dax/mm stack
> is doing for these races and try to fix it properly. 

Right, I didn't realize the race with dax mapping reclaim.

So there're two things to fix,
a) the race between pinned entries/pages and fs fallocate/truncate.
b) the race between pinned entries/pages and dax ranges background reclaim.

In terms of a), page->_refcount can be used to avoid the race between
dma and dax.

a "dmap->refcnt--" is actually what we need to add for b) at ->page_free.


[1]: "mm, fs, dax: handle layout changes to pinned dax mappings"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fac7408d828719db6d3fdba63e3c3726a6d1ee5

[2]: "mm: introduce put_user_page*(), placeholder versions"
https://patchwork.kernel.org/patch/10637929/

thanks,
-liubo




More information about the Virtio-fs mailing list