[Virtio-fs] [PATCH 2/2] fuse: remove dmap when truncating inode

Tao Peng bergwolf at hyper.sh
Sat May 11 15:24:44 UTC 2019


On Sat, May 11, 2019 at 1:31 AM Vivek Goyal <vgoyal at redhat.com> wrote:
>
> On Sat, May 11, 2019 at 12:16:23AM +0800, Tao Peng wrote:
> > On Fri, May 10, 2019 at 11:23 PM Vivek Goyal <vgoyal at redhat.com> wrote:
> > >
> > > On Fri, May 10, 2019 at 06:29:57PM +0800, Peng Tao wrote:
> > > > The obseleted dmap entries can be put back to global free list
> > > > immediately and we don't have to rely on reclaim code to free them.
> > > >
> > > > Signed-off-by: Peng Tao <tao.peng at linux.alibaba.com>
> > > > ---
> > > >  fs/fuse/dir.c    |  5 ++++
> > > >  fs/fuse/file.c   | 69 +++++++++++++++++++++++++++++++++++++++---------
> > > >  fs/fuse/fuse_i.h |  2 ++
> > > >  3 files changed, 63 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > > > index 3f923fe7841a..233c3ed391f1 100644
> > > > --- a/fs/fuse/dir.c
> > > > +++ b/fs/fuse/dir.c
> > > > @@ -1751,6 +1751,11 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
> > > >               down_write(&fi->i_mmap_sem);
> > > >               truncate_pagecache(inode, outarg.attr.size);
> > > >               invalidate_inode_pages2(inode->i_mapping);
> > > > +             // Free dmap beyond i_size
> > > > +             if (IS_DAX(inode) && oldsize > outarg.attr.size)
> > > > +                     fuse_dax_free_mappings_range(fc, inode,
> > > > +                                                  outarg.attr.size, -1);
> > > > +
> > > >               up_write(&fi->i_mmap_sem);
> > > >       }
> > > >
> > > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > > > index 51faed351c7c..6d130f2f3a23 100644
> > > > --- a/fs/fuse/file.c
> > > > +++ b/fs/fuse/file.c
> > > > @@ -30,6 +30,8 @@ static long __fuse_file_fallocate(struct file *file, int mode,
> > > >                                       loff_t offset, loff_t length);
> > > >  static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc,
> > > >                                       struct inode *inode);
> > > > +static void fuse_dax_do_remove_mapping_locked(struct fuse_inode *fi,
> > > > +                                     struct fuse_dax_mapping *dmap);
> > > >
> > > >  static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
> > > >                         int opcode, struct fuse_open_out *outargp)
> > > > @@ -3752,9 +3754,7 @@ int fuse_dax_reclaim_dmap_locked(struct fuse_conn *fc, struct inode *inode,
> > > >               return ret;
> > > >       }
> > > >
> > > > -     /* Remove dax mapping from inode interval tree now */
> > > > -     fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree);
> > > > -     fi->nr_dmaps--;
> > > > +     fuse_dax_do_remove_mapping_locked(fi, dmap);
> > > >       return 0;
> > > >  }
> > > >
> > > > @@ -3831,6 +3831,58 @@ static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc,
> > > >       }
> > > >  }
> > > >
> > > > +/* Cleanup dmap entry and add back to free list */
> > > > +static void fuse_dax_do_free_mapping_locked(struct fuse_conn *fc, struct fuse_dax_mapping *dmap)
> > > > +{
> > > > +     pr_debug("fuse: freed memory range start=0x%llx end=0x%llx "
> > > > +             "window_offset=0x%llx length=0x%llx\n", dmap->start,
> > > > +             dmap->end, dmap->window_offset, dmap->length);
> > > > +     spin_lock(&fc->lock);
> > > > +     __dmap_remove_busy_list(fc, dmap);
> > > > +     dmap->inode = NULL;
> > > > +     dmap->start = dmap->end = 0;
> > > > +     __free_dax_mapping(fc, dmap);
> > > > +     spin_unlock(&fc->lock);
> > > > +}
> > > > +
> > > > +/* Remove dax mapping from inode interval tree */
> > > > +static void fuse_dax_do_remove_mapping_locked(struct fuse_inode *fi, struct fuse_dax_mapping *dmap)
> > > > +{
> > > > +     fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree);
> > > > +     fi->nr_dmaps--;
> > > > +}
> > >
> > > this is just a two line funciton. I think lets open code it and we don't
> > > need a separate function for this.
> > ok.
> >
> > >
> > >
> > > > +
> > > > +/*
> > > > + * Free inode dmap entries whose range falls entirely inside [start, end].
> > > > + * Called with inode->i_rwsem and fuse_inode->i_mmap_sem held.
> > > > + *
> > > > + * No need to send FUSE_REMOVEMAPPING to userspace because the userspace mappings
> > > > + * will be overridden next time dmap is used -- same logic is applied in
> > > > + * fuse_dax_reclaim_first_mapping().
> > >
> > > This is true that sending FUSE_REMOVEMAPPING is not strictly needed. But,
> > > thinking more about it, what if range does not get used soon. Then it
> > > will keep resources unnecessarily busy on host? (fd, file, inode, dentry,
> > > vma etc).
> > >
> > > I think it probably is a good idea to send FUSE_REMOVEMAPPING to daemon
> > > when range is being freed.
> > We are under inode->i_rwsem and i_mmap_sem here. It can be very
> > expensive if we wait for userspace holding locks. How about pushing it
> > to a work queue? Still we at least need to grab the dmap lock but it
> > seems less expensive and doesn't block truncate.
>
> How about freeing multiple ranges in single removemapping call. I think
> that's allowed. We can look into making it async also at some point of
> time, but right now it probably is best to keep code as simple as
> possible. Do typical workloads do truncate at high frequencey?
>
Good point! I'll change to do it. Thanks!

Cheers,
Tao
-- 
bergwolf at hyper.sh




More information about the Virtio-fs mailing list