[Virtio-fs] [fuse-devel] 'FORGET' ordering semantics (vs unlink & NFS)

Dr. David Alan Gilbert dgilbert at redhat.com
Thu Jan 7 20:10:31 UTC 2021


* Amir Goldstein (amir73il at gmail.com) wrote:
> On Thu, Jan 7, 2021 at 10:44 AM Miklos Szeredi <miklos at szeredi.hu> wrote:
> >
> > On Wed, Jan 6, 2021 at 5:58 PM Vivek Goyal <vgoyal at redhat.com> wrote:
> > >
> > > On Wed, Jan 06, 2021 at 02:40:45PM +0100, Miklos Szeredi wrote:
> >
> > > > @@ -901,6 +906,18 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > >
> > > >      inode = lo_find(lo, &e->attr, mnt_id);
> > > >      if (inode) {
> > > > +        char buf1[PATH_MAX + 1], buf2[PATH_MAX + 1], procname[64];
> > > > +        ssize_t siz1, siz2;
> > > > +
> > > > +        sprintf(procname, "%i", inode->fd);
> > > > +        siz1 = readlinkat(lo->proc_self_fd, procname, buf1, sizeof(buf1));
> > > > +        sprintf(procname, "%i", newfd);
> > > > +        siz2 = readlinkat(lo->proc_self_fd, procname, buf2, sizeof(buf2));
> > > > +
> > > > +        /* disable close on unlink if alias is detected */
> > > > +        if (siz1 != siz2 || memcmp(buf1, buf2, siz1))
> > > > +            g_atomic_int_inc(&inode->opencount);
> > > > +
> > >
> > > Hi Miklos,
> > >
> > > So if I have a hard links to a file in a separate directories, then this
> > > path can hit. (Say dir1/file.txt and dir2/file-link.txt). IIUC, we will
> > > disable automatic inode->fd closing on these hardlinked files. That
> > > means this solution will not solve problem at hand for hard linked files.
> > > Am I missing something.
> >
> > You are correct, this is a limited solution.  The one above is a
> > corner case, and probably not worth worrying about too much.  However
> > it  can be fixed by having separate O_PATH fd's open for each alias
> > (or at least for each alias that resides in a distinct directory).
> >
> > NB: doing a synchronous FORGET on unlink does not solve this case
> > either, since the inode will not receive a FORGET until the last alias
> > is dropped.
> >
> > > >          close(newfd);
> > > >      } else {
> > > >          inode = calloc(1, sizeof(struct lo_inode));
> > > > @@ -917,6 +934,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
> > > >           */
> > > >          g_atomic_int_set(&inode->refcount, 2);
> > > >
> > > > +        g_atomic_int_set(&inode->opencount, 0);
> > > >          inode->nlookup = 1;
> > > >          inode->fd = newfd;
> > > >          inode->key.ino = e->attr.st_ino;
> > > > @@ -1295,6 +1313,10 @@ static void lo_unlink(fuse_req_t req, fuse_ino_t parent, const char *name)
> > > >      res = unlinkat(lo_fd(req, parent), name, 0);
> > > >
> > > >      fuse_reply_err(req, res == -1 ? errno : 0);
> > > > +    if (!g_atomic_int_get(&inode->opencount)) {
> > > > +        close(inode->fd);
> > > > +        inode->fd = -1;
> > > > +    }
> > >
> > > Can this be racy w.r.t lo_lookup(). IOW, say dir1/file.txt is being
> > > unlinked and we closed inode->fd. And before we could execute
> > > unref_inode_lolocked(), another parallel lookup of dir2/file-link.txt
> > > happens if it gets lo->muxtex lock first, it can still find this
> > > inode and bump up reference count. And that means lo_unlink() will
> > > not free inode (but close inode->fd) and now we will use an inode
> > > with closed O_PATH fd which lead to other failures later.
> >
> > Yes, that's a bug.   Also needs serialization against all access to inode->fd.
> >
> 
> Miklos,
> 
> I would like to point out that this discussion is relevant to any low level
> fuse filesystem, but especially those that are proxying a real filesystem.
> 
> I have raised this question before in the FUSE_PASSTHROUGH threads.
> There is a lot of code duplication among various low-level fuse projects and
> as we get to NFS export support and complex issues like this one, is it
> getting unlikely that all projects will handle this correctly.
> 
> Do you think there is room for some more generic code in libfuse and if
> so how? Following an example implementation (assuming it gets fixed)
> and hand picking fixes to projects cannot scale for long.
> 
> The challenge is that most of the generic code would be in lookup and
> maintaining the internal inode cache, but each filesystem may need
> different representations of the Inode object.
> 
> I was thinking of something along the lines of an OO library that
> implements the generic lookup/inode cache for a base FuseInode class
> that implementers can inherit from.
> 
> This doesn't need to be in the libfuse project at all.
> Seeing the virtiofsd already has a Rust implementation that is BSD
> licensed, maybe that can be used as a starting point?
> 
> David,
> 
> How hard do you think it would be to re-factor virtiofsd-rs to
> an implementation that inherits from a base passthroughfsd-rs?
> 
> BTW, is virtiofsd-rs the offical virtiofsd or an experimental one?
> Which tree does Miklos' patch apply to?

The C version is the current main one, but the Rust version is catching
up fast and I hope it will displace the C one sometime this year.
Sergio owns the Rust version, cc'd (slp@ )

> Anyone has other thoughts about how to reduce fragmentation in
> implementations?
> 
> Thanks,
> Amir.

Dave

-- 
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK




More information about the Virtio-fs mailing list