[Virtio-fs] [RFC PATCH 0/7] Inotify support in FUSE and virtiofs

Amir Goldstein amir73il at gmail.com
Wed Dec 15 19:54:38 UTC 2021


On Wed, Dec 15, 2021 at 9:20 PM Vivek Goyal <vgoyal at redhat.com> wrote:
>
> On Wed, Dec 15, 2021 at 07:29:29PM +0200, Amir Goldstein wrote:
> > > >
> > > > The mistake in your premise at 1) is to state that "fuse does not
> > > > support persistent file handles"
> > > > without looking into what that statement means.
> > > > What it really means is that user cannot always open_by_handle_at()
> > > > from a previously
> > > > obtained file handle, which has obvious impact on exporting fuse to NFS (*).
> > >
> > > Hi Amir,
> > >
> > > What good is file handle if one can't use it for open_by_handle_at(). I
> > > mean, are there other use cases?
> >
> > commit 44d705b0:
> > "...There are several ways that an application can use this information:
> >
> >     1. When watching a single directory, the name is always relative to
> >     the watched directory, so application need to fstatat(2) the name
> >     relative to the watched directory.
> >
> >     2. When watching a set of directories, the application could keep a map
> >     of dirfd for all watched directories and hash the map by fid obtained
> >     with name_to_handle_at(2).  When getting a name event, the fid in the
> >     event info could be used to lookup the base dirfd in the map and then
> >     call fstatat(2) with that dirfd.
>
> Ok, so case 1 and 2 still might be doable.
>
> >
> >     3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
> >     directories, the application could use open_by_handle_at(2) with the fid
> >     in event info to obtain dirfd for the directory where event happened and
> >     call fstatat(2) with this dirfd.
> >
> >     The last option scales better for a large number of watched directories.
> >     The first two options may be available in the future also for non
> >     privileged fanotify watchers, because open_by_handle_at(2) requires
> >     the CAP_DAC_READ_SEARCH capability.
> > "
>
> This is one is not possible as it needs open_by_handle_at().
>
> >
> > fsnotifywait [1] has an example of use case #2.
> > Essentially, when watching inodes, the fanotify file identifier is not very much
> > different from the inotify "watch descriptor" - it identifies the watched object
> > and the watched object is pinned to cache as long as the inode mark is set
> > so file handle would not change also in fuse.
>
> Ok, so if we are maintaining a hash map keyed by file handle, then first
> we need to pin down the inode and then call name_to_handle_at() for the
> watched object and add to hash table. Something like this.
>
> A. foo_fd = open(foo.txt)
> B. name_to_handle_at(.., foo.txt,...)
> C. Add info in hash table using foo_handle as key.
> D. Add watch on foo.txt (fanotify_mark()).
> E. close(foo_fd).
>
> One could probably skip step A and E. And do this instead.
>
> A. Add watch on foo.txt (fanotify_mark())
> B. name_to_handle_at(.., foo.txt,...)
> C. Add info in hash table using foo_handle as key.
>
> But this is little bit racy. You might start getting events with file
> handles of foo.txt before you could complete B or C.
>

I suppose you can also use O_PATH fd to name_to_handle_at()
with AT_EMPTY_PATH if you are concerned with races, but the
races of name_to_handle_at() vs read events are pretty easy to handle
in userspace.

> >
> > [1] https://github.com/inotify-tools/inotify-tools/pull/134
> >
> > >
> > > IIUC, file handle for the same object can change if inode had been flushed
> > > out of guest cache and brought back in later. So if application say
> > > generated file handle for an object and saved it and later put a watch
> > > on that object, by that time file handle of the object might have changed
> > > (as seen by fuse). So one can't even use to match it with previous saved
> > > file handle.
> > >
> >
> > The argument is not applicable for inode watches.
>
> Fair enough. I could see a very limited use case and thought that's not
> enough. But looks like you seem to be ok with that.
>
> > Filesystem and mount watches are not going to be supported with virtiofs
> > or any filesystem that does not support persistent file handles.
>
> Ok, so no filesystem and mount watches for virtiofs to begin with.
>

No, but I do expect the remote fsnotify vfs API design to take those
into account as future extensions.

> >
> > > So I can't use file handle for open_by_handle_at(). I can't use it to
> > > match it with previously saved file handle. So what can I use it for?
> > >
> > > IOW, I could not imagine supporting fanotify file handles without
> > > fixing the file handles properly in fuse. And it needs fixing in
> > > virtiofs as well as we can't trust random file handles from guest
> > > for regular files.
> > >
> >
> > Partly correct statements, but when looking at the details, they are
> > not relevant to the case of fanotify inode watch.
> >
> > Note that at the moment, fuse does not even support local fanotify
> > watch with file handles because of fanotify_test_fsid() - fuse does
> > not set f_fsid (not s_uuid), so it's not really about supporting fanotify
> > on fuse now.
>
> Hmm..., that means we first will have to look into supporting local
> fanotify events with file handles on fuse. Without that we can't even
> test our remote fsnotify changes looks like.
>
> This sounds like another blocker (or dependency project to complete first)
> before one can make progress with remote inotify/fanotify/fsnotify.
>

I am not saying you need to do any of those things, but you need to
take into account that someone else will want to implement them
in virtiofs, other fuse server or other remote fs.

All I am asking is that the vfs API and to some extent also the FUSE
fsnotify protocol extension will not be limited to inotify terminology.

> > It's about the vfs APIs for remote fsnotify that should not be inotify
> > specific.
>
> I understand that part. But at the same time, remote fsnotify API will
> probably evolve as you keep on adding more functionality. What if there
> is another notification mechanism tomorrow say newfancynotify(), we
> might have to modify remote fsnoitfy again to accomodate that.
>
> IOW, fsnotify seems to be just underlying plumbing and whatever you
> add today might not be enough to support tomorrow's features. That's
> why I wanted to start with a minimal set of functionality and add
> more to it later.
>

I do want to start with minimal functionality.
I did not request that you implement more functionality than what inotify
provides.

TBH, I can't even remember the specific details that made me say
"this is remote inotify not remote fsnotify", but there were such details.
I remember inotify rename cookie being one of them.

I guess this discussion is not very productive at this point as none of us
are saying anything very specific about what should and should not
be done, so let me try to suggest something -

Try to see if you could replace the server side implementation with
fanotify even if you use CAP_SYS_ADMIN for the experiment.
fanotify should be almost a drop-in replacement for inotify at this point
If you think that you cannot make this experiment with your current
protocol and vfs extensions then you must have done something wrong
and tied the protocol and/or the vfs API to inotify terminology.

Thanks,
Amir.




More information about the Virtio-fs mailing list