[Virtio-fs] [RFC PATCH 0/7] Inotify support in FUSE and virtiofs

Ioannis Angelakopoulos iangelak at redhat.com
Tue Dec 14 23:21:43 UTC 2021


Hello Amir and Jan,

After testing some of your proposals for extending the remote
notification to fanotify as well, we came across some issues that are not
straightforward to overcome:

1) Currently fuse does not support persistent file handles.  This means
that file handles become stale if an inode is flushed out of the cache. The
file handle support is very limited at the moment in fuse. Thus, the only
option left is to implement fanotify both in server and guest with file
descriptors.

2) Since we can only use file descriptors, to support fanotify in virtiofsd
we need CAP_SYS_ADMIN enabled. For security reasons, the virtiofs
developers are not very positive about running virtiofsd with CAP_SYS_ADMIN.
Thus we attempted to support some basic fanotify functionality on the
client/guest by modifying our existing implementation with inotify/fsnotify.

3) Basically, we continue to use inotify on the virtiofsd/fuse server but
we add support on the client/guest kernel to be able to support simple
fanotify events (i.e., for now the same events as inotify). However, two
important problems arise from the use of the fanotify file descriptor mode
in a guest process:

3a) First, to be able to support fanotify in the file descriptor mode we
need to pass a "struct path" to the fsnotify call (within the guest
kernel) that corresponds to the inode that we are monitoring.
Unfortunately, when the guest receives the remote event from the server it
only has information about the target inode. Since more than one
"struct path" can map to the same "struct inode" (for example through hard
links or multiple mounts), we do not know which path information to pass
to the fsnotify call.
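
The path ambiguity in 3a) can be seen from user space as well. The sketch
below is only an illustration, not kernel code: it creates a file and a hard
link to it, and checks that both names resolve to the same inode. The
function name "two_paths_one_inode" and the use of /tmp are assumptions made
for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Creates a file "a" and a hard link "b" in a fresh temporary directory.
 * Returns 1 if both names resolve to the same inode, 0 if they do not,
 * and -1 on error.
 */
int two_paths_one_inode(void)
{
    char dir[] = "/tmp/inode-demo-XXXXXX";
    char a[128], b[128];
    struct stat sa, sb;
    int result = -1;
    FILE *f;

    if (!mkdtemp(dir))
        return -1;
    snprintf(a, sizeof(a), "%s/a", dir);
    snprintf(b, sizeof(b), "%s/b", dir);

    f = fopen(a, "w");
    if (!f)
        goto out;
    fclose(f);

    /* Second name for the same inode. */
    if (link(a, b) != 0)
        goto out;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        goto out;

    /* Same device and inode number, and a link count of two. */
    result = (sa.st_dev == sb.st_dev &&
              sa.st_ino == sb.st_ino &&
              sa.st_nlink == 2);
out:
    unlink(a);
    unlink(b);
    rmdir(dir);
    return result;
}
```

Given only the inode, nothing tells us whether the event should be reported
against "a" or "b", which is the situation the guest is in when a remote
event arrives.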

3b) Second, since the guest kernel needs to pass an open file descriptor
back to the guest user space as part of the fanotify event data, internally
the guest kernel (through fanotify's "create_fd" function) issues a
"dentry_open", which results in an additional FUSE_OPEN call to the
server and subsequently the generation of an open event on the server (if
the server monitors for open events). That event is sent back to the guest,
which issues another "dentry_open" to deliver it. This will inevitably cause
an infinite loop of FUSE_OPEN requests and generation of open events on the
server. One idea was to modify the open syscall (on the host kernel) to
allow the use of the FMODE_NONOTIFY flag from user space (currently it is
used internally in the kernel code only), to be able to suppress open
events. However, a malicious guest might be able to exploit that flag to
disrupt the event generation for a file (I am not entirely sure if this is
possible, yet).
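
The feedback loop in 3b) can be approximated locally with plain inotify. This
is a rough user-space sketch under stated assumptions, not the virtiofs code
path: the "handler" below reopens the watched file for every IN_OPEN event it
reads, standing in for the "dentry_open" that fanotify does to build the
event fd. The function name "open_event_feedback" and the round cap are made
up for the example; without the cap the loop would not terminate.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Watches a temporary file for IN_OPEN and reopens the file each time an
 * open event is read. Stops after max_rounds events.
 * Returns the number of IN_OPEN events seen, or -1 on setup failure.
 */
int open_event_feedback(int max_rounds)
{
    char path[] = "/tmp/open-loop-XXXXXX";
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int seen = 0;
    int ino, fd;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    ino = inotify_init1(IN_NONBLOCK);
    if (ino < 0 || inotify_add_watch(ino, path, IN_OPEN) < 0) {
        if (ino >= 0)
            close(ino);
        unlink(path);
        return -1;
    }

    /* The initial open, as done by some unrelated application. */
    fd = open(path, O_RDONLY);
    if (fd >= 0)
        close(fd);

    while (seen < max_rounds) {
        ssize_t n = read(ino, buf, sizeof(buf));
        if (n <= 0)
            break; /* no pending events */

        for (char *p = buf; p < buf + n; ) {
            struct inotify_event *ev = (struct inotify_event *)p;

            if (ev->mask & IN_OPEN) {
                seen++;
                /* Delivering the event opens the file again, which
                 * queues a new IN_OPEN event for the next round. */
                fd = open(path, O_RDONLY);
                if (fd >= 0)
                    close(fd);
            }
            p += sizeof(*ev) + ev->len;
        }
    }

    close(ino);
    unlink(path);
    return seen;
}
```

Calling open_event_feedback(5) should observe one event per round until the
cap is reached, since every delivery produces the next event. Inside the
kernel, fanotify avoids this for local files by opening the event fd with
FMODE_NONOTIFY, which is exactly the flag that cannot be passed along to the
server in our case.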

To sum up, it seems that the support for fanotify causes some problems that
are very difficult to mitigate at the moment. The fanotify file handle
mode would probably solve most if not all of the above problems we are
facing. However, as Vivek pointed out, the file handle support in
virtiofs/fuse is another project altogether.

So we would like to ask you for any suggestions related to the
aforementioned problems. If there are no "easy" solutions in sight for
these fanotify issues, we would like to at least continue to support the
remote inotify in the next version of the patches and try to solve issues
around it.

Thanks,
Ioannis

On Tue, Nov 30, 2021 at 10:27 AM Vivek Goyal <vgoyal at redhat.com> wrote:

> On Wed, Nov 17, 2021 at 08:40:57AM +0200, Amir Goldstein wrote:
> > On Wed, Nov 17, 2021 at 12:12 AM Ioannis Angelakopoulos
> > <iangelak at redhat.com> wrote:
> > >
> > >
> > >
> > > On Tue, Nov 16, 2021 at 12:10 AM Stef Bon <stefbon at gmail.com> wrote:
> > >>
> > >> Hi Ioannis,
> > >>
> > >> I see that you have been working on making fsnotify work on virtiofs.
> > >> Earlier you contacted me since I've written this:
> > >>
> > >> https://github.com/libfuse/libfuse/wiki/Fsnotify-and-FUSE
> > >>
> > >> and sent you my patches on 23 June.
> > >> I want to mention first that I would have appreciated it if you would
> > >> have reacted to me after I've sent you my patches. I did not get any
> > >> reaction from you. Maybe these patches (which differ from what you
> > >> propose now, but there is also a lot in common) have been an
> > >> inspiration for you.
> > >>
> > >> Second, what I've written about is that with network filesystems (eg a
> > >> backend shared with other systems) fsnotify support in FUSE has some
> > >> drawbacks.
> > >> In a network environment, where a network fs is part of making people
> > >> collaborate, it's very useful to have information on who did what on
> > >> which host, and also when. Simply a message "a file has been created
> > >> in the folder you watch" is not enough. For example, if you are part
> > >> of a team, and assigned to your team is a directory on a server where
> > >> you can work on some shared documents. Now in this example there is a
> > >> planning, and some documents have to be written. In that case you want
> > >> to be informed that someone in your team has started a document (by
> > >> creating it) by the system.
> > >>
> > > I agree that the out of band approach you propose is actually more
> > > powerful, since it can provide the client with more information than
> > > remote fsnotify. However, in the virtiofs setup your approach might
> > > not be as efficient.
> > >
> > > Specifically, the information on who did what might not make sense to
> > > the guest within QEMU, since all the virtiofs filesystem operations are
> > > handled by virtiofsd on the host and the guest does not know about the
> > > server or any other guests. Vivek, correct me here if I am wrong.
> > >
> > > Thus, for now at least, it might be sufficient for the guest to know
> > > just that a remote event occurred.
> > >
> > >> This "extended" information will never get through fsnotify.
> > >>
> > >> Other info useful to you as team member:
> > >>
> > >> -  you have become member of another team:
> > >>    sbon at anotherteam.example.org
> > >> -  diskspace and/or quota shortage reported by networksystem
> > >> -  new teammember, teammember left
> > >> -  your "rights" or role in the network/team have been changed (for
> > >> example from reader to reader and writer to some documents)
> > >>
> > >> What I want to say is that in a network where lots of people work
> > >> together in teams/projects, (and I want Linux to play a role there, as
> > >> desktop/workstation) communication is very important, and all these
> > >> messages should be supported by the system. My idea is the support of
> > >> watching fs events with FUSE filesystems should go through userspace,
> > >> and not via the kernel (cause fs events are part of your setup in the
> > >> network, together with all other tools to make people collaborate like
> > >> chat/call/text, and because mentioned above extended info about the
> > >> who on what host etc is not supported by fsnotify).
> > >> There should be a fs event watcher which takes care of all watches on
> > >> behalf of applications during a session, similar to gamin and FAM once
> > >> did (not used anymore?).
> > >> When receiving a request from one of the applications this fsevent
> > >> watcher will use inotify and/or fanotify for local fs's only. With a
> > >> FUSE fs, it should contact (via a socket) this fs that a watch has
> > >> been set on an inode with a certain mask.
> > >> If the FUSE fs does not support this, fall back on normal
> > >> inotify/fanotify.
> > >> This way extended info is possible.
> > >>
> > >> Is this extended information also useful for virtiofs?
> > >>
> > > Also based on your explanation, your out of band approach is specific
> > > to FUSE filesystems. Granted, with your approach there is less
> > > complexity in the kernel and more flexibility since the event
> > > notification occurs solely in user-space.
> > > However, during the discussion with Amir and Jan about potential
> > > routes we could take to support the remote fanotify/inotify/fsnotify
> > > one important concern was that the API should be able to support other
> > > network/remote filesystems if needed and not only FUSE filesystems.
> > > It seems that your approach would require a lot of work (correct me if
> > > I am wrong) to be adopted by other network filesystems.
> > >
> > > Finally, user-space applications should also be aware of your new API,
> > > which will probably result in a non-negligible effort by app developers
> > > to adopt it or change their existing apps. The remote inotify/fsnotify
> > > (the current implementation), even though it has many limitations,
> > > relies on the existing API and should require fewer modifications in
> > > user space apps. That is why we chose the remote inotify/fsnotify route.
> > >
> >
> > The way you depict the options seems like either applications are not
> > aware of the UAPI changes or they need to be modified to adapt to the
> > changes.
> >
> > I actually think that the much better approach would be to deal with
> > most of the UAPI complexity in a library, so applications may need to
> > be rebuilt or adapted to use a new library, but going forward, the
> > library would abstract most of the complexity from the applications.
> >
> > The holy grail would be a portable library, such as this Go library [1].
> > It is quite hard to design an API that would abstract all the different
> > capabilities on Linux-inotify/Linux-fanotify/MacOS-fsevents/Win-USNJournal
> > and more.
> >
> > The second best would be a library to abstract the ever growing
> > complexity of Linux inotify/fanotify UAPI from applications.
> > I had already made the first step with adapting libinotifytools to
> > fanotify [2].
> > We could continue down that path or start creating/improving a
> > different library.
> >
> > The point is that abstracting different capabilities of remote fs
> > notifications (e.g. cifs, virtiofs, generic FUSE) is going to be
> > challenging, so starting the design from a user library API and deriving
> > the needed pieces from the kernel UAPI is the right way to go IMO.
> >
> > The library approach will have the advantage that some remote fs
> > capabilities (e.g. for cifs) will be available for old kernels as well
> > and in theory, the library could also use some standardized OOB
> > notifications channel to get changes made on the host from VM guest
> > tools as Stef proposed.
> >
> > > That said, I do not see a reason why both implementations cannot
> > > co-exist and have the user-space applications choose which approach
> > > they want.
> > >
> >
> > True, an OOB channel and kernel generic remote fs support can both exist,
> > but it would be best if the application was not aware of either.
> > The library would pick the facility based on the requested functionality,
> > availability of the facilities in the filesystem and sysadmin/user
> > policy.
>
> Sure. A library will be nice to abstract kernel generic remote fs and OOB
> channel. This sounds like another project for somebody to work on.
>
> IIUC, it looks like both approaches can make progress in parallel. The
> advantage of the OOB channel approach seems to be that it will be easier
> to make changes in user space, add more events and send extra information.
>
> The advantage of remote fsnotify is that it is a known, existing events
> API, which makes it easier for applications to use the same API for remote
> fs (however limited it might be).
>
> Thanks
> Vivek
>
> > Thanks,
> > Amir.
> >
> > [1] https://github.com/fsnotify/fsnotify
> > [2] https://github.com/inotify-tools/inotify-tools/pull/134
> >
>
>

-- 
Ioannis Angelakopoulos
Software Engineer Intern at Red Hat