[Virtio-fs] [PATCH] virtiofs: Enable SB_NOSEC flag to improve small write performance

Miklos Szeredi miklos at szeredi.hu
Wed Jul 22 10:00:55 UTC 2020


On Tue, Jul 21, 2020 at 11:30 PM Vivek Goyal <vgoyal at redhat.com> wrote:

> What about following race. Assume a client has set suid/sgid/caps
> and this client is doing write but cache metadata has not expired
> yet. That means fuse_update_attributes() will not clear S_NOSEC

If the write is done on the same client as the chmod/setxattr, than
that's not a problem since chmod and setxattr will clear S_NOSEC in
the suid/sgid/caps case.

> and that means file_remove_privs() will not clear suid/sgid/caps
> as well as WRITE will be buffered so that also will not clear
> suid/sgid/caps as well.
>
> IOW, even after WRITE has completed, suid/sgid/security.capability will
> still be there on file inode if inode metadata had not expired at the time
> of WRITE. Is that acceptable from coherency requirements point of view.


If they were on different clients, then yes, we'll have a coherency
issue, but that's not new.

>
> I have a question. Does general fuse allow this use case where multiple
> clients are distributed and not going through same VFS? virtiofs wants
> to support that at some point of time but what about existing fuse
> filesystems.

Fuse supports distributed fs through

 - dcache invalidation with timeout (can be zero to disable caching completely)
 - dcache invalidation with notification
 - metadata invalidation with timeout (can be zero to disable caching
completely)
 - metadata and data range invalidation with notification
 - disabling data caching (FOPEN_DIRECT_IO)
 - auto data invalidation on size/mtime change
 - data invalidation on open (!FOPEN_KEEP_CACHE)

So yes, supports a number of ways to handle the multiple client case.

What we could do additionally with virtiofs is to make clients running
in different guests but on the same host achieve strong coherency with
some sort of shared memory scheme.

> I also have concerns with being dependent on FUSE_HANDLE_KILLPRIV because
> it clear suid/sgid on WRITE and truncate evn if caller has
> CAP_SETID, breaking Linux behavior (Don't know what does POSIX say).

FUSE_HANDLE_KILLPRIV should handle writes properly (discovered by one
of the test suites, not real life report, as far as I remember).

Truncate is a different matter, currently there's no way to
distinguish between CAP_FSETID being set or clear.

Can't find POSIX saying anything about this.

> Shall we design FUSE_HANDLE_KILLPRIV2 instead which kills
> security.capability always but kills suid/sgid on on WRITE/truncate
> only if caller does not have CAP_FSETID. This also means that
> we probably will have to send this information in fuse_setattr()
> somehow.

Yes, that's a good plan.

Thanks,
Miklos




More information about the Virtio-fs mailing list