[Virtio-fs] Current file handle status and open questions

Wed Mar 24 16:54:33 UTC 2021

* Max Reitz (mreitz at redhat.com) wrote:
> Hi,
> 
> As threatened in our last meeting, I’ve written this mail to give an
> overview on where we stand with regards to virtiofsd(-rs) using file
> handles.
> 
> Technically, this should be a reply to the “Securint file handles”
> thread, but this mail is so long I think it’s better to split it off.
> 
> There are multiple problems that somehow relate to file handles, the
> ones I’m aware of are:
> 
> (A) We have a problem with too many FDs open.  To solve it, we could
>     attach a file handle to each node, then close the FD (as far as
>     possible) and reopen it when needed from the file handle.
> 
> (B) We want to allow the guest to use persistent file handles.
> 
> (C) For live migration, the problem isn’t even clear yet, but it seems
>     like we’ll want to translate nodes into their file handles and
>     transmit those and open them again on the destination (at least on
>     shared filesystems).

> Now every case has its own specific tricky bits:
> 
> Case (A) is something that we’d really like to have by default, and it
> would need to work all the time during runtime.  So the problem here is
> that we’d need CAP_DAC_READ_SEARCH, and we’d need it all the time, but
> we don’t want that.  One interesting bit is that we don’t need these
> file handles to be persistent between virtiofsd invocations.
> 
> For (B) we have the problem of needing to protect against a potentially
> malicious guest, i.e. that it must not be able to reference files
> outside the shared directory.  (Perhaps except for cases where the file
> was once reachable, i.e. where a file handle was generated by the guest,
> then the file was moved outside of the shared directory, but remains
> accessible through the file handle.)  Furthermore, file handles should
> really be persistent between virtiofsd invocations.  On the positive
> side, it would be easier for us to require CAP_DAC_READ_SEARCH for this
> case, because it really is optional.  We could require users to give us
> that capability if they want file handles in the guest (and we find no
> way to avoid requiring that capability).
> 
> (C) needs persistency between source and destination, but on the
> positive side, we only need to be able to open file handles during the
> in-migrate phase on the destination.  So requiring CAP_DAC_READ_SEARCH
> only during that phase might not be that big of a deal.

(B) & (C) interact in some fun ways; if we ignore that then we get:

  a) handle has to be readnble by same host kernel; so a MAC with a
local kernel known key works
  b) Same as (a)?
  c) Now we need a way to transfer the handle to a destination, where
the destination 1) Trusts that it was generated by another trustworthy
host  2) That it hasn't been tampered with

But if you allow the guest persistent file handles (b), and then migrate
it (c), that guest visibile file handle has to work on the destination;
even worse it has to work on some future destination even if
intermediate destinations haven't ever used that handle.

That makes it quite tricky, because one way to have solved (c) alone
would have been to generate some form of 'transport key' for the
migration, where the destination and source somehow exchange keys (at
the managment layer) and the handles are protected during migration; but
b+c is much harder because you can't use a short term key like that.

Dave

> 
> (Ideally, we’d want all cases to work without CAP_DAC_READ_SEARCH, but
> as you can see, that requirement is weakened for cases (B) and
> especially (C).)
> 
> 
> As far as I’ve understood, (A) is the case that we want to focus on
> first, and the main problem there is that we need to open file handles
> without CAP_DAC_READ_SEARCH.
> 
> To do that, I at one point proposed a service process that has
> CAP_DAC_READ_SEARCH and would open file handles for virtiofsd.  But that
> probably won’t really be an improvement, because this process too would
> probably need to run in the container and so if we can’t give virtiofsd
> that capability, we can’t give it to that service process either.
> 
> What Miklos proposed was to modify the kernel to allow processes to open
> file handles even if they don’t have CAP_DAC_READ_SEARCH as long as
> those files are in the process’s scope.  One way to implement this
> restriction (in a very restrictive manner) is to only allow opening file
> handles that the process has generated before, e.g. by appending a MAC
> to every file handle (generated with a process-specific key) and
> checking that when opening a handle.  (You would request this MAC with a
> new AT_* flag passed to name_to_handle_at().  open_by_handle_at()
> recognizes it due to a special file handle type value.)
> 
> (Process-specific key = stored in current->files, i.e. the files_struct.
> I’m not 100% sure what this is, but I guess this is the structure that
> keeps a process’s open file descriptors, and so should generally be
> unique to a process, or at least unique to a group of processes that
> share the same FDs.)
> 
> 
> So far so good.
> 
> What I’m now worried about is whether we can make use of that kernel
> feature for (B) and (C), too.  I don’t really want to design completely
> independent solutions for those problems.
> 
> I don’t think it’s important that we come up with exactly plotted
> implementations for (B) and (C) now, but a rough sketch on how we might
> approach them would be nice.
> 
> What we need for (B) is persistency between virtiofsd invocations.  For
> (C), we can decide whether we just don’t want to care (i.e., we simply
> require CAP_DAC_READ_SEARCH during the in-migrate phase on the
> destination), or whether we need some way to transfer the MAC key from
> the source to the destination.  (But if we want (B) with live migration,
> we will need to transfer the key to the destination.)
> 
> Let’s talk about (B).
> 
> I can’t imagine a design where the kernel could create persistent keys
> without having a user space process store them.  Well, except something
> where you measure a program image to derive its specific MAC key from
> some persistent platform key, but I don’t think that’s feasible (for
> multiple reasons), and it’d also make it impossible to migrate a VM to a
> different host and keep guest file handles valid.
> 
> So I think we need some way to let a user space process upload a key for
> some other key to use.  Vivek has proposed using keyctl(2) for that.
> I’m not exactly sure how we’d approach that in practice, though...  I
> suppose we’d add the key to some global keyring (write-only), and then
> somehow we’d need to let virtiofsd inform the kernel which key to use
> for its MACs.  (Perhaps we can do the last bit with keyctl_search(), by
> copying a key from the upload keyring to a process-specific keyring,
> perhaps even a keyring that exists only to hold the process’s MAC key?)
> 
> A problem here is that when a process (a) uploads a key for some other
> process (b) to use, you can effectively give (b) CAP_DAC_READ_SEARCH;
> because if (a) then shared the key with (b), (b) could forge all of its
> MACs and thus bypass the MAC protection (so it effectively gains
> CAP_DAC_READ_SEARCH).  To be allowed this, (a) would need a level of
> privilege that’s equivalent to being able to grant arbitrary processes
> CAP_DAC_READ_SEARCH.  Miklos suggested that we might as well fall back
> to requiring to CAP_SYS_ADMIN, and I think that’s reasonable.
> 
> The next question is, how would we require this capability for a key
> upload?  I don’t think keyctl currently has a solution for that.  I
> suppose we would need to have a special global keyring for such MACs and
> only allow privileged processes to upload keys there, so once a process
> asks for a specific key to be used, we can fetch it from that keyring
> and be sure that key has been uploaded by a privileged application.
> 
> Then, there’d be the question of how an application would select its key
> from the upload keyring.  I think we would need to prevent it from
> selecting a key that’s intended for some other application, because then
> it could use that other application’s file handles...  Which I don’t
> think is a problem if that other application willingly shares that
> ability (because in such a case, that other application might as well
> just do I/O on the first application’s behalf), but other applications’
> keys must not be reachable by guessing.  So perhaps a key description of
> high entropy would do (that’s given to keyctl_search()).
> 
> I think Dave also suggested using a certificate-based system, but I’m
> afraid I don’t fully understand it.  As far as I had guessed, that would
> mean the privileged process only configures what CAs are to be trusted,
> and then every process could upload its own MAC key along with a
> certificate from a trusted CA.  That would get around the whole keyring
> stuff, but it means you’d need to be able to provide trusted CAs and
> we’d need to check those certificates in the kernel.  I have no idea how
> we’d do that, I don’t think keyctl could help us there.
> 
> Furthermore, the obvious problem with my interpretation of the
> certificate proposal is that the uploading application would see its own
> key, which would be bad.  I suspect it would need to be encrypted
> somehow, but if we do that, we might as well skip the certificate part
> and just let the privileged process upload one or more global encryption
> keys, right?  (Applications could then upload encrypted keys with nonces
> and a hash, I guess, so the kernel can verify that the key is valid.)
> 
> 
> So, overall it seems like it may be workable to extend the in-kernel MAC
> verification to allow for persistent keys, but I still have some open
> questions, and I don’t know whether I should just defer them until we
> get to the point where we need persistency.
> 
> (Of note: If we implement something where a user space process (or
> multiple in conjunction) can arbitrarily choose a MAC key, then this
> will also be usable for live migration, because you can continue to use
> the key from the source on the destination.)
> 
> 
> Final minor question that doesn’t really fit in fully elsewhere: When
> generating a MAC over a file handle, should the mount ID be included?
> I’m worried that this might definitely break persistency, but I think we
> should include some FS ID.  Maybe the kernel should query the FS UUID
> for this MAC calculation, and use that instead of the mount ID?
> 
> 
> Thanks for reading and all your help,
> 
> Max
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs at redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
-- 
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK