[Virtio-fs] [PATCH 0/2] virtiofsd: drop Linux capabilities(7)
Dr. David Alan Gilbert
dgilbert at redhat.com
Fri Jun 19 16:16:48 UTC 2020
* Vivek Goyal (vgoyal at redhat.com) wrote:
> On Fri, Jun 19, 2020 at 09:27:46AM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal at redhat.com) wrote:
> > > On Thu, Jun 18, 2020 at 08:16:55PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Vivek Goyal (vgoyal at redhat.com) wrote:
> > > > > On Thu, Apr 16, 2020 at 05:49:05PM +0100, Stefan Hajnoczi wrote:
> > > > > > virtiofsd doesn't need all of the Linux capabilities(7) available to root. Keep a
> > > > > > whitelisted set of capabilities that we require. This improves security in
> > > > > > case virtiofsd is compromised by making it hard for an attacker to gain further
> > > > > > access to the system.
> > > > >
> > > > > Hi Stefan,
> > > > >
> > > > > I just noticed that this patch set breaks overlayfs on top of virtiofs.
> > > > >
> > > > > overlayfs sets "trusted.overlay.*" and xattrs in trusted domain
> > > > > need CAP_SYS_ADMIN.
> > > > >
> > > > > man xattr says.
> > > > >
> > > > > Trusted extended attributes
> > > > > Trusted extended attributes are visible and accessible only to
> > > > > processes that have the CAP_SYS_ADMIN capability. Attributes in this
> > > > > class are used to implement mechanisms in user space (i.e., outside the
> > > > > kernel) which keep information in extended attributes to which ordinary
> > > > > processes should not have access.
> > > > >
> > > > > There is a chance that overlay moves away from trusted xattr in future.
> > > > > But for now we need to make it work. This is an important use case for
> > > > > kata docker in docker build.
> > > > >
> > > > > Maybe we can add an option to virtiofsd, say "--add-cap <capability>", and
> > > > > ask the user to pass in "--add-cap cap_sys_admin" if they need to run the
> > > > > daemon with this capability.
> > > >
> > > > I'll admit I don't like the idea of giving it cap_sys_admin.
> > > > Can you explain:
> > > > a) What does overlayfs use trusted xattrs for?
> > >
> > > overlayfs stores a bunch of metadata and uses "trusted" xattrs for it.
> >
> > Tell me more about this metadata.
> > Taking a juicy looking one, what does OVL_XATTR_REDIRECT do?
>
> It contains path information which is used for lookup into lower layer.
>
> > Or what happens if I was to write random numbers into OVL_XATTR_NLINK?
>
> Overlay is storing its metadata in trusted.* xattrs. If a user modifies
> the metadata, then various kinds of bad things can happen. I think one
> can do some kind of checks on the metadata to make sure it does not
> crash, at least.
>
> And that's true for any filesystem, isn't it? If a user manages to modify
> metadata outside of the filesystem, then a lot of bad things can happen. I
> thought that's the reason people are not comfortable with the idea of
> allowing a filesystem to be mounted from inside a user namespace: it
> makes it easy to mount a hand-crafted filesystem.
>
> Anyway, I think overlayfs is just one use case of trusted xattrs. Even
> if overlayfs moves away from trusted xattrs, what about other users?
> We need to have a story around how we will support trusted xattrs
> safely.
>
>
> >
> > > > b) If something nasty was to write junk into the trusted attributes,
> > > > what would happen?
> > >
> > > This directory is owned by the guest. So it should be able to write
> > > anything it wants, as long as the process in the guest has CAP_SYS_ADMIN, right?
> >
> > Well, we shouldn't be able to break/crash/escape into the host; how
> > much does overlayfs validate trusted.* it uses?
>
> I thought qemu and kvm are the ones that should ensure we are not
> able to break out of the sandbox. The kernel implementation could be as
> buggy as it wanted to be. We are working with a security model
> in which the kernel is completely untrusted.
But with virtiofs we allow the guest to do a lot of filesystem
operations on the host. It's the virtiofsd that has to ensure that
these are safe and contained within the fs it's exposed; the qemu/kvm
can't protect us from that.
That's why we sandbox the virtiofsd like we do - if we allow a
privileged guest to perform calls to an unconstrained virtiofsd it would
be able to escape. That's what I want to check.
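
To make the trade-off concrete, here is a minimal sketch of a whitelisted capability set with an explicit opt-in, along the lines of the "--add-cap" idea quoted above. Everything in it is illustrative: the EX_CAP_* names, the helper functions and the choice of whitelist are assumptions made for this example and are not taken from the virtiofsd patch. Only the numeric values are real; they mirror <linux/capability.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Capability numbers; values mirror <linux/capability.h>.
 * The EX_ prefix marks these as names invented for this sketch. */
enum {
    EX_CAP_CHOWN        = 0,
    EX_CAP_DAC_OVERRIDE = 1,
    EX_CAP_FOWNER       = 3,
    EX_CAP_FSETID       = 4,
    EX_CAP_SETGID       = 6,
    EX_CAP_SETUID       = 7,
    EX_CAP_SYS_ADMIN    = 21
};

/* Build a bitmask with one bit set per capability number in caps[]. */
uint64_t ex_cap_mask_build(const int *caps, size_t n)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < n; i++) {
        assert(caps[i] >= 0 && caps[i] < 64);
        mask |= UINT64_C(1) << caps[i];
    }
    return mask;
}

/* Return 1 if capability number cap is present in mask, else 0. */
int ex_cap_mask_has(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1);
}

/* An illustrative default whitelist (an assumption, not the patch's
 * actual list). CAP_SYS_ADMIN is deliberately left out. */
uint64_t ex_default_whitelist(void)
{
    static const int keep[] = {
        EX_CAP_CHOWN, EX_CAP_DAC_OVERRIDE, EX_CAP_FOWNER,
        EX_CAP_FSETID, EX_CAP_SETGID, EX_CAP_SETUID
    };

    return ex_cap_mask_build(keep, sizeof(keep) / sizeof(keep[0]));
}
```

With this shape, an opt-in for trusted.* xattrs would OR one extra bit into the mask before the daemon drops everything else, so the extra privilege shows up at a single place in the configuration.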
Dave
> >
> > > > c) I see overlayfs has a fallback check if xattr isn't supported at
> > > > all - what is the consequence?
> > >
> > > It falls back to, I think, read-only mode.
> >
> > It looks like the fallback is more subtle to me:
> > /*
> >  * Check if upper/work fs supports trusted.overlay.* xattr
> >  */
> > err = ovl_do_setxattr(ofs->workdir, OVL_XATTR_OPAQUE, "0", 1, 0);
> > if (err) {
> >     ofs->noxattr = true;
> >     ofs->config.index = false;
> >     ofs->config.metacopy = false;
> >     pr_warn("upper fs does not support xattr, falling back to index=off and metacopy=off.\n");
> >
> > but I don't know what index and metacopy are.
>
> They enable certain features in overlayfs. In fact, we fall back to
> lesser capability only if we are running on ext4/xfs. For virtiofs,
> we deny the mount completely.
>
> /*
>  * We allowed sub-optimal upper fs configuration and don't want to break
>  * users over kernel upgrade, but we never allowed remote upper fs, so
>  * we can enforce strict requirements for remote upper fs.
>  */
> if (ovl_dentry_remote(ofs->workdir) &&
>     (!d_type || !rename_whiteout || ofs->noxattr)) {
>     pr_err("upper fs missing required features.\n");
>     err = -EINVAL;
>     goto out;
> }
>
> >
> > > For a moment, forget about overlayfs. Say a user process in the guest with
> > > CAP_SYS_ADMIN is writing trusted.foo. Should that succeed? It is a
> > > passthrough filesystem, so it should go through. But currently it
> > > won't.
> >
> > As long as any effects of what it writes are contained to the area of
> > the filesystem exposed to the guest, yes - however it worries me what
> > the consequences of broken trusted metadata are. If it's delicate enough
> > that it's guarded by CAP_SYS_ADMIN someone must have worried about it.
>
> Agreed that we need to look into whether having CAP_SYS_ADMIN allows
> virtiofsd to break out of jail.
>
> Maybe we need to provide that trusted xattr remapping feature so
> that we don't need CAP_SYS_ADMIN in init_user_ns and can
> provide this emulation even when running in a user namespace.
>
> Vivek
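
As a rough illustration of the remapping idea, a minimal sketch of how a daemon could rewrite a guest-visible "trusted.*" name into a host-side name in the user namespace, which does not need CAP_SYS_ADMIN to set. The "user.virtiofs.trusted." prefix and the function name are hypothetical, chosen only for this example; nothing here describes an existing virtiofsd interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefixes for this sketch only. */
#define EX_GUEST_PREFIX "trusted."
#define EX_HOST_PREFIX  "user.virtiofs.trusted."

/*
 * Map a guest xattr name to the name used on the host.
 * "trusted.X" becomes "user.virtiofs.trusted.X"; any other name is
 * copied unchanged. Returns 0 on success, -1 if out is too small.
 */
int ex_remap_xattr_name(const char *name, char *out, size_t outlen)
{
    const size_t plen = strlen(EX_GUEST_PREFIX);
    int n;

    assert(name != NULL && out != NULL);

    if (strncmp(name, EX_GUEST_PREFIX, plen) == 0) {
        n = snprintf(out, outlen, "%s%s", EX_HOST_PREFIX, name + plen);
    } else {
        n = snprintf(out, outlen, "%s", name);
    }

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A real implementation would also need the reverse mapping for listxattr and would have to stop the guest from reaching the remapped names directly through the user.* namespace; both are left out here.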
--
Dr. David Alan Gilbert / dgilbert at redhat.com / Manchester, UK