[libvirt] AMD SEV's /dev/sev permissions and probing QEMU for capabilities

Daniel P. Berrangé berrange at redhat.com
Tue Jan 29 18:40:08 UTC 2019


On Tue, Jan 29, 2019 at 05:15:42PM +0100, Erik Skultety wrote:
> On Wed, Jan 23, 2019 at 03:02:28PM +0000, Singh, Brijesh wrote:
> >
> >
> > On 1/23/19 7:36 AM, Daniel P. Berrangé wrote:
> > > On Wed, Jan 23, 2019 at 02:33:01PM +0100, Erik Skultety wrote:
> > >> On Wed, Jan 23, 2019 at 01:24:13PM +0000, Daniel P. Berrangé wrote:
> > >>> On Wed, Jan 23, 2019 at 02:22:12PM +0100, Erik Skultety wrote:
> > >>>> On Wed, Jan 23, 2019 at 01:10:42PM +0000, Daniel P. Berrangé wrote:
> > >>>>>
> > >>>>> It still looks like we can solve this entirely in libvirt by just giving
> > >>>>> the libvirt capabilities probing code CAP_DAC_OVERRIDE. This would make
> > >>>>> libvirt work for all currently released SEV support in kernel/qemu.
> > >>>>
> > >>>> Sure we can, but that would make libvirt the only legitimate user of /dev/sev
> > >>>> and everything else would require the admin to change the permissions
> > >>>> explicitly so that other apps could at least retrieve the platform info, if
> > >>>> it was intended to be for public use?
> > >>>> Additionally, we'll still get shot down by SELinux because svirt_t wouldn't be
> > >>>> able to access /dev/sev by default.
> > >>>
> > >>> That's separate form probing and just needs SELinux policy to define
> > >>> a new  sev_device_t type and grant svirt_t access to it.
> > >>
> > >> I know, I misread "we can solve this entirely in libvirt" then, I thought you
> > >> the SELinux part was included in the statement, my bad then. Still, back to the
> > >> original issue, we could technically do both, libvirt would have run qemu with
> > >> CAP_DAC_OVERRIDE and we keep working with everything's been released for
> > >> SEV in kernel/qemu and for everyone else, systemd might add 0644 for /dev/sev,
> > >> that way, everyone's happy, not that I'd be a fan of libvirt often having
> > >> to work around something because projects underneath wouldn't backport fixes to
> > >> all the distros we support, thus leaving the dirty work to us.
> > >
> > > Setting 0644 for /dev/sev looks unsafe to me unless someone can show
> > > where the permissions checks take place for the many ioctls that
> > > /dev/sev allows, such that only SEV_PDH_CERT_EXPORT or SEV_PLATFORM_STATUS
> > > is allowed when /dev/sev is opened by a user who doesn't have write
> > > permissions.
> > >
> >
> > Agree its not safe to do 0644.
> >
> > Currently, anyone who has access to /dev/sev (read or write) will be
> > able to execute SEV platform command. In other words there is no
> > permission check per command basis. I must admit that while developing
> > the driver I was under assumption that only root will ever access the
> > /dev/sev device hence overlooked it. But now knowing that others may
> > also need to access the /dev/sev, I can submit patch in kernel to do
> > per command access control.
> >
> > Until then, can we follow Daniel's recommendation to elevate privilege
> > of the probing code?
> 
> So, sorry for not coming back earlier, but I'm still fighting a permission
> issue when opening the /dev/sev device and I honestly don't know what's
> happening. If you apply the patch bellow (or attachment) and you run libvirtd
> with LIBVIRT_LOG_OUTPUTS=1:file:<your_log_file> env, you should see something
> like this in the logs:
> 
> warning : virExec:778 : INHERITABLE CAPS: 'dac_override'
> warning : virExec:781 : EFFECTIVE CAPS: 'dac_override'
> warning : virExec:784 : PERMITTED CAPS: 'dac_override'
> warning : virExec:787 : BOUNDING CAPS: 'dac_override'
> 
> ...and that is right before we issue execve. Yet, if you debug further into the
> QEMU process right after execve, it doesn't report any capabilities at all, so
> naturally it'll still get permission denied. Is there something I'm missing
> here?

Arrrrggghh.  We're missing the painful thing that I re-learn the hard way
every time we play with capabilities in libvirt.....

*Even* if you have set INHERITABLE capabilities, they are discarded
on execve(), unless the binary you are exec'ing has file capabilities
set that match those you are trying to give it. Of course QEMU binaries
(and almost all binaries) lack such file capabilities.

The capabilities man page does in fact explain this

[quote]
        P’(permitted) = (P(inheritable) & F(inheritable)) |
                        (F(permitted) & cap_bset)

        P’(effective) = F(effective) ? P’(permitted) : 0

where

        P         denotes the value of a thread capability set before the execve(2)

        P'        denotes the value of a thread capability set after the execve(2)

        F         denotes a file capability set

If ... real user ID of the process is 0 (root) then the file inheritable
and permitted sets are defined to be all ones
[/quote]

So this means your code works perfectly *if*  we are executing
QEMU as root, but if we are executing QEMU as non-root then
all our work is for nothing.

Now I have actually intentionally mis-quoted from an old man
page here. In Linux 4.3 the kernel devs invented a new set of
"ambient" capabilities [1] which allow programs to deal with this
insanely annoying behaviour.

[quote]
        P'(ambient)     = (file is privileged) ? 0 : P(ambient)

        P'(permitted)   = (P(inheritable) & F(inheritable)) |
                          (F(permitted) & cap_bset) | P'(ambient)

        P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)
[/quote]

so if we added CAP_DAC_OVERRIDE to the ambient capabilities set,
then your patch would work.

We can do that with this patch:

diff --git a/src/util/virutil.c b/src/util/virutil.c
index 5251b66454..be83e91fee 100644
--- a/src/util/virutil.c
+++ b/src/util/virutil.c
@@ -1601,6 +1601,12 @@ virSetUIDGIDWithCaps(uid_t uid, gid_t gid, gid_t *groups, int ngroups,
         goto cleanup;
     }
 
+    for (i = 0; i <= CAP_LAST_CAP; i++) {
+        if (capBits & (1ULL << i)) {
+            prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, i, 0, 0);
+        }
+    }
+
     ret = 0;
  cleanup:
     return ret;


though, we need a #ifdef check for existance of PR_CAP_AMBIENT 

> An alternative question I've been playing ever since we exchanged the last few
> emails is that can't we wait until the ioctls are compared against permissions
> in kernel so that upstream libvirt (and downstream too for that matter) doesn't
> have to work around it and stick with that workaround for eternity?

IIUC, the SEV feature has already shipped with distros, so we'd effectively
be saying that what we already shipped is unusable to libvirt. This doesn't
feel like a desirable story to me.


Regards,
Daniel

[1] https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list