[PATCH RFC 00/10] qemu: Enable SCHED_CORE for domains and helper processes

Mon May 23 16:13:18 UTC 2022

On Mon, May 09, 2022 at 05:02:07PM +0200, Michal Privoznik wrote:
> The Linux kernel offers a way to mitigate side channel attacks on Hyper
> Threads (e.g. MDS and L1TF). Long story short, userspace can define
> groups of processes (aka trusted groups) and only processes within one
> group can run on sibling Hyper Threads. The group membership is
> automatically preserved on fork() and exec().
> 
> Now, there is one scenario which I don't cover in my series and I'd like
> to hear proposals: if there are two guests with an odd number of vCPUs they
> can no longer run on sibling Hyper Threads because my patches create
> separate group for each QEMU. This is a performance penalty. Ideally, we
> would have a knob inside domain XML that would place two or more domains
> into the same trusted group. But since there's no pre-existing example (of
> sharing a piece of information between two domains) I've failed to come
> up with something usable.
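
For readers unfamiliar with the kernel interface behind the "trusted groups"
described above: it is exposed via prctl(2) with PR_SCHED_CORE (Linux 5.14
and later). The following is only a rough sketch of creating a new group for
the calling process, using ctypes from Python for brevity. It is not taken
from the patch series; the helper name create_core_sched_group is
hypothetical, and the constants are copied from <linux/prctl.h>.

```python
import ctypes
import os

# Constants as defined in <linux/prctl.h> (Linux 5.14+).
PR_SCHED_CORE = 62
PR_SCHED_CORE_CREATE = 1
PR_SCHED_CORE_SCOPE_THREAD_GROUP = 1


def create_core_sched_group(pid: int = 0) -> int:
    """Ask the kernel to put `pid` (0 means the caller) into a new
    core scheduling group covering its whole thread group.

    Returns 0 on success, otherwise the errno reported by prctl().
    """
    libc = ctypes.CDLL(None, use_errno=True)
    ulong = ctypes.c_ulong
    rc = libc.prctl(
        PR_SCHED_CORE,
        ulong(PR_SCHED_CORE_CREATE),
        ulong(pid),
        ulong(PR_SCHED_CORE_SCOPE_THREAD_GROUP),
        ulong(0),  # must be 0 for PR_SCHED_CORE_CREATE
    )
    return 0 if rc == 0 else ctypes.get_errno()


if __name__ == "__main__":
    err = create_core_sched_group()
    if err == 0:
        print("new core scheduling group created")
    else:
        # Expected on kernels without CONFIG_SCHED_CORE or hosts without SMT.
        print("prctl(PR_SCHED_CORE) failed:", os.strerror(err))
```

Children created afterwards via fork() and exec() stay in the same group,
which is the inheritance behaviour the quoted text refers to.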

Right now users have two choices:

  - Run with SMT enabled. 100% of CPUs available. VMs are vulnerable
  - Run with SMT disabled. 50% of CPUs available. VMs are safe

What core scheduling gives is somewhere in between, depending on
the vCPU count. If we assume all guests have an even number of vCPUs then

  - Run with SMT enabled + core scheduling. 100% of CPUs available.
    100% of CPUs are used, VMs are safe

This is the ideal scenario, and probably a fairly common one too,
as IMHO even vCPU counts are likely to be typical.

If we assume the worst case, of entirely 1 vCPU guests, then we have

  - Run with SMT enabled + core scheduling. 100% of CPUs available.
    50% of CPUs are used, VMs are safe

This feels highly unlikely though, as all except tiny workloads
want > 1 vCPU.

With entirely 3 vCPU guests then we have

  - Run with SMT enabled + core scheduling. 100% of CPUs available.
    75% of CPUs are used, VMs are safe

With entirely 5 vCPU guests then we have

  - Run with SMT enabled + core scheduling. 100% of CPUs available.
    83% of CPUs are used, VMs are safe
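
The percentages above follow from simple arithmetic, sketched here in
Python. The assumptions are mine: 2 hyperthreads per core, every guest has
the same vCPU count, each guest gets its own trusted group (so no core is
shared between guests), and the host is fully packed. The function name
utilization is purely illustrative.

```python
import math

THREADS_PER_CORE = 2  # assumption: two hyperthreads per physical core


def utilization(vcpus_per_guest: int,
                threads_per_core: int = THREADS_PER_CORE) -> float:
    """Fraction of host threads that can run vCPUs when every guest has
    `vcpus_per_guest` vCPUs and guests never share a physical core."""
    # A guest occupies whole cores, so round its vCPU count up.
    cores_needed = math.ceil(vcpus_per_guest / threads_per_core)
    return vcpus_per_guest / (cores_needed * threads_per_core)


for n in (1, 2, 3, 4, 5):
    print(f"{n} vCPU guests: {utilization(n):.0%} of CPUs used")
```

For 1, 3 and 5 vCPUs this gives 50%, 75% and 83%, and 100% for any even
count.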

If we have a mix of even and odd numbered vCPU guests, with mostly
even numbered, then I think utilization will be high enough that
almost no one will care about the last few %.
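
To put a rough number on the mixed case, here is a small Python sketch. The
fleet below is entirely made up for illustration, and the model assumes 2
hyperthreads per core with no core shared between guests; it says nothing
about real deployments.

```python
import math


def mixed_utilization(guest_vcpus, threads_per_core=2):
    """Fraction of reserved host threads actually running vCPUs, for a
    list of per-guest vCPU counts, when guests never share a core."""
    if not guest_vcpus:
        raise ValueError("need at least one guest")
    used = sum(guest_vcpus)
    # Each guest reserves whole cores, i.e. its vCPU count rounded up
    # to a multiple of threads_per_core.
    reserved = sum(
        math.ceil(n / threads_per_core) * threads_per_core
        for n in guest_vcpus
    )
    return used / reserved


# Hypothetical fleet: six 2 vCPU guests, three 4 vCPU guests, one 3 vCPU guest.
fleet = [2] * 6 + [4] * 3 + [3]
print(f"{mixed_utilization(fleet):.0%} of CPUs used")
```

In this made-up fleet only the single odd-sized guest wastes a thread, so
27 of 28 reserved threads are in use.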

While we could try to come up with a way to express sharing of
cores between VMs, I don't think it's worth it in the absence of
someone presenting compelling data on why it'll be needed in a
non-niche use case. Bear in mind that users can also resort to
pinning VMs explicitly to get sharing.

In terms of defaults I'd very much like us to default to enabling
core scheduling, so that we have a secure deployment out of the box.
The only caveat is that this does have the potential to be interpreted
as a regression for existing deployments in some cases. Perhaps we
should make it a meson option for distros to decide whether to ship
with it turned on out of the box or not?

I don't think we need core scheduling to be a VM XML config option,
because security is really a host level matter IMHO, such that it
doesn't make sense to have both secure & insecure VMs co-located.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|