[PATCH RFC 10/10] qemu: Place helper processes into the same trusted group

Michal Prívozník mprivozn at redhat.com
Tue May 24 15:35:03 UTC 2022


On 5/24/22 12:33, Daniel P. Berrangé wrote:
> On Tue, May 24, 2022 at 11:50:50AM +0200, Michal Prívozník wrote:
>> On 5/23/22 18:30, Daniel P. Berrangé wrote:
>>> On Mon, May 09, 2022 at 05:02:17PM +0200, Michal Privoznik wrote:
>>>> Since the level of trust that QEMU has is the same level of trust
>>>> that helper processes have there's no harm in placing all of them
>>>> into the same group.
>>>
>>> This assumption feels like it might be a bit of a stretch. I
>>> recall discussing this with Paolo to some extent a long time
>>> back, but let me recap my understanding.
>>>
>>> IIUC, the attack scenario is that a guest vCPU thread is scheduled
>>> on an SMT sibling with another thread that is NOT running guest OS
>>> code. "another thread" in this context refers to many things:
>>>
>>>   - Random host OS processes
>>>   - QEMU vCPU threads from a different guest
>>>   - QEMU emulator threads from any guest
>>>   - QEMU helper process threads from any guest
>>>
>>> Consider, for example, if the QEMU emulator thread contains a password
>>> used for logging into a remote RBD/Ceph server. That is a secret
>>> credential that the guest OS should not have permission to access.
>>>
>>> Consider alternatively that the QEMU emulator is making a TLS connection
>>> to some service, and there are keys negotiated for the TLS session. While
>>> some of the data transmitted over the session is known to the guest OS,
>>> we shouldn't assume it all is.
>>>
>>> Now in the case of QEMU emulator threads I think you can make a somewhat
>>> decent case that we don't have to worry about it. Most of the keys/passwds
>>> are used once at cold boot, so there's no attack window for vCPUs at that
>>> point. There is a small window of risk when hotplugging. If someone is
>>> really concerned about this though, they shouldn't have let QEMU have
>>> these credentials in the first place, as it's already vulnerable to a
>>> guest escape. eg use kernel RBD instead of letting QEMU directly log
>>> in to RBD.
>>>
>>> IOW, on balance of probabilities it is reasonable to let QEMU emulator
>>> threads be in the same core scheduling domain as vCPU threads.
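>>>
>>> For reference, the mechanism behind a core scheduling domain is the
>>> PR_SCHED_CORE prctl() added in Linux 5.14; a minimal sketch of
>>> grouping QEMU with another task (the function name is illustrative,
>>> not what the patches use):
>>>
>>>   #include <sys/prctl.h>  /* PR_SCHED_CORE_* (Linux >= 5.14 headers) */
>>>   #include <unistd.h>
>>>
>>>   /* Give the calling process a fresh core scheduling cookie and
>>>    * copy it onto @task's thread group; afterwards only tasks
>>>    * sharing the cookie may run concurrently on SMT siblings of
>>>    * the same core. */
>>>   static int
>>>   demoSchedCoreGroup(pid_t task)
>>>   {
>>>       if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
>>>                 0, PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) < 0)
>>>           return -1;
>>>
>>>       return prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO,
>>>                    task, PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0);
>>>   }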
>>>
>>> In the case of external QEMU helper processes though, I think it is
>>> a far less clearcut decision.  There are a number of reasons why helper
>>> processes are used, but at least one significant motivating factor is
>>> security isolation between QEMU & the helper - they can only communicate
>>> and share information through certain controlled mechanisms.
>>>
>>> With this in mind I think it is risky to assume that it is safe to
>>> run QEMU and helper processes in the same core scheduling group. At
>>> the same time there are likely cases where it is also just fine to
>>> do so.
>>>
>>> If we separate helper processes from QEMU vCPUs this is not as wasteful
>>> as it sounds. Since the helper processes are running trusted code, there
>>> is no need for helper processes from different guests to be isolated.
>>> They can all just live in the default core scheduling domain.
>>>
>>> I feel like I'm talking myself into suggesting the core scheduling
>>> host knob in qemu.conf needs to be more than just a single boolean.
>>> Either have two knobs - one to turn it on/off and one to control
>>> whether helpers are split or combined - or have one knob and make
>>> it an enumeration.
>>
>> Seems reasonable. And the default should be QEMU's emulator + vCPU
>> threads in one sched group, and all helper processes in another, right?
> 
> Not quite. I'm suggesting that helper processes can remain in the
> host's default core scheduling group, since the helpers are all
> executing trusted machine code.
> 
>>> One possible complication comes if we consider a guest that is
>>> pinned, but not on a fine-grained per-vCPU basis.
>>>
>>> eg if the guest is set to allow floating over a sub-set of host CPUs
>>> we need to make sure that it is still possible to actually execute
>>> the guest. ie if the entire guest is pinned to 1 host CPU but our
>>> config implies the use of 2 distinct core scheduling domains, we have
>>> an unsolvable constraint.
>>
>> Do we? Since we're placing emulator + vCPUs into one group and helper
>> processes into another, these would never run at the same time, but that
>> would be the case anyway - if the emulator write()-s into a helper's
>> socket it would block because the helper isn't running. This
>> "bottleneck" is a result of pinning everything onto a single CPU and
>> exists regardless of scheduling groups.
>>
>> The only case where scheduling groups would make the bottleneck worse is
>> if the emulator and vCPUs were in different groups, but we don't intend to
>> allow that.
> 
> Do we actually pin the helper processes at all ?

Yes, we do. They are placed into the same CGroup as the emulator
thread; see qemuSetupCgroupForExtDevices().
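
For illustration, on cgroup v2 that placement boils down to writing the
helper's PID into the emulator cgroup's cgroup.procs file. A rough
sketch with an illustrative name (the real code goes through the
virCgroup helpers rather than touching cgroupfs directly):

  #include <stdio.h>
  #include <unistd.h>

  /* Illustrative only -- move @helper into the cgroup rooted at
   * @emulatorCgroup by appending its PID to cgroup.procs. */
  static int
  demoAddHelperToCgroup(const char *emulatorCgroup, pid_t helper)
  {
      char path[512];
      FILE *fp;
      int ret = -1;

      snprintf(path, sizeof(path), "%s/cgroup.procs", emulatorCgroup);
      if (!(fp = fopen(path, "w")))
          return -1;
      if (fprintf(fp, "%lld\n", (long long) helper) > 0)
          ret = 0;
      if (fclose(fp) != 0)
          ret = -1;
      return ret;
  }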

> 
> I was thinking of a scenario where we implicitly pin helper processes to
> the same CPUs as the emulator threads and/or QEMU process-global pinning
> mask. eg
> 
> If we only had
> 
>   <vcpu placement='static' cpuset="2-3" current="1">2</vcpu>
> 
> Traditionally the emulator threads, I/O threads, and vCPU threads will
> all float across host CPUs 2 & 3. I was assuming we also placed
> helper processes in these same 2 host CPUs. Not sure if that's right
> or not. Assuming we do, then...
> 
> Let's say CPUs 2 & 3 are SMT siblings.
> 
> We have helper processes in the default core scheduling
> domain and QEMU in a dedicated core scheduling domain. We
> lose 100% of concurrency between the vCPUs and helper
> processes.
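>
> Purely for illustration (not part of the patches), whether two host
> CPUs are SMT siblings can be read from sysfs:
>
>   #include <stdio.h>
>
>   /* Print the SMT siblings of @cpu as reported by the kernel,
>    * e.g. "2-3" for the example above. */
>   static void
>   demoPrintSiblings(int cpu)
>   {
>       char path[128], buf[64];
>       FILE *fp;
>
>       snprintf(path, sizeof(path),
>                "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
>                cpu);
>       if ((fp = fopen(path, "r")) != NULL) {
>           if (fgets(buf, sizeof(buf), fp) != NULL)
>               printf("cpu%d siblings: %s", cpu, buf);
>           fclose(fp);
>       }
>   }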

So in this case users might want to have the helpers and the emulator
in the same group. Therefore, in qemu.conf we should allow something
like:

  sched_core = "none" // off, no SCHED_CORE
               "emulator" // default, place only emulator & vCPU threads
                          // into the group
               "helpers" // place emulator & vCPU & helpers into the
                         // group

I agree that "helpers" is a terrible name; maybe "emulator+helpers"? Or
something completely different? Maybe:

  sched_core = [] // off
               ["emulator"] // enumlator & vCPU threads
               ["emulator","helpers"] // emulator + helpers

We can refine "helpers" in the future (if needed) to, say, "virtiofsd",
"dbus", or "swtpm", allowing users to fine-tune which helper processes
are part of the group.
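
To make that concrete, the knob could map onto an internal mode roughly
like this (a hypothetical sketch; none of these names exist in libvirt
yet):

  #include <string.h>

  typedef enum {
      QEMU_SCHED_CORE_NONE,      /* no SCHED_CORE at all */
      QEMU_SCHED_CORE_EMULATOR,  /* emulator & vCPU threads only */
      QEMU_SCHED_CORE_HELPERS,   /* emulator & vCPUs & helpers */
  } qemuSchedCoreMode;

  /* Map the qemu.conf string onto the internal mode; returns -1 on
   * an unknown value. */
  static int
  qemuSchedCoreModeFromString(const char *str, qemuSchedCoreMode *mode)
  {
      if (strcmp(str, "none") == 0)
          *mode = QEMU_SCHED_CORE_NONE;
      else if (strcmp(str, "emulator") == 0)
          *mode = QEMU_SCHED_CORE_EMULATOR;
      else if (strcmp(str, "helpers") == 0)
          *mode = QEMU_SCHED_CORE_HELPERS;
      else
          return -1;
      return 0;
  }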

Michal


