[edk2-devel] [PATCH 3/3] OvmfPkg/PlatformPei: Mark TPM MMIO range as unencrypted for SEV

Fri Apr 30 17:37:34 UTC 2021

On 4/30/21 10:39 AM, Laszlo Ersek wrote:
> On 04/28/21 21:09, Tom Lendacky wrote:
>> On 4/28/21 11:12 AM, Laszlo Ersek wrote:
>>> On 04/27/21 16:58, Tom Lendacky wrote:
>>>> On 4/26/21 9:21 AM, Tom Lendacky wrote:
>>>>> On 4/26/21 7:07 AM, Laszlo Ersek wrote:
>>>>>> On 04/23/21 22:02, Tom Lendacky wrote:
>>>
>>>>>>> 1. SEV works with the current encrypted mapping, it is only the SEV-ES
>>>>>>>    support that fails because of the ValidateMmioMemory() check. I can do
>>>>>>>    the mapping change just for SEV-ES since it is X64 only. This works,
>>>>>>>    because MemEncryptSevClearPageEncMask() will not return UNSUPPORTED
>>>>>>>    when running in 64-bit.
>>>>>>
>>>>>> Can we really say "SEV works" though? Because, even using an X64 PEI
>>>>>> phase, and enabling only SEV (not SEV-ES), TPM access will be broken in
>>>>>> the PEI phase. Is my understanding correct?
>>>>>
>>>>> Because the memory range is marked as MMIO, we'll take a nested page fault
>>>>> (NPF). The GPA passed as part of the NPF does not include the c-bit. So we
>>>>> do in fact work properly with a TPM in SEV.
>>>
>>> Thanks for the explanation.
>>>
>>> Here's what bothers me about it:
>>>
>>> In AmdSevDxe, we clear the C-bit from MMIO and NonExistent areas in the
>>> GCD memory space map. This occurs early in the DXE phase (see "APRIORI
>>> DXE" in the FDF files). The justification is that we want the flash and
>>> (for example) the PCI MMIO apertures decrypted.
>>>
>>> Now, I realize there is a difference between flash and TPM. TPM is
>>> purely MMIO (no KVM memslot), but flash (when it is not in programming
>>> mode) is backed by a read-only KVM memslot. IOW, flash is "actual
>>> memory", and so it is affected by SEV. TPM is never "actual memory", so
>>> (according to your explanation, AIUI) it always traps to QEMU, per
>>> access, and the C-bit doesn't interfere with that.
>>>
>>> This is consistent with two facts about OVMF's PEI phase:
>>>
>>> - We use IO Port-based fw_cfg (never DMA), if SEV is enabled (see
>>> "QemuFwCfgPei.c").
>>>
>>> - We access PCI config space via IO Ports (0xCF8, 0xCFC), never ECAM.
>>> (This was not motivated by SEV, see commit 7523788faa51, but it does
>>> play nice with SEV, in the PEI phase -- I think?)
>>>
>>> What I'm confused about, now, in retrospect, is the reference to the PCI
>>> MMIO aperture, in AmdSevDxe. If that area isn't backed by a KVM memslot
>>> *either* -- similarly to the TPM area --, then decrypting *that* in
>>> AmdSevDxe (via "nonexistent") is not strictly necessary. Is that correct?
>>
>> It is necessary for (and was added for) SEV-ES. The explicit mapping of
>> the PCI MMCONFIG range as unencrypted was added for SEV-ES so that the
>> SEV-ES mitigation would not terminate the guest because MMIO was being
>> performed against an encrypted address.
>>
>> There's a nice comment in OvmfPkg/PlatformPei/Platform.c that talks about
>> how the MMCONFIG range is not added as MMIO, but instead as reserved
>> memory. Because of that, the loop through the memory space map in
>> AmdSevDxe.c does not mark the MMCONFIG range as unencrypted.
> 
> I didn't mean commit 84cddd70820f ("OvmfPkg/AmdSevDxe: Clear encryption
> bit on PCIe MMCONFIG range", 2021-01-07) -- I didn't mean PCI config space.
> 
> I meant commit 24e4ad75546b ("OvmfPkg: Add AmdSevDxe driver",
> 2017-07-10) -- I wrote "PCI MMIO aperture".
> 
> So, to repeat my question: when 24e4ad75546b3 was implemented (which
> happened just for SEV), then (apparently) clearing the C-bit on the PCI
> MMIO aperture (where PCI MMIO BARs were going to be allocated from)
> wasn't strictly necessary *at the time*; correct? Because that area
> isn't backed by RAM, accesses trap to QEMU directly, and the C bit does
> not make a difference there. (IIUC). Does this make sense or am I wrong?

Ah, sorry, I misunderstood. For the most part, that is true. Not clearing
the c-bit from an area where the PCI MMIO BARs are going to be allocated
would not necessarily cause a problem.

However, clearing the c-bit for non-existent areas was needed for other
reasons. It would result in NVRAM variable corruption because of the way
that area is represented in OVMF and how it is mapped by the hypervisor.

For example, if the clearing of the c-bit for the address at 0xfef00000,
which is marked as EfiGcdMemoryTypeNonExistent, isn't done, then using the
same nvram FD file in an SEV guest results in the following:

  Variable FV header is not valid. It will be reinitialized.

because a different encryption key is used on each launch of the guest.

And attempting to use the same nvram FD file for a non-SEV guest, after
running as an SEV guest, causes an ASSERT:

  ASSERT [VariableRuntimeDxe] /root/kernels/ovmf-build-X64/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableNonVolatile.c(218): VariableStore->Size == VariableStoreLength

because the contents have been encrypted.

The video buffer suffers from a similar problem in that the video buffer
contents end up containing data that appears encrypted to the hypervisor
and results in corruption of the screen contents when attaching, e.g.,
via VNC. Although, that area is marked as EfiGcdMemoryTypeMemoryMappedIo.

Thanks,
Tom

> 
> 
>>
>>>
>>> I'm not asking for any code changes, just trying to form a consistent view.
>>>
>>> Another question (still for "base SEV"): when OVMF is built with
>>> SMM_REQUIRE, PlatformPei performs a (read-only) variable access. See the
>>> ReadOnlyVariable2->GetVariable() call in the RefreshMemTypeInfo()
>>> function. When SEV is active, does control reach RefreshMemTypeInfo() on
>>> your end? And does ReadOnlyVariable2->GetVariable() succeed for you?
>>>
>>> (There is a DEBUG_VERBOSE message in OnReadOnlyVariable2Available(), and
>>> a DEBUG_ERROR in RefreshMemTypeInfo(), so the above questions can be
>>> answered just from the log; no need to modify the code for that.)
>>
>> Yes, control reaches that point. But, notably, with a legacy VM I see the
>> following messages:
>>   RefreshMemTypeInfo: GetVariable(): Not Found
>>
>> But with an SEV VM I see:
>>   Firmware Volume for Variable Store is corrupted
>>   RefreshMemTypeInfo: GetVariable(): Not Found
>>
>> So I get the "Not Found" in both cases. But with SEV, I see the
>> "corrupted" message from InitRealNonVolatileVariableStore() in
>> MdeModulePkg/Universal/Variable/RuntimeDxe/VariableNonVolatile.c. I
>> imagine that since the range is marked encrypted, it reads the header
>> incorrectly and fails.
> 
> Thank you for confirming! I don't think we need to sweat this symptom.
> I'm just glad my hunch wasn't wrong.
> 
> Thanks,
> Laszlo
> 
>>
>> Thanks,
>> Tom
>>
>>>
>>> Basically, with SEV enabled, I expect ReadOnlyVariable2->GetVariable()
>>> to fail -- or even to remain unreached, as FaultTolerantWritePei and
>>> VariablePei could bail out earlier (before installing the Variable PPI),
>>> due to failing flash accesses. In case I'm *not* wrong -- it's not the
>>> end of the world, I'm only asking this question too for the sake of
>>> clarifying "C-bit vs. MMIO".
>>>
>>> More or less it seems to boil down to whether there is a KVM memslot or
>>> not -- which is *not* equivalent to OVMF considering the area MMIO or not.
>>>
>>>
>>>>> SEV-ES would also work
>>>>> properly if the mitigation for accessing an encrypted address was removed
>>>>> from the #VC handler. It is only this added mitigation to protect MMIO
>>>>> that results in an issue with the TPM in PEI.
>>>>
>>>> So I'm thinking that I can have TpmMmioSevDecryptPeim.c do this:
>>>>
>>>>   //
>>>>   // If SEV or SEV-ES is active, MMIO succeeds against an encrypted physical
>>>>   // address because the nested page fault (NPF) that occurs on access does not
>>>>   // include the encryption bit in the guest physical address provided to the
>>>>   // hypervisor.
>>>>   //
>>>>   // However, if SEV-ES is active, before performing the actual MMIO, an
>>>>   // additional MMIO mitigation check is performed in the #VC handler to ensure
>>>>   // that MMIO is being done to an unencrypted address. To prevent guest
>>>>   // termination in this scenario, mark the range unencrypted ahead of access.
>>>>   //
>>>>   if (MemEncryptSevEsIsEnabled ()) {
>>>>     // Do MemEncryptSevClearPageEncMask() ...
>>>>   }
>>>>
>>>> Let me submit the next version with this and see what you think.
>>>
>>> Yep, I'll review that now.
>>>
>>> Thanks
>>> Laszlo
>>>
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>>>
>>>>>>
>>>>>> I think the behavior you currently see is actually what we want, we
>>>>>> should double down on it -- if MemEncryptSevClearPageEncMask() fails,
>>>>>> report an explicit DEBUG_ERROR, and call CpuDeadLoop(). If the firmware
>>>>>> is built with TPM_ENABLE, and SEV is active, then an IA32 PEI phase is
>>>>>> simply unusable. Silently pretending that the TPM is not there, even
>>>>>> though it may have been configured on the QEMU command line, we just
>>>>>> failed to communicate with it, is not a good idea, IMO.
>>>>>
>>>>> However, because the c-bit is not part of the NPF, we do communicate
>>>>> successfully with the TPM.
>>>>>
>>>>> So we could actually do following:
>>>>>  - For IA32:
>>>>>    - Remove the Depex on gOvmfTpmMmioAccessiblePpiGuid
>>>>>    - Do not add OvmfPkg/Tcg/TpmMmioSevDecryptPei/TpmMmioSevDecryptPei.inf
>>>>>
>>>>>  - For X64:
>>>>>    - Add the Depex on gOvmfTpmMmioAccessiblePpiGuid
>>>>>    - Add OvmfPkg/Tcg/TpmMmioSevDecryptPei/TpmMmioSevDecryptPei.inf
>>>>>
>>>>> That might be confusing, though. So we could just do option #3 below.
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>>>
>>>>>> This is somewhat similar IMO to the S3Verification() function in
>>>>>> "OvmfPkg/PlatformPei/Platform.c".
>>>>>>
>>>>>> TPM_ENABLE, SEV, IA32 PEI phase: pick any two.
>>>>>>
>>>>>> Thanks,
>>>>>> Laszlo
>>>>>>
>>>>>>>
>>>>>>> 2. Call MemEncryptSevClearPageEncMask() for SEV or SEV-ES, but don't check
>>>>>>>    the return status.
>>>>>>>
>>>>>>> 3. Create Ia32 and X64 versions of internal functions, where the Ia32
>>>>>>>    version simply returns SUCCESS because it can't do anything and the X64
>>>>>>>    version calls MemEncryptSevClearPageEncMask(), allowing the main code
>>>>>>>    to ASSERT on any errors.
>>>>>>>
>>>>>>> I'm leaning towards #1, because this is an SEV-ES only issue. Thoughts?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tom
>>>>>>>
>>>>>>>>
>>>>>>>> One thing I found is that the Bhyve package makes reference to the
>>>>>>>> OvmfPkg/Bhyve/Tcg directory, but that directory does not exist. So I don't
>>>>>>>> think that TPM enablement has been tested. I didn't update the Bhyve
>>>>>>>> support for that reason.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Tom
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Laszlo
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
> 

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#74666): https://edk2.groups.io/g/devel/message/74666
Mute This Topic: https://groups.io/mt/82247968/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-