[edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Mon Jan 16 14:48:06 UTC 2023

On 1/13/23 13:22, Gerd Hoffmann wrote:
> On Fri, Jan 13, 2023 at 11:10:54AM +0100, Laszlo Ersek wrote:
>> On 1/13/23 10:32, Gerd Hoffmann wrote:
>>> On Fri, Jan 13, 2023 at 07:03:54AM +0100, Gerd Hoffmann wrote:
>>>>   Hi,
>>>>
>>>>> - QEMU can be configured with other compat properties on the
>>>>> command line so that "CPU hotplug with SMI" and "CPU hot-unplug
>>>>> with SMI" *not* be offered to the firmware. Then QEMU will reject
>>>>> hotplug attempts, and the SMM hotplug code in edk2 will not be
>>>>> triggered by the (virtual) hardware.
>>>>
>>>> Can we have edk2 print instructions for that in the error message?
>>>
>>> This seems to be:
>>>
>>>     qemu -M q35 \
>>>         -global ICH9-LPC.x-smi-cpu-hotplug=off \
>>>         -global ICH9-LPC.x-smi-cpu-hotunplug=off
>>
>> Yes, those are the flags.
>>
>>> But it appears to not work.
>>
>> They should work, but they take effect in QEMU, and not in the
>> firmware. These knobs control what CPU hot(un)plug+SMI features QEMU
>> exposes to the guest fw, via fw_cfg,
>
> Ok, I see, only the SMM code actually checks that.
>
>> In particular the firmware makes no further decisions based on
>> whether QEMU advertized some of these features.
>
> I was thinking the other way around:  When cpu hotplug is disabled in
> qemu it should be safe to skip the whole cpu hotplug checking dance.
> See test patch below.
>
> That would give us a config switch (turn off cpu hotplug support)
> which would allow edk2 run on qemu versions with broken cpu hotplug.
>
> Does the idea look sane or do I miss something?
>
> take care,
>   Gerd
>
> commit bd2e36eba35268ab46c0125d2b9125391ea6f9fc
> Author: Gerd Hoffmann <kraxel at redhat.com>
> Date:   Fri Jan 13 13:07:36 2023 +0100
>
>     skip cpu present checking when hotplug is off
>
> diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
> index 13348afb4890..2b0f0c836f85 100644
> --- a/OvmfPkg/Library/PlatformInitLib/Platform.c
> +++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
> @@ -415,8 +415,9 @@ PlatformMaxCpuCountInitialization (
>    IN OUT EFI_HOB_PLATFORM_INFO  *PlatformInfoHob
>    )
>  {
> -  UINT16  BootCpuCount = 0;
> -  UINT32  MaxCpuCount;
> +  UINT16   BootCpuCount = 0;
> +  UINT32   MaxCpuCount;
> +  BOOLEAN  CpuHotplugSupported = FALSE;
>
>    //
>    // Try to fetch the boot CPU count.
> @@ -424,6 +425,31 @@ PlatformMaxCpuCountInitialization (
>    if (QemuFwCfgIsAvailable ()) {
>      QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
>      BootCpuCount = QemuFwCfgRead16 ();
> +    DEBUG ((DEBUG_INFO, "%a: BootCpuCount: %d\n", __FUNCTION__, BootCpuCount));
> +  }
> +
> +  {
> +    FIRMWARE_CONFIG_ITEM  SupportedFeaturesItem;
> +    UINTN                 SupportedFeaturesSize;
> +    UINT64                mSmiFeatures;
> +    EFI_STATUS            Status;
> +
> +    Status = QemuFwCfgFindFile (
> +               "etc/smi/supported-features",
> +               &SupportedFeaturesItem,
> +               &SupportedFeaturesSize
> +               );
> +
> +    if (EFI_ERROR (Status)) {
> +      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: %r\n", __FUNCTION__, Status));
> +    } else {
> +      QemuFwCfgSelectItem (SupportedFeaturesItem);
> +      QemuFwCfgReadBytes (sizeof mSmiFeatures, &mSmiFeatures);
> +      DEBUG ((DEBUG_INFO, "%a: etc/smi/supported-features: 0x%x\n", __FUNCTION__, mSmiFeatures));
> +      if (mSmiFeatures & (BIT1 /* hotplug */ | BIT2 /* hotunplug */)) {
> +        CpuHotplugSupported = TRUE;
> +      }
> +    }
>    }
>
>    if (BootCpuCount == 0) {
> @@ -435,6 +461,9 @@ PlatformMaxCpuCountInitialization (
>      //
>      DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n", __FUNCTION__));
>      MaxCpuCount = PlatformInfoHob->DefaultMaxCpuNumber;
> +  } else if (!CpuHotplugSupported) {
> +    DEBUG ((DEBUG_INFO, "%a: CPU hotplug support not available\n", __FUNCTION__));
> +    MaxCpuCount = BootCpuCount;
>    } else {
>      //
>      // We will expose BootCpuCount to MpInitLib. MpInitLib will count APs up to
>

This would be wrong.

Let me provide a summary of CPU hotplug with SMM. It consists of the
following patch sets, in edk2. Note that these sets were implemented and
merged in dependency order, and I'm listing them likewise.

(E1) a7e2d20193e8..778832bcad33, for TianoCore#1515

(E2) c8b8157e126a..83357313dd67, for TianoCore#1515

(E3) 422da35375c6..75839f977d37, for TianoCore#1512

(E4) 61d3b2d4279e..1158fc8e2c7b, for TianoCore#1512

(E5) 5ba203b54e59, no BZ

(E6) 63d92674d240..cbccf995920a, for TianoCore#2929 [not really relevant
here, just for completeness]

(I'm ignoring hot-unplug now, which came later. See TianoCore#3132; the
commit range is noted there too.)

On the QEMU side, some relevant series are (again in dependency order):

(Q1) be9612e8cbb4..3a61c8db9d25

(Q2) fd9b0830b0be [just for completeness]

(Q3) 2d69eba5fe52..6e2e2e8a4220

I recommend reading through all those commits.

*** Series E1+E2 allow OVMF to expose the precise boot CPU count
(=present) and the max CPU count (=possible) to UefiCpuPkg/MpInitLib,
and the rest of the firmware. Such that the initial enumeration is both
precise and quick, and that all components in the firmware can
distinguish these two concepts.

Note in particular the final patch in series E2. That patch makes CPU
hotplug work with S3 resume *without* SMM at all in the picture, for
exmaple on i440fx too. Masking the "present at boot" vs "possible" CPU
count difference based on "etc/smi/supported-features" would break that
functionality (again, totally unrelated to SMM). There is another (more
serious) arguments against keying this workaround off of
"etc/smi/supported-features", and I'm going to get to that as well, but
this is one reason already.

The QEMU regression breaks the present vs. possible logic in series E2.
Assume we're on i440fx, or else on Q35 with an OVMF binary that does not
include the SMM driver stack. In this case, the firmware need not take
notice of CPU hotplugging in real time, but during S3 resume, it must be
able to see an increased CPU count. Even though the firmware fails to
use the modern regblock, the firmware's enablement of the modern
regblock *may not* be necessary for the guest OS to still hotplug CPUs
(using QEMU's ACPI methods). There is no firmware<->OS dependency in
this case. So such an S3 failure could lead to data loss.

I'm against trying to come up with an extremely sophisticated condition
under which this present vs. possible masking would be safe.

(The QEMU-side counterpart is a *sub-series* of Q1.)

*** Series E3 is the "SMRAM at default SMBASE" implementation in OVMF.
The QEMU-side counterpart is the *other sub-series* of Q1.

This feature is the core of the security of CPU hotplug, when SMM is
enabled.

The threat model is the following: the malicious guest OS places attack
code at the default SMBASE, in RAM. Later, the VM admin on the host side
hotplugs a CPU, eventually. The malicious guest OS immediately sends the
new CPU an SMI followed by an INIT-SIPI-SIPI. This means that the new
CPU starts executing the OS's attack code at the default SMBASE, in SMM,
and with access to all SMRAM. So it can attack the firmware's SMM
machinery.

This feature prevents the attack by replacing guest OS writeable RAM at
the default SMBASE with SMRAM, and locks it down during boot. The
feature is negotiated by PlatformPei in
Q35SmramAtDefaultSmbaseInitialization(), and is represented by
PcdQ35SmramAtDefaultSmbase; it orchestrates multiple drivers. The
QEMU-side property is called "mch.smbase-smram".

Now here I want to point out something very important. Note that QEMU
series Q1 implements *both* the "SMRAM at default SMBASE" feature *and*
the "Command data 2" register in the CPU hotplug register block. The
"Command data 2" register is used for two purposes: negotiating the
"modern vs. legacy" operation of the register block, and for getting the
APIC ID of a particular CPU. Different locations in OVMF use "Command
data 2" for these different purposes, but what's important here is that
the successful negotiation of the "SMRAM at default SMBASE" feature
*implies* that the "Command data 2" register *works*. In other words,
"SMRAM at default SMBASE", if negotiated, can be used as a proxy for
"Command data 2". Both of these features would be released in QEMU
v5.0.0, coming from the same one Q1 patch series, one would be
negotiable, the other not.

Therefore, locations in OVMF that want *just* the "Command data 2"
register, go through the legacy vs. modern regblock negotiation, using
"Command data 2". Other locations in OVMF, namely those that want the
"SMRAM at default SMBASE" feature only, or want *both* features, would
only check for "SMRAM at default SMBASE".

*** Series E4 implements the actual hotplug (w/ SMM) feature in OVMF.

Note that in commit 17efae27acaf ("OvmfPkg/CpuHotplugSmm: introduce
skeleton for CPU Hotplug SMM driver", 2020-03-04),
PcdQ35SmramAtDefaultSmbase==FALSE causes the driver to exit gracefully
with EFI_UNSUPPORTED. However, if the feature has been negotiated, all
further steps must complete fine, or the driver hangs intentionally.

In commit f668e7887165 ("OvmfPkg/CpuHotplugSmm: define the
QEMU_CPUHP_CMD_GET_ARCH_ID macro", 2020-03-04), the "Command data 2"
register is sanity checked -- *enforced*. That's based exactly on the
above-noted "proxying"; see the big code comment in the commit:

+  //
+  // Sanity-check the CPU hotplug interface.
+  //
+  // Both of the following features are part of QEMU 5.0, introduced primarily
+  // in commit range 3e08b2b9cb64..3a61c8db9d25:
+  //
+  // (a) the QEMU_CPUHP_CMD_GET_ARCH_ID command of the modern CPU hotplug
+  //     interface,
+  //
+  // (b) the "SMRAM at default SMBASE" feature.
+  //
+  // From these, (b) is restricted to 5.0+ machine type versions, while (a)
+  // does not depend on machine type version. Because we ensured the stricter
+  // condition (b) through PcdQ35SmramAtDefaultSmbase above, the (a)
+  // QEMU_CPUHP_CMD_GET_ARCH_ID command must now be available too. While we
+  // can't verify the presence of precisely that command, we can still verify
+  // (sanity-check) that the modern interface is active, at least.
+  //
+  // Consult the "Typical usecases | Detecting and enabling modern CPU hotplug
+  // interface" section in QEMU's "docs/specs/acpi_cpu_hotplug.txt", on the
+  // following.
+  //

The same implication / "proxy nature" is used in commit 1158fc8e2c7b
("OvmfPkg/CpuS3DataDxe: enable S3 resume after CPU hotplug",
2020-03-04), as well.

So this shows why the QEMU regression is so nasty; it doesn't only break
the explicit negotiations involving "Command data 2", but breaks *other*
code locations that deduce the availability of "Command data 2" from a
stricter, and separately negotiated, feature.

*** Patch E5 is what introduces the "etc/smi/supported-features" stuff
in OVMF. The QEMU counterpart is the Q3 series.

Note that for OVMF, this is effectively an afterthought. It is not
needed for securing the firmware from a malicious OS, it is also not
needed for functionality, as far as the firmware is concerned. It
basically communicates OVMF's features to QEMU.

This stuff in OVMF enables two things:

- it enables QEMU to protect a *non-malicious* OS from crashing if the
firmware underneath supports SMI broadcast but is unaware of CPU
hotplug,

- it enables QEMU to generate ACPI content for a *non-malicious* OS so
that the OS inform the firmware (via raising an SMI) about a CPU hotplug
event.

Note that there is zero security impact from this; that was covered by
series E3 (= the "SMRAM at default SMBASE" feature).

This is the other (more serious) reason that we shouldn't use BIT1
(ICH9_LPC_SMI_F_CPU_HOTPLUG) and BIT2 (ICH9_LPC_SMI_F_CPU_HOT_UNPLUG) in
"etc/smi/supported-features" for driving anything in the firmware. They
were never meant to *steer* the firmware. BIT0
(ICH9_LPC_SMI_F_BROADCAST) was meant to steer both the firmware and
QEMU, but BIT1 and BIT2 only steer QEMU.

In summary, to keep OVMF plus the guest OS "healthy" in face of the QEMU
regression, at least three things are necessary:

(a) Do *not* extend the role of SMI feature negotiation to steer the
workaround; keep it intact. With time, the QEMU regression will lose its
significance, but we're going to be left with a horrible confusion
(design violation) of BIT1 and BIT2 in the SMI features bitmask. (Case
in point: the QEMU-v2.7 reset bug is totally irrelevant nowadays, but we
still have the workaround in place for it, and that workaround *worsens*
the symptoms of the present QEMU regression!)

(b) Avoid potential guest OS data loss by breaking *basic* S3 resume
functionality (with SMM entirely out of the picture). This means that
*whenever* the guest OS is capable of hotplugging a CPU (including that
QEMU allows such an operation to be initiated over QMP), the S3 resume
code in OVMF must be able to deal with an increased number of CPUs,
relative to the first (cold) boot. In other words, the difference
between "present" and "possible" must not be masked, whenever the guest
OS is capable of hotplugging a CPU (even without SMM in the picture),
using the ACPI code generated by QEMU.

(In the beginning, I had proposed for QEMU to expose a new fw_cfg file,
for "max cpus", but Igor (justifiedly) suggested to just collect the
"possible" count from the modern regblock:
<http://mid.mail-archive.com/20191008175931.483af366@redhat.com>.)

(c) *No location* in the code that deduces "Command data 2 register
works" from "PcdQ35SmramAtDefaultSmbase == TRUE" must be reached,
because this implication is broken by the QEMU regression.

Note that my proposal (hard hang upon "Command data 2 register not
working") satisfies all three, and with time, becomes applicable. If
pressure needs to be exterted somewhere to expedite that, then it's the
QEMU stable release process.

Note that clearing "mch.smbase-smram" on the QEMU command line, in
itself, would be very wrong. It would disable the security from series
E3, like it used to be on 4.2 machine types and before; however, at the
same time, it would not prevent OVMF from acknowledging the
ICH9_LPC_SMI_F_CPU_HOTPLUG feature bit via fw_cfg, which appeared with
5.2 machine types. In other words, it would create a machine type that
never existed, and OVMF never expected to see -- the firmware wouldn't
protect itself from a malicious guest OS, and QEMU would permit the
admin to hotplug CPUs.

If you simply go back to 4.2 machine types ("-M" switch), disabling both
the "SMRAM at default SMBASE" and ICH9_LPC_SMI_F_CPU_HOTPLUG features,
the hotplug regblock's negotiation still malfunctions (it is not
versioned in QEMU). Assuming you hang OVMF if *either one* of
PcdQ35SmramAtDefaultSmbase and ICH9_LPC_SMI_F_CPU_HOTPLUG is set, and
allow the boot to progress if both are clear, the present<->possible
counts would still be wrong, and S3 resume might cause OS data loss if
the OS still managed to hotplug a CPU.

I do not claim that a logical predicate "does not exist" that covers
precisely those situations that need to be caught *for the purpose* of
papering them over. (I guess the predicate and/or the action would
somehow involve "PcdQ35SmramAtDefaultSmbase", *BUT* even then, the order
of calls to MaxCpuCountInitialization() and
Q35SmramAtDefaultSmbaseInitialization() is wrong!!!)

What I'm stating is that I'm *unable to reason* about such a predicate
with any sense of safety, due to the predicate's complexity, and due to
it compensating for the violation of a *foundational* invariant.

On my part, I wouldn't like to continue this discussion. I'm really
sorry.

(I'm only responding on-list because Gerd's question so closely follows
my good-bye from the other thread that I think he may not have seen that
until sending the question. But, with this, I have no more input on the
topic, or anything for the list, really.)

Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#98576): https://edk2.groups.io/g/devel/message/98576
Mute This Topic: https://groups.io/mt/96218818/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-