[edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Laszlo Ersek lersek at redhat.com
Thu Jan 12 18:22:04 UTC 2023


On 1/12/23 18:58, Laszlo Ersek wrote:

> Your proposal is entirely justified, from a practical / user
> perspective, but I'm not the right person for it.

This should do what you propose (untested), but I hate the code myself.

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index 3e13c5d4b34f..b4c0b47c8bb5 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -541,6 +541,22 @@ PlatformMaxCpuCountInitialization (
         ASSERT (Selected == Possible || Selected == 0);
       } while (Selected > 0);
 
+      if (Present == 0) {
+        if (FeaturePcdGet (PcdSmmSmramRequire)) {
+          ASSERT (FALSE);
+          CpuDeadLoop();
+        }
+        //
+        // The bug is in QEMU v5.1.0+, where we're not affected by the QEMU v2.7
+        // reset bug, so BootCpuCount from fw_cfg is reliable. Assume a fully
+        // populated topology, precluding hotplug, like when the modern CPU
+        // hotplug interface is unavailable. This will satisfy the QEMU v2.7
+        // reset bug sanity check below as well.
+        //
+        Present  = BootCpuCount;
+        Possible = BootCpuCount;
+      }
+
       //
       // Sanity check: fw_cfg and the modern CPU hotplug interface should
       // return the same boot CPU count.


I *really* don't like this. With this, you might get through the firmware and reach the guest OS, and pre-boot, the MP PPI and protocol might even work. But, because the QEMU register block is broken, I don't have the slightest idea what's going to happen if you attempt a CPU hot-*unplug* at OS runtime (unplug because in effect you start with a fully populated topology). OVMF does not participate in that (because of no SMM), but the ACPI content generated by QEMU does, and that content (AML methods) do massage the register block.

In a sense, with the register block negotiation broken, even without SMM in the picture, the guest OS could consider OVMF doing it a *service* by hanging, so that guest OS-level data is not lost when the admin attempts an unplug, and the guest OS crashes. (I've not reproduced such a crash to be clear, I just think it could be possible.)

There's got to be a limit to how far we try to compensate for broken (virtual) hardware. :( The right thing to do is to wait for the QEMU patch to reach as many as possible stable branches, let the distros pick up the new stable releases, and then merge the hardliner hang.

Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#98386): https://edk2.groups.io/g/devel/message/98386
Mute This Topic: https://groups.io/mt/96218818/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-




More information about the edk2-devel-archive mailing list