[edk2-devel] [PATCH v2 3/3] OvmfPkg/PlatformPei: rewrite MaxCpuCountInitialization() for CPU hotplug

Laszlo Ersek lersek at redhat.com
Fri Oct 25 08:29:20 UTC 2019


On 10/24/19 17:33, Philippe Mathieu-Daudé wrote:
> On 10/23/19 12:15 AM, Laszlo Ersek wrote:
>> MaxCpuCountInitialization() currently handles the following options:
>>
>> (1) QEMU does not report the boot CPU count (FW_CFG_NB_CPUS is 0)
>>
>>      In this case, PlatformPei makes MpInitLib enumerate APs up to the
>>      default PcdCpuMaxLogicalProcessorNumber value (64) minus 1, or until
>>      the default PcdCpuApInitTimeOutInMicroSeconds (50,000) elapses.
>>      (Whichever is reached first.)
>>
>>      Time-limited AP enumeration had never been reliable on QEMU/KVM,
>> which
>>      is why commit 45a70db3c3a5 strated handling case (2) below, in OVMF.
>>
>> (2) QEMU reports the boot CPU count (FW_CFG_NB_CPUS is nonzero)
>>
>>      In this case, PlatformPei sets
>>
>>      - PcdCpuMaxLogicalProcessorNumber to the reported boot CPU count
>>        (FW_CFG_NB_CPUS, which exports "PCMachineState.boot_cpus"),
>>
>>      - and PcdCpuApInitTimeOutInMicroSeconds to practically "infinity"
>>        (MAX_UINT32, ~71 minutes).
>>
>>      That causes MpInitLib to enumerate exactly the present (boot) APs.
>>
>>      With CPU hotplug in mind, this method is not good enough. Because,
>>      using QEMU terminology, UefiCpuPkg expects
>>      PcdCpuMaxLogicalProcessorNumber to provide the "possible CPUs" count
>>      ("MachineState.smp.max_cpus"), which includes present and not
>> present
>>      CPUs both (with not present CPUs being subject for hot-plugging).
>>      FW_CFG_NB_CPUS does not include not present CPUs.
>>
>> Rewrite MaxCpuCountInitialization() for handling the following cases:
>>
>> (1) The behavior of case (1) does not change. (No UefiCpuPkg PCDs are set
>>      to values different from the defaults.)
>>
>> (2) QEMU reports the boot CPU count ("PCMachineState.boot_cpus", via
>>      FW_CFG_NB_CPUS), but not the possible CPUs count
>>      ("MachineState.smp.max_cpus").
>>
>>      In this case, the behavior remains unchanged.
>>
>>      The way MpInitLib is instructed to do the same differs however:
>> we now
>>      set the new PcdCpuBootLogicalProcessorNumber to the boot CPU count
>>      (while continuing to set PcdCpuMaxLogicalProcessorNumber
>> identically).
>>      PcdCpuApInitTimeOutInMicroSeconds becomes irrelevant.
>>
>> (3) QEMU reports both the boot CPU count ("PCMachineState.boot_cpus", via
>>      FW_CFG_NB_CPUS), and the possible CPUs count
>>      ("MachineState.smp.max_cpus").
>>
>>      We tell UefiCpuPkg about the possible CPUs count through
>>      PcdCpuMaxLogicalProcessorNumber. We also tell MpInitLib the boot CPU
>>      count for precise and quick AP enumeration, via
>>      PcdCpuBootLogicalProcessorNumber.
>> PcdCpuApInitTimeOutInMicroSeconds is
>>      irrelevant again.
>>
>> This patch is a pre-requisite for enabling CPU hotplug with SMM_REQUIRE.
>> As a side effect, the patch also enables S3 to work with CPU hotplug at
>> once, *without* SMM_REQUIRE.
>>
>> (Without the patch, S3 resume fails, if a CPU is hot-plugged at OS
>> runtime, prior to suspend: the FW_CFG_NB_CPUS increase seen during resume
>> causes PcdCpuMaxLogicalProcessorNumber to increase as well, which is not
>> permitted.
>>
>> With the patch, PcdCpuMaxLogicalProcessorNumber stays the same, namely
>> "MachineState.smp.max_cpus". Therefore, the CPU structures allocated
>> during normal boot can accommodate the CPUs at S3 resume that have been
>> hotplugged prior to S3 suspend.)
>>
>> Cc: Anthony Perard <anthony.perard at citrix.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>> Cc: Igor Mammedov <imammedo at redhat.com>
>> Cc: Jordan Justen <jordan.l.justen at intel.com>
>> Cc: Julien Grall <julien.grall at arm.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
>> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
>> ---
>>
>> Notes:
>>      v2:
>>           - use "possible CPUs" term in the code and the commit
>> message [Igor]
>>           - add details about S3 to the commit message [Igor]
>>           - use QEMU's existent CPU hotplug register block, rather
>> than a new
>>        named file in fw_cfg [Igor]
>>           - tested on QEMU v2.6.2 (legacy-only CPU hotplug register
>> block), v2.7.1
>>        (modern register block, but buggy fw_cfg), v2.8.1.1 (no QEMU
>> issues),
>>        v4.0.0 (no QEMU issues)
>>
>>   OvmfPkg/OvmfPkgIa32.dsc             |   2 +-
>>   OvmfPkg/OvmfPkgIa32X64.dsc          |   2 +-
>>   OvmfPkg/OvmfPkgX64.dsc              |   2 +-
>>   OvmfPkg/PlatformPei/PlatformPei.inf |   2 +-
>>   OvmfPkg/PlatformPei/Platform.c      | 172 +++++++++++++++++---
>>   5 files changed, 151 insertions(+), 29 deletions(-)
>>
>> diff --git a/OvmfPkg/OvmfPkgIa32.dsc b/OvmfPkg/OvmfPkgIa32.dsc
>> index 4301e7821902..68d8a9fb9655 100644
>> --- a/OvmfPkg/OvmfPkgIa32.dsc
>> +++ b/OvmfPkg/OvmfPkgIa32.dsc
>> @@ -553,11 +553,11 @@ [PcdsDynamicDefault]
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>>       # UefiCpuPkg PCDs related to initial AP bringup and general AP
>> management.
>>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
>> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>>       # Set memory encryption mask
>>    
>> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>>     !if $(SMM_REQUIRE) == TRUE
>> diff --git a/OvmfPkg/OvmfPkgIa32X64.dsc b/OvmfPkg/OvmfPkgIa32X64.dsc
>> index 803fd74ae8e4..e5a6260b6088 100644
>> --- a/OvmfPkg/OvmfPkgIa32X64.dsc
>> +++ b/OvmfPkg/OvmfPkgIa32X64.dsc
>> @@ -565,11 +565,11 @@ [PcdsDynamicDefault]
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>>       # UefiCpuPkg PCDs related to initial AP bringup and general AP
>> management.
>>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
>> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>>       # Set memory encryption mask
>>    
>> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>>     !if $(SMM_REQUIRE) == TRUE
>> diff --git a/OvmfPkg/OvmfPkgX64.dsc b/OvmfPkg/OvmfPkgX64.dsc
>> index 5dbd1b793a90..f5d904945103 100644
>> --- a/OvmfPkg/OvmfPkgX64.dsc
>> +++ b/OvmfPkg/OvmfPkgX64.dsc
>> @@ -564,11 +564,11 @@ [PcdsDynamicDefault]
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdSetNxForStack|FALSE
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdPropertiesTableEnable|FALSE
>>       # UefiCpuPkg PCDs related to initial AP bringup and general AP
>> management.
>>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber|64
>> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0
>>       # Set memory encryption mask
>>    
>> gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask|0x0
>>     !if $(SMM_REQUIRE) == TRUE
>> diff --git a/OvmfPkg/PlatformPei/PlatformPei.inf
>> b/OvmfPkg/PlatformPei/PlatformPei.inf
>> index d9fd9c8f05b3..30eaebdfae63 100644
>> --- a/OvmfPkg/PlatformPei/PlatformPei.inf
>> +++ b/OvmfPkg/PlatformPei/PlatformPei.inf
>> @@ -96,11 +96,11 @@ [Pcd]
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdAcpiS3Enable
>>     gEfiMdeModulePkgTokenSpaceGuid.PcdPteMemoryEncryptionAddressOrMask
>>     gEfiSecurityPkgTokenSpaceGuid.PcdOptionRomImageVerificationPolicy
>>     gUefiCpuPkgTokenSpaceGuid.PcdCpuLocalApicBaseAddress
>>     gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber
>> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber
>>     gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize
>>     [FixedPcd]
>>     gEfiMdePkgTokenSpaceGuid.PcdPciExpressBaseAddress
>>   diff --git a/OvmfPkg/PlatformPei/Platform.c
>> b/OvmfPkg/PlatformPei/Platform.c
>> index 3ba2459872d9..e5e8581752b5 100644
>> --- a/OvmfPkg/PlatformPei/Platform.c
>> +++ b/OvmfPkg/PlatformPei/Platform.c
>> @@ -28,11 +28,14 @@
>>   #include <Library/QemuFwCfgLib.h>
>>   #include <Library/QemuFwCfgS3Lib.h>
>>   #include <Library/ResourcePublicationLib.h>
>>   #include <Guid/MemoryTypeInformation.h>
>>   #include <Ppi/MasterBootMode.h>
>> +#include <IndustryStandard/I440FxPiix4.h>
>>   #include <IndustryStandard/Pci22.h>
>> +#include <IndustryStandard/Q35MchIch9.h>
>> +#include <IndustryStandard/QemuCpuHotplug.h>
>>   #include <OvmfPlatforms.h>
>>     #include "Platform.h"
>>   #include "Cmos.h"
>>   @@ -562,47 +565,165 @@ S3Verification (
>>   #endif
>>   }
>>       /**
>> -  Fetch the number of boot CPUs from QEMU and expose it to UefiCpuPkg
>> modules.
>> -  Set the mMaxCpuCount variable.
>> +  Fetch the boot CPU count and the possible CPU count from QEMU, and
>> expose
>> +  them to UefiCpuPkg modules. Set the mMaxCpuCount variable.
>>   **/
>>   VOID
>>   MaxCpuCountInitialization (
>>     VOID
>>     )
>>   {
>> -  UINT16        ProcessorCount;
>> +  UINT16        BootCpuCount;
>>     RETURN_STATUS PcdStatus;
>>   +  //
>> +  // Try to fetch the boot CPU count.
>> +  //
>>     QemuFwCfgSelectItem (QemuFwCfgItemSmpCpuCount);
>> -  ProcessorCount = QemuFwCfgRead16 ();
>> -  //
>> -  // If the fw_cfg key or fw_cfg entirely is unavailable, load
>> mMaxCpuCount
>> -  // from the PCD default. No change to PCDs.
>> -  //
>> -  if (ProcessorCount == 0) {
>> +  BootCpuCount = QemuFwCfgRead16 ();
>> +  if (BootCpuCount == 0) {
>> +    //
>> +    // QEMU doesn't report the boot CPU count. (BootCpuCount == 0)
>> will let
>> +    // MpInitLib count APs up to (PcdCpuMaxLogicalProcessorNumber -
>> 1), or
>> +    // until PcdCpuApInitTimeOutInMicroSeconds elapses (whichever is
>> reached
>> +    // first).
>> +    //
>> +    DEBUG ((DEBUG_WARN, "%a: boot CPU count unavailable\n",
>> __FUNCTION__));
>>       mMaxCpuCount = PcdGet32 (PcdCpuMaxLogicalProcessorNumber);
>> -    return;
>> +  } else {
>> +    //
>> +    // We will expose BootCpuCount to MpInitLib. MpInitLib will count
>> APs up to
>> +    // (BootCpuCount - 1) precisely, regardless of timeout.
>> +    //
>> +    // Now try to fetch the possible CPU count.
>> +    //
>> +    UINTN CpuHpBase;
>> +    UINT32 CmdData2;
>> +
>> +    CpuHpBase = ((mHostBridgeDevId == INTEL_Q35_MCH_DEVICE_ID) ?
>> +                 ICH9_CPU_HOTPLUG_BASE : PIIX4_CPU_HOTPLUG_BASE);
>> +
>> +    //
>> +    // If only legacy mode is available in the CPU hotplug register
>> block, or
>> +    // the register block is completely missing, then the writes
>> below are
>> +    // no-ops.
>> +    //
>> +    // 1. Switch the hotplug register block to modern mode.
>> +    //
>> +    IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
>> +    //
>> +    // 2. Select a valid CPU for deterministic reading of
>> +    //    QEMU_CPUHP_R_CMD_DATA2.
>> +    //
>> +    //    CPU#0 is always valid; it is the always present and
>> non-removable
>> +    //    BSP.
>> +    //
>> +    IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);
>> +    //
>> +    // 3. Send a command after which QEMU_CPUHP_R_CMD_DATA2 is
>> specified to
>> +    //    read as zero, and which does not invalidate the selector. (The
>> +    //    selector may change, but it must not become invalid.)
>> +    //
>> +    //    Send QEMU_CPUHP_CMD_GET_PENDING, as it will prove useful
>> later.
>> +    //
>> +    IoWrite8 (CpuHpBase + QEMU_CPUHP_W_CMD, QEMU_CPUHP_CMD_GET_PENDING);
>> +    //
>> +    // 4. Read QEMU_CPUHP_R_CMD_DATA2.
>> +    //
>> +    //    If the register block is entirely missing, then this is an
>> unassigned
>> +    //    IO read, returning all-bits-one.
>> +    //
>> +    //    If only legacy mode is available, then bit#0 stands for
>> CPU#0 in the
>> +    //    "CPU present bitmap". CPU#0 is always present.
>> +    //
>> +    //    Otherwise, QEMU_CPUHP_R_CMD_DATA2 is either still reserved
>> (returning
>> +    //    all-bits-zero), or it is specified to read as zero after
>> the above
>> +    //    steps. Both cases confirm modern mode.
>> +    //
>> +    CmdData2 = IoRead32 (CpuHpBase + QEMU_CPUHP_R_CMD_DATA2);
>> +    DEBUG ((DEBUG_VERBOSE, "%a: CmdData2=0x%x\n", __FUNCTION__,
>> CmdData2));
>> +    if (CmdData2 != 0) {
>> +      //
>> +      // QEMU doesn't support the modern CPU hotplug interface.
>> Assume that the
>> +      // possible CPU count equals the boot CPU count (precluding
>> hotplug).
>> +      //
>> +      DEBUG ((DEBUG_WARN, "%a: modern CPU hotplug interface
>> unavailable\n",
>> +        __FUNCTION__));
>> +      mMaxCpuCount = BootCpuCount;
>> +    } else {
>> +      //
>> +      // Grab the possible CPU count from the modern CPU hotplug
>> interface.
>> +      //
>> +      UINT32 Present, Possible, Selected;
>> +
>> +      Present = 0;
>> +      Possible = 0;
>> +
>> +      //
>> +      // We've sent QEMU_CPUHP_CMD_GET_PENDING last; this ensures
>> +      // QEMU_CPUHP_RW_CMD_DATA can now be read usefully. However,
>> +      // QEMU_CPUHP_CMD_GET_PENDING may have selected a CPU with
>> actual pending
>> +      // hotplug events; therefore, select CPU#0 forcibly.
>> +      //
>> +      IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);
> 
> Nitpicking, I find this easier to read (matches the comment):
> 
>          IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, 0);

I considered that, but I chose this form consciously, in the end.
Namely, if you search this patch for

  IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);

you will find that there are two instances of it: one before the loop,
and one inside the loop. I wanted to make those operations *look*
identical as well, not just *be* semantically identical.

The same goal prevented me from changing something else too: namely, in
the loop body, the following:

        ++Possible;
        IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);

could be condensed as:

        IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, ++Possible);

But, that would again break the syntactical identity. And, again, for
some reason I found it helpful to keep those two function calls
syntactically identical.


If you think the above effort doesn't actually improve readability, then
I'm happy to respin with *both* changes at the same time (i.e., pass
constant 0 in the first location, and pass "++Possible" in the second
location). I don't think that implementing either change *in isolation*
would be good.

Thanks!
Laszlo

> 
> Rest very well documented, thanks.
> 
> Reviewed-by: Philippe Mathieu-Daude <philmd at redhat.com>
> 
>> +
>> +      do {
>> +        UINT8 CpuStatus;
>> +
>> +        //
>> +        // Read the status of the currently selected CPU. This will
>> help with a
>> +        // sanity check against "BootCpuCount".
>> +        //
>> +        CpuStatus = IoRead8 (CpuHpBase + QEMU_CPUHP_R_CPU_STAT);
>> +        if ((CpuStatus & QEMU_CPUHP_STAT_ENABLED) != 0) {
>> +          ++Present;
>> +        }
>> +        //
>> +        // Attempt to select the next CPU.
>> +        //
>> +        ++Possible;
>> +        IoWrite32 (CpuHpBase + QEMU_CPUHP_W_CPU_SEL, Possible);
>> +        //
>> +        // If the selection is successful, then the following read
>> will return
>> +        // the selector (which we know is positive at this point).
>> Otherwise,
>> +        // the read will return 0.
>> +        //
>> +        Selected = IoRead32 (CpuHpBase + QEMU_CPUHP_RW_CMD_DATA);
>> +        ASSERT (Selected == Possible || Selected == 0);
>> +      } while (Selected > 0);
>> +
>> +      //
>> +      // Sanity check: fw_cfg and the modern CPU hotplug interface
>> should
>> +      // return the same boot CPU count.
>> +      //
>> +      if (BootCpuCount != Present) {
>> +        DEBUG ((DEBUG_WARN, "%a: QEMU v2.7 reset bug: BootCpuCount=%d "
>> +          "Present=%u\n", __FUNCTION__, BootCpuCount, Present));
>> +        //
>> +        // The handling of QemuFwCfgItemSmpCpuCount, across CPU
>> hotplug plus
>> +        // platform reset (including S3), was corrected in QEMU commit
>> +        // e3cadac073a9 ("pc: fix FW_CFG_NB_CPUS to account for
>> -device added
>> +        // CPUs", 2016-11-16), part of release v2.8.0.
>> +        //
>> +        BootCpuCount = (UINT16)Present;
>> +      }
>> +
>> +      mMaxCpuCount = Possible;
>> +    }
>>     }
>> -  //
>> -  // Otherwise, set mMaxCpuCount to the value reported by QEMU.
>> -  //
>> -  mMaxCpuCount = ProcessorCount;
>> -  //
>> -  // Additionally, tell UefiCpuPkg modules (a) the exact number of
>> VCPUs, (b)
>> -  // to wait, in the initial AP bringup, exactly as long as it takes
>> for all of
>> -  // the APs to report in. For this, we set the longest representable
>> timeout
>> -  // (approx. 71 minutes).
>> -  //
>> -  PcdStatus = PcdSet32S (PcdCpuMaxLogicalProcessorNumber,
>> ProcessorCount);
>> +
>> +  DEBUG ((DEBUG_INFO, "%a: BootCpuCount=%d mMaxCpuCount=%u\n",
>> __FUNCTION__,
>> +    BootCpuCount, mMaxCpuCount));
>> +  ASSERT (BootCpuCount <= mMaxCpuCount);
>> +
>> +  PcdStatus = PcdSet32S (PcdCpuBootLogicalProcessorNumber,
>> BootCpuCount);
>>     ASSERT_RETURN_ERROR (PcdStatus);
>> -  PcdStatus = PcdSet32S (PcdCpuApInitTimeOutInMicroSeconds, MAX_UINT32);
>> +  PcdStatus = PcdSet32S (PcdCpuMaxLogicalProcessorNumber, mMaxCpuCount);
>>     ASSERT_RETURN_ERROR (PcdStatus);
>> -  DEBUG ((DEBUG_INFO, "%a: QEMU reports %d processor(s)\n",
>> __FUNCTION__,
>> -    ProcessorCount));
>>   }
>>       /**
>>     Perform Platform PEI initialization.
>> @@ -636,17 +757,18 @@ InitializePlatform (
>>     }
>>       S3Verification ();
>>     BootModeInitialization ();
>>     AddressWidthInitialization ();
>> -  MaxCpuCountInitialization ();
>>       //
>>     // Query Host Bridge DID
>>     //
>>     mHostBridgeDevId = PciRead16 (OVMF_HOSTBRIDGE_DID);
>>   +  MaxCpuCountInitialization ();
>> +
>>     if (FeaturePcdGet (PcdSmmSmramRequire)) {
>>       Q35TsegMbytesInitialization ();
>>     }
>>       PublishPeiMemory ();
>>


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#49451): https://edk2.groups.io/g/devel/message/49451
Mute This Topic: https://groups.io/mt/36465055/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-





More information about the edk2-devel-archive mailing list