[edk2-devel] [PATCH 1/4] UefiCpuPkg/MpInitLib: honor the platform's boot CPU count in AP detection

Laszlo Ersek lersek at redhat.com
Wed Oct 9 14:55:55 UTC 2019


On 10/09/19 02:57, Dong, Eric wrote:
> Hi Laszlo,
> 
>> -----Original Message-----
>> From: devel at edk2.groups.io [mailto:devel at edk2.groups.io] On Behalf Of
>> Laszlo Ersek
>> Sent: Tuesday, October 8, 2019 7:27 PM
>> To: edk2-devel-groups-io <devel at edk2.groups.io>
>> Cc: Dong, Eric <eric.dong at intel.com>; Igor Mammedov
>> <imammedo at redhat.com>; Ni, Ray <ray.ni at intel.com>
>> Subject: [edk2-devel] [PATCH 1/4] UefiCpuPkg/MpInitLib: honor the platform's
>> boot CPU count in AP detection
>>
>> If a platform boots with a CPU topology that is not fully populated (that
>> is, the boot CPU count is smaller than PcdCpuMaxLogicalProcessorNumber),
>> then the platform cannot use the "fast AP detection" logic added in commit
>> 6e1987f19af7. Said logic depends on the boot CPU count being equal to
>> PcdCpuMaxLogicalProcessorNumber.
>>
>> The platform may not be able to use the variant added in commit
>> 0594ec417c89 either, for different reasons; see for example commit
>> 861218740d6d.
>>
>> Allow platforms to specify the exact boot CPU count, independently of
>> PcdCpuMaxLogicalProcessorNumber.
>>
>> Cc: Eric Dong <eric.dong at intel.com>
>> Cc: Igor Mammedov <imammedo at redhat.com>
>> Cc: Ray Ni <ray.ni at intel.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=1515
>> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
>> ---
>>  UefiCpuPkg/UefiCpuPkg.dec                     | 11 +++++
>>  UefiCpuPkg/UefiCpuPkg.uni                     |  4 ++
>>  UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf |  1 +
>>  UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf |  3 +-
>>  UefiCpuPkg/Library/MpInitLib/MpLib.c          | 42 ++++++++++++--------
>>  5 files changed, 43 insertions(+), 18 deletions(-)
>>
>> diff --git a/UefiCpuPkg/UefiCpuPkg.dec b/UefiCpuPkg/UefiCpuPkg.dec
>> index 031a2ccd680a..d6b33fd9b465 100644
>> --- a/UefiCpuPkg/UefiCpuPkg.dec
>> +++ b/UefiCpuPkg/UefiCpuPkg.dec
>> @@ -227,6 +227,17 @@ [PcdsFixedAtBuild, PcdsPatchableInModule,
>> PcdsDynamic, PcdsDynamicEx]
>>    ## Specifies timeout value in microseconds for the BSP to detect all APs for
>> the first time.
>>    # @Prompt Timeout for the BSP to detect all APs for the first time.
>>
>> gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds|50000|UI
>> NT32|0x00000004
>> +  ## Specifies the number of Logical Processors that are available in the
>> +  #  preboot environment after platform reset, including BSP and APs. Possible
>> +  #  values:<BR><BR>
>> +  #  zero (default) - This PCD is ignored, and
>> +  #                   PcdCpuApInitTimeOutInMicroSeconds limits the initial AP
>> +  #                   detection by the BSP.<BR>
>> +  #  nonzero        - PcdCpuApInitTimeOutInMicroSeconds is ignored. The initial
>> +  #                   AP detection finishes when the detected CPU count (BSP
>> +  #                   plus APs) reaches the value of this PCD.<BR>
>> +  # @Prompt Number of Logical Processors available after platform reset.
>> +
>> gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber|0|UINT32
>> |0x00000008
>>    ## Specifies the base address of the first microcode Patch in the microcode
>> Region.
>>    # @Prompt Microcode Region base address.
>>
>> gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchAddress|0x0|UINT64|0x
>> 00000005
>> diff --git a/UefiCpuPkg/UefiCpuPkg.uni b/UefiCpuPkg/UefiCpuPkg.uni
>> index fbf768072668..a7e279c5cb14 100644
>> --- a/UefiCpuPkg/UefiCpuPkg.uni
>> +++ b/UefiCpuPkg/UefiCpuPkg.uni
>> @@ -37,6 +37,10 @@
>>
>>  #string
>> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuApInitTimeOutInMicroSeconds_HEL
>> P  #language en-US "Specifies timeout value in microseconds for the BSP to
>> detect all APs for the first time."
>>
>> +#string
>> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuBootLogicalProcessorNumber_PRO
>> MPT  #language en-US "Number of Logical Processors available after platform
>> reset."
>> +
>> +#string
>> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuBootLogicalProcessorNumber_HELP
>> #language en-US "Specifies the number of Logical Processors that are available
>> in the preboot environment after platform reset, including BSP and APs."
>> +
>>  #string
>> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuMicrocodePatchAddress_PROMPT
>> #language en-US "Microcode Region base address."
>>
>>  #string
>> STR_gUefiCpuPkgTokenSpaceGuid_PcdCpuMicrocodePatchAddress_HELP
>> #language en-US "Specifies the base address of the first microcode Patch in
>> the microcode Region."
>> diff --git a/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf
>> b/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf
>> index 37b3f64e578a..cd912ab0c5ee 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf
>> +++ b/UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf
>> @@ -61,6 +61,7 @@ [Guids]
>>
>>  [Pcd]
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber        ##
>> CONSUMES
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber       ##
>> CONSUMES
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds      ##
>> SOMETIMES_CONSUMES
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize                      ## CONSUMES
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchAddress            ##
>> CONSUMES
>> diff --git a/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf
>> b/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf
>> index 82b77b63ea87..1538185ef99a 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf
>> +++ b/UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf
>> @@ -53,7 +53,8 @@ [LibraryClasses]
>>
>>  [Pcd]
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuMaxLogicalProcessorNumber        ##
>> CONSUMES
>> -  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds      ##
>> CONSUMES
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuBootLogicalProcessorNumber       ##
>> CONSUMES
>> +  gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds      ##
>> SOMETIMES_CONSUMES
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuApStackSize                      ## CONSUMES
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchAddress            ##
>> CONSUMES
>>    gUefiCpuPkgTokenSpaceGuid.PcdCpuMicrocodePatchRegionSize         ##
>> CONSUMES
>> diff --git a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> index d6f84c6f45c0..f1bf8a7ba7cf 100644
>> --- a/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> +++ b/UefiCpuPkg/Library/MpInitLib/MpLib.c
>> @@ -1044,24 +1044,32 @@ WakeUpAP (
>>        SendInitSipiSipiAllExcludingSelf ((UINT32) ExchangeInfo->BufferStart);
>>      }
>>      if (CpuMpData->InitFlag == ApInitConfig) {
>> -      //
>> -      // Here support two methods to collect AP count through adjust
>> -      // PcdCpuApInitTimeOutInMicroSeconds values.
>> -      //
>> -      // one way is set a value to just let the first AP to start the
>> -      // initialization, then through the later while loop to wait all Aps
>> -      // finsh the initialization.
>> -      // The other way is set a value to let all APs finished the initialzation.
>> -      // In this case, the later while loop is useless.
>> -      //
>> -      TimedWaitForApFinish (
>> -        CpuMpData,
>> -        PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
>> -        PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
>> -        );
>> +      if (PcdGet32 (PcdCpuBootLogicalProcessorNumber) > 0) {
>> +        TimedWaitForApFinish (
>> +          CpuMpData,
>> +          PcdGet32 (PcdCpuBootLogicalProcessorNumber) - 1,
>> +          MAX_UINT32 // approx. 71 minutes
> 
> Do you think if the PcdCpuBootLogicalProcessorNumber is not zero, driver should always wait for all the processors specified by PcdCpuBootLogicalProcessorNumber ready?

Yes, it absolutely must.

Control must never advance past this point if at least one of the CPUs
announced in PcdCpuBootLogicalProcessorNumber fails to check in,
regardless of time limit.

> I just have little concern about the MAX_UINT32 usage. I not sure whether there has a case that some processors can't wake up?

In fact, I would have preferred "absolute infinity" over MAX_UINT32
here; MAX_UINT32 (~71 minutes) is just a substitute that's "good enough"
in practice.

So, to confirm: if a platform sets PcdCpuBootLogicalProcessorNumber to a
nonzero value, then the platform *must not boot* at all, if at least one
processor (from the BSP and APs together) fails to wake, from that number.

Speaking generally, if you encounter a timeout situation -- you give up
waiting for the result of a particular action that you initiated --, you
don't know what the end result is going to be. The action may never
finish, or it may very well finish *after* you stop waiting. This is a
general characteristic of all timeouts.

In this specific case, hanging the boot forever (= failing to boot) is
*much* better than having a stray processor wake up after we stop
waiting, and then run amok in no man's land.

It is my explicit goal to remove the timeout condition from this logic,
and to make it block *forever* if at least one CPU fails to check in.

If this policy is not suitable for a platform, then it should not set
the new PCD to a nonzero value. (The default value is zero.)

For OVMF running on QEMU/KVM, this is the right policy however. If one
of your VCPUs fails to check in, then the virtualization stack (QEMU
and/or KVM) under OVMF is busted, and the guest should *not* continue
booting. Continuing would only lead to more misery down the line.

Thanks,
Laszlo


> 
> Thanks,
> Eric
> 
>> +          );
>> +      } else {
>> +        //
>> +        // Here support two methods to collect AP count through adjust
>> +        // PcdCpuApInitTimeOutInMicroSeconds values.
>> +        //
>> +        // one way is set a value to just let the first AP to start the
>> +        // initialization, then through the later while loop to wait all Aps
>> +        // finsh the initialization.
>> +        // The other way is set a value to let all APs finished the
>> +        // initialzation. In this case, the later while loop is useless.
>> +        //
>> +        TimedWaitForApFinish (
>> +          CpuMpData,
>> +          PcdGet32 (PcdCpuMaxLogicalProcessorNumber) - 1,
>> +          PcdGet32 (PcdCpuApInitTimeOutInMicroSeconds)
>> +          );
>>
>> -      while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) {
>> -        CpuPause();
>> +        while (CpuMpData->MpCpuExchangeInfo->NumApsExecuting != 0) {
>> +          CpuPause();
>> +        }
>>        }
>>      } else {
>>        //
>> --
>> 2.19.1.3.g30247aa5d201
>>
>>
>>
>> -=-=-=-=-=-=
>> Groups.io Links: You receive all messages sent to this group.
>>
>> View/Reply Online (#48563): https://edk2.groups.io/g/devel/message/48563
>> Mute This Topic: https://groups.io/mt/34441228/1768733
>> Group Owner: devel+owner at edk2.groups.io
>> Unsubscribe: https://edk2.groups.io/g/devel/unsub  [eric.dong at intel.com]
>> -=-=-=-=-=-=
> 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#48663): https://edk2.groups.io/g/devel/message/48663
Mute This Topic: https://groups.io/mt/34441228/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-




More information about the edk2-devel-archive mailing list