[edk2-devel] [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject()

Laszlo Ersek lersek at redhat.com
Mon Feb 1 16:11:24 UTC 2021


On 01/29/21 01:59, Ankur Arora wrote:
> Add CpuEject(), which handles the CPU ejection, and provides a holding
> area for said CPUs. It is called via SmmCpuFeaturesRendezvousExit(),
> at the tail end of the SMI handling.

(1) The functions introduced thus far by this patch series are all named
"Verb + Object", which is great; so please call this function EjectCpu()
as well, rather than CpuEject().

Modify all three of: subject line, commit message, patch body; please.


>
> Also UnplugCpus() now stashes APIC IDs of CPUs which need to be
> ejected in CPU_HOT_EJECT_DATA.ApicIdMap. These are used by CpuEject()
> to identify such CPUs.
>
> Cc: Laszlo Ersek <lersek at redhat.com>
> Cc: Jordan Justen <jordan.l.justen at intel.com>
> Cc: Ard Biesheuvel <ard.biesheuvel at arm.com>
> Cc: Igor Mammedov <imammedo at redhat.com>
> Cc: Boris Ostrovsky <boris.ostrovsky at oracle.com>
> Cc: Aaron Young <aaron.young at oracle.com>
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
> Signed-off-by: Ankur Arora <ankur.a.arora at oracle.com>
> ---
>  OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 109 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 105 insertions(+), 4 deletions(-)
>
> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> index 70d69f6ed65b..526f51faf070 100644
> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
> @@ -14,6 +14,7 @@
>  #include <Library/MmServicesTableLib.h>      // gMmst
>  #include <Library/PcdLib.h>                  // PcdGetBool()
>  #include <Library/SafeIntLib.h>              // SafeUintnSub()
> +#include <Library/CpuHotEjectData.h>         // CPU_HOT_EJECT_DATA
>  #include <Protocol/MmCpuIo.h>                // EFI_MM_CPU_IO_PROTOCOL
>  #include <Protocol/SmmCpuService.h>          // EFI_SMM_CPU_SERVICE_PROTOCOL
>  #include <Uefi/UefiBaseType.h>               // EFI_STATUS

(2) This will change due to the movement of the header file, but: please
keep the #include directive list alphabetically sorted.


> @@ -32,11 +33,12 @@ STATIC EFI_MM_CPU_IO_PROTOCOL *mMmCpuIo;
>  //
>  STATIC EFI_SMM_CPU_SERVICE_PROTOCOL *mMmCpuService;
>  //
> -// This structure is a communication side-channel between the
> +// These structures serve as communication side-channels between the
>  // EFI_SMM_CPU_SERVICE_PROTOCOL consumer (i.e., this driver) and provider
>  // (i.e., PiSmmCpuDxeSmm).
>  //
>  STATIC CPU_HOT_PLUG_DATA *mCpuHotPlugData;
> +STATIC CPU_HOT_EJECT_DATA *mCpuHotEjectData;
>  //
>  // SMRAM arrays for fetching the APIC IDs of processors with pending events (of
>  // known event types), for the time of just one MMI.
> @@ -188,11 +190,53 @@ RevokeNewSlot:
>  }
>
>  /**
> +  CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
> +  on each CPU at exit from SMM.
> +
> +  If, the executing CPU is not being ejected, nothing to be done.
> +  If, the executing CPU is being ejected, wait in a CpuDeadLoop()
> +  until ejected.
> +
> +  @param[in] ProcessorNum      Index of executing CPU.
> +
> +**/
> +VOID
> +EFIAPI
> +CpuEject (
> +  IN UINTN ProcessorNum
> +  )
> +{
> +  //
> +  // APIC ID is UINT32, but mCpuHotEjectData->ApicIdMap[] is UINT64
> +  // so use UINT64 throughout.
> +  //
> +  UINT64 ApicId;
> +
> +  ApicId = mCpuHotEjectData->ApicIdMap[ProcessorNum];
> +  if (ApicId == CPU_EJECT_INVALID) {
> +    return;
> +  }
> +
> +  //
> +  // CPU(s) being unplugged get here from SmmCpuFeaturesSmiRendezvousExit()
> +  // after having been cleared to exit the SMI by the monarch and thus have
> +  // no SMM processing remaining.
> +  //
> +  // Given that we cannot allow them to escape to the guest, we pen them
> +  // here until the SMM monarch tells the HW to unplug them.
> +  //
> +  CpuDeadLoop ();
> +}

(3a) We can make this less resource-hungry, by replacing CpuDeadLoop()
with:

  for (;;) {
    DisableInterrupts ();
    CpuSleep ();
  }

This basically translates to a { CLI; HLT; } loop.

(Both functions come from BaseLib, which CpuHotplugSmm already consumes,
thus there is no need to modify #include's or [LibraryClasses].)


(3b) Please refresh the CpuDeadLoop() reference in the function's
leading comment as well.


> +
> +/**
>    Process to be hot-unplugged CPUs, per QemuCpuhpCollectApicIds().
>
>    For each such CPU, report the CPU to PiSmmCpuDxeSmm via
> -  EFI_SMM_CPU_SERVICE_PROTOCOL. If the to be hot-unplugged CPU is
> -  unknown, skip it silently.
> +  EFI_SMM_CPU_SERVICE_PROTOCOL and stash the APIC ID for later ejection.
> +  If the to be hot-unplugged CPU is unknown, skip it silently.
> +
> +  Additonally, if we do stash any APIC IDs, also install a CPU eject handler
> +  which would handle the ejection.
>
>    @param[in] ToUnplugApicIds    The APIC IDs of the CPUs that are about to be
>                                  hot-unplugged.
> @@ -216,9 +260,11 @@ UnplugCpus (
>  {
>    EFI_STATUS Status;
>    UINT32     ToUnplugIdx;
> +  UINT32     EjectCount;
>    UINTN      ProcessorNum;
>
>    ToUnplugIdx = 0;
> +  EjectCount = 0;
>    while (ToUnplugIdx < ToUnplugCount) {
>      APIC_ID    RemoveApicId;
>
> @@ -255,13 +301,41 @@ UnplugCpus (
>        DEBUG ((DEBUG_ERROR, "%a: RemoveProcessor(" FMT_APIC_ID "): %r\n",
>          __FUNCTION__, RemoveApicId, Status));
>        goto Fatal;
> +    } else {

(Under patch v6 4/9, I request that the "goto" be replaced with a
"return" -- my point (4) below applies regardless:)

(4) Please don't add an "else" branch, if the first branch of the "if"
ends with a jump statement. Because, in that case, the code that follows
the "if" statement is not reachable after the first branch anyway.

So please just unnest the next part:


> +      //
> +      // Stash the APIC IDs so we can do the actual ejection later.
> +      //
> +      if (mCpuHotEjectData->ApicIdMap[ProcessorNum] != CPU_EJECT_INVALID) {
> +        //
> +        // Since ProcessorNum and APIC-ID map 1-1, so a valid
> +        // mCpuHotEjectData->ApicIdMap[ProcessorNum] means something
> +        // is horribly wrong.
> +        //

(5) To be honest, I would replace this with:

      //
      // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is initialized to
      //   CPU_EJECT_INVALID when mCpuHotEjectData->ApicIdMap is allocated,
      //
      // - mCpuHotEjectData->ApicIdMap[ProcessorNum] is restored to
      //   CPU_EJECT_INVALID when the subject processor is ejected,
      //
      // - mMmCpuService->RemoveProcessor(ProcessorNum) invalidates
      //   mCpuHotPlugData->ApicId[ProcessorNum], so a given ProcessorNum can
      //   never match more than one APIC ID in a single invocation of
      //   UnplugCpus().
      //


> +        DEBUG ((DEBUG_ERROR, "%a: ProcessorNum %u maps to %llx, cannot "
> +                "map to " FMT_APIC_ID "\n", __FUNCTION__, ProcessorNum,
> +                mCpuHotEjectData->ApicIdMap[ProcessorNum], RemoveApicId));

(6a) The indentation of the 2nd and 3rd lines is incorrect.

(6b) For logging UINTN values (i.e., ProcessorNum) portably between IA32
and X64, %u is not correct. Instead:

- cast the UINTN value to UINT64 explicitly,
- use the %Lu or %Lx format specifier.

(6c) There is no "%llx" format string in edk2's PrintLib (no "ll" length
modifier, to be more precise). UINT64 values need to be printed with
"%lu" or "%lx", or -- identically -- with "%Lu" or "%Lx". I prefer the
latter, because standard C does not define the "L" size modifier for
integers, and that makes it clear that we're using an edk2-specific
feature. The "l" (ell) length modifier could be misunderstood as "long"
(which is something we don't use in edk2).

(6d) FMT_APIC_ID is defined as "0x%08x"; to remain consistent with that,
I would print the ApicIdMap element not just with "%Lx", but with
"0x%016Lx".


> +
> +        Status = EFI_INVALID_PARAMETER;
> +        goto Fatal;

(7a) Please just "return EFI_ALREADY_STARTED".

(7b) Please also modify the leading comment on the function -- the new
return value EFI_ALREADY_STARTED should be documented. I suggest:

   @retval EFI_ALREADY_STARTED   For the ProcessorNumber that
                                 EFI_SMM_CPU_SERVICE_PROTOCOL had assigned to
                                 one of the APIC ID in ToUnplugApicIds,
                                 mCpuHotEjectData->ApicIdMap already has an
                                 APIC ID stashed. (This should never happen.)


> +      }
> +
> +      mCpuHotEjectData->ApicIdMap[ProcessorNum] = (UINT64)RemoveApicId;
> +      EjectCount++;
>      }
>
>      ToUnplugIdx++;
>    }
>
> +  if (EjectCount != 0) {
> +    //
> +    // We have processors to be ejected; install the handler.
> +    //
> +    mCpuHotEjectData->Handler = CpuEject;
> +  }
> +

(8) I suggest removing the "EjectCount" local variable, and setting the
"Handler" member where you currently increment "EjectCount".


>    //
> -  // We've removed this set of APIC IDs from SMM data structures.
> +  // We've removed this set of APIC IDs from SMM data structures and
> +  // have installed an ejection handler if needed.
>    //
>    return EFI_SUCCESS;
>
> @@ -458,7 +532,13 @@ CpuHotplugEntry (
>    // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm
>    // has pointed PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM.
>    //
> +  // Additionally, CPU Hot-unplug is available only if CPU Hotplug is, so
> +  // the same DEPEX also guarantees that PcdCpuHotEjectDataAddress points
> +  // to CPU_HOT_EJECT_DATA in SMRAM.
> +  //

(9) I don't see the relevance of "hot-unplug depends on hot-plug" here.

I recommend the following comment instead:

   //
   // Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm
   // has pointed:
   // - PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM,
   // - PcdCpuHotEjectDataAddress to CPU_HOT_EJECT_DATA in SMRAM, if the
   //   possible CPU count is greater than 1.
   //

>    mCpuHotPlugData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotPlugDataAddress);
> +  mCpuHotEjectData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotEjectDataAddress);
> +
>    if (mCpuHotPlugData == NULL) {
>      Status = EFI_NOT_FOUND;
>      DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_PLUG_DATA: %r\n", __FUNCTION__, Status));
> @@ -470,6 +550,9 @@ CpuHotplugEntry (
>    if (mCpuHotPlugData->ArrayLength == 1) {
>      return EFI_UNSUPPORTED;
>    }
> +  ASSERT (mCpuHotEjectData &&
> +          (mCpuHotPlugData->ArrayLength == mCpuHotEjectData->ArrayLength));
> +
>    //
>    // Allocate the data structures that depend on the possible CPU count.
>    //

(10) To remain consistent with the check performed on "mCpuHotPlugData",
please do:

  if (mCpuHotEjectData == NULL) {
    Status = EFI_NOT_FOUND;
  } else if (mCpuHotPlugData->ArrayLength != mCpuHotEjectData->ArrayLength) {
    Status = EFI_INVALID_PARAMETER;
  } else {
    Status = EFI_SUCCESS;
  }
  if (EFI_ERROR (Status)) {
    DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_EJECT_DATA: %r\n", __FUNCTION__, Status));
    goto Fatal;
  }

(

  As a digression, I'll make some comments on the ASSERT() too:

  - Given ASSERT ((C1) && (C2)), it is best to express the same as
    ASSERT (C1); ASSERT (C2); -- the effect is the same, but the error
    messages have finer granularity.

  - Checking a pointer against NULL must be explicit at all times, in
    edk2. IOW, ASSERT (mCpuHotEjectData) should be spelled
    ASSERT (mCpuHotEjectData != NULL).

)


> @@ -552,6 +635,24 @@ CpuHotplugEntry (
>    //
>    SmbaseInstallFirstSmiHandler ();
>
> +  if (mCpuHotEjectData) {

(11) This condition is guaranteed to evaluate to TRUE; see the ASSERT()
above.

Anyway, ignore this...


> +  UINT32     Idx;

(12) Incorrect indentation, but ignore this too...


> +    //
> +    // For CPU ejection we need to map ProcessorNum -> APIC_ID. By the time
> +    // we need the mapping, however, the Processor's APIC ID has already been
> +    // removed from SMM data structures. So we will maintain a local map
> +    // in mCpuHotEjectData->ApicIdMap.
> +    //
> +    for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
> +      mCpuHotEjectData->ApicIdMap[Idx] = CPU_EJECT_INVALID;
> +    }

(13) ... because this init loop should be moved to patch #6 (subject
"OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state"), as I mentioned
there...


> +
> +    //
> +    // Wait to init the handler until an ejection is warranted
> +    //
> +    mCpuHotEjectData->Handler = NULL;

(14) ... and because this nulling is performed by patch #6 already
(subject "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state").


Therefore, this whole conditional block should be removed please.

Thanks!
Laszlo

> +  }
> +
>    return EFI_SUCCESS;
>
>  ReleasePostSmmPen:
>



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71023): https://edk2.groups.io/g/devel/message/71023
Mute This Topic: https://groups.io/mt/80199926/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-





More information about the edk2-devel-archive mailing list