[edk2-devel] [PATCH v8 09/10] OvmfPkg/CpuHotplugSmm: do actual CPU hot-eject
Ankur Arora
ankur.a.arora at oracle.com
Wed Feb 24 03:44:53 UTC 2021
On 2021-02-23 1:39 p.m., Laszlo Ersek wrote:
> On 02/22/21 08:19, Ankur Arora wrote:
>> Add logic in EjectCpu() to do the actual the CPU ejection.
>>
>> On the BSP, ejection happens by first selecting the CPU via
>> its QemuSelector and then sending the QEMU "eject" command.
>> QEMU in-turn signals the remote VCPU thread which context-switches
>> the CPU out of the SMI handler.
>>
>> Meanwhile the CPU being ejected, waits around in its holding
>> area until it is context-switched out. Note that it is possible
>> that a slow CPU gets ejected before it reaches the wait loop.
>> However, this would never happen before it has executed the
>> "AllCpusInSync" loop in SmiRendezvous().
>> It can mean that an ejected CPU does not execute code after
>> that point but given that the CPU state will be destroyed by
>> QEMU, the missed cleanup is no great loss.
>>
>> Cc: Laszlo Ersek <lersek at redhat.com>
>> Cc: Jordan Justen <jordan.l.justen at intel.com>
>> Cc: Ard Biesheuvel <ard.biesheuvel at arm.com>
>> Cc: Igor Mammedov <imammedo at redhat.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky at oracle.com>
>> Cc: Aaron Young <aaron.young at oracle.com>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>> Signed-off-by: Ankur Arora <ankur.a.arora at oracle.com>
>> ---
>>
>> Notes:
>> Addresses the following reviewing comments from v6:
>> (1) s/CpuEject/EjectCpu/g
>> (2,2a,2c) Get rid of eject-worker and related.
>> (2b,2d) Use the PlatformSmmBspElection() logic to find out IsBSP.
>> (3,3b) Use CPU_HOT_EJECT_DATA->QemuSelector instead of ApicIdMap to
>> do the actual ejection.
>> (4,5a,5b) Fix the format etc in the final unplugged log message
>> () Also as discussed elsewhere document the ordering requirements for
>> mCpuHotEjectData->QemuSelector[] and mCpuHotEjectData->Handler.
>> () [from patch 2] Move definition of QEMU_CPUHP_STAT_EJECTED to this
>> patch.
>> () s/QEMU_CPUHP_STAT_EJECTED/QEMU_CPUHP_STAT_EJECT/
>>
>> OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h | 1 +
>> OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 127 +++++++++++++++++++--
>> .../Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c | 31 +++++
>> 3 files changed, 152 insertions(+), 7 deletions(-)
>>
>> diff --git a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> index 2ec7a107a64d..d0e83102c13f 100644
>> --- a/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> +++ b/OvmfPkg/Include/IndustryStandard/QemuCpuHotplug.h
>> @@ -34,6 +34,7 @@
>> #define QEMU_CPUHP_STAT_ENABLED BIT0
>> #define QEMU_CPUHP_STAT_INSERT BIT1
>> #define QEMU_CPUHP_STAT_REMOVE BIT2
>> +#define QEMU_CPUHP_STAT_EJECT BIT3
>> #define QEMU_CPUHP_STAT_FW_REMOVE BIT4
>>
>> #define QEMU_CPUHP_RW_CMD_DATA 0x8
>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> index 851e2b28aad9..0484be8fe43c 100644
>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>> @@ -18,6 +18,7 @@
>> #include <Pcd/CpuHotEjectData.h> // CPU_HOT_EJECT_DATA
>> #include <Protocol/MmCpuIo.h> // EFI_MM_CPU_IO_PROTOCOL
>> #include <Protocol/SmmCpuService.h> // EFI_SMM_CPU_SERVICE_PROTOCOL
>> +#include <Register/Intel/ArchitecturalMsr.h> // MSR_IA32_APIC_BASE_REGISTER
>> #include <Uefi/UefiBaseType.h> // EFI_STATUS
>>
>> #include "ApicId.h" // APIC_ID
>> @@ -191,12 +192,39 @@ RevokeNewSlot:
>> }
>>
>> /**
>> + EjectCpu needs to know the BSP at SMI exit at a point when
>> + some of the EFI_SMM_CPU_SERVICE_PROTOCOL state has been torn
>> + down.
>> + Reuse the logic from OvmfPkg::PlatformSmmBspElection() to
>> + do that.
>> +
>> + @param[in] ProcessorNum ProcessorNum denotes the processor handle number
>> + in EFI_SMM_CPU_SERVICE_PROTOCOL.
>> +**/
>> +STATIC
>> +BOOLEAN
>> +CheckIfBsp (
>> + IN UINTN ProcessorNum
>> + )
>
> (1a) Please remove the ProcessorNum parameter -- comment and parameter
> list alike --; it's useless. In the parameter list, just replace the
> line's contents with "VOID".
Heh. Yeah, I'm not certain why I added that.
> (1b) Additionally, please state:
>
> @retval TRUE If the CPU executing this function is the BSP.
>
> @retval FALSE If the CPU executing this function is an AP.
>
Will add.
>
>
>
>> +{
>> + MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>> + BOOLEAN IsBsp;
>
> (2) "IsBsp" should line up with "ApicBaseMsr".
>
>
>> +
>> + ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>> + IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
>> + return IsBsp;
>> +}
>> +
>> +/**
>> CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit()
>> on each CPU at exit from SMM.
>>
>> - If, the executing CPU is not being ejected, nothing to be done.
>> + If, the executing CPU is neither the BSP, nor being ejected, nothing
>> + to be done.
>> If, the executing CPU is being ejected, wait in a halted loop
>> until ejected.
>> + If, the executing CPU is the BSP, set QEMU CPU status to eject
>> + for CPUs being ejected.
>>
>> @param[in] ProcessorNum ProcessorNum denotes the CPU exiting SMM,
>> and will be used as an index into
>> @@ -211,9 +239,99 @@ EjectCpu (
>> )
>> {
>> UINT64 QemuSelector;
>> + BOOLEAN IsBsp;
>>
>> + IsBsp = CheckIfBsp (ProcessorNum);
>> +
>> + //
>> + // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is updated
>> + // on the BSP in the ongoing SMI iteration at two places:
>
> (3) I feel "iteration" doesn't really apply; I'd simply drop that word.
> "ongoing SMI" seems sufficient (or maybe "ongoing SMI handling").
Yeah that reads better.
>
>
>> + //
>> + // - UnplugCpus() where the BSP determines if a CPU is under ejection
>> + // or not. As the comment where mCpuHotEjectData->Handler is set-up
>> + // describes any such updates are guaranteed to be ordered-before the
>> + // dereference below.
>> + //
>> + // - EjectCpu() on the BSP updates QemuSelectorMap[ProcessorNum] for
>> + // CPUs after they have been hot-ejected.
>> + //
>> + // The CPU under ejection: might be executing anywhere between the
>> + // "AllCpusInSync" exit loop in SmiRendezvous() to about to
>> + // dereference QemuSelectorMap[ProcessorNum].
>> + // Given that the BSP ensures that this store only happens after the
>> + // CPU has been ejected, this CPU would never see the after value.
>> + // (Note that any CPU that is already executing the CpuSleep() loop
>> + // below never raced any updates and always saw the before value.)
>> + //
>> + // CPUs not-under ejection: never see any changes so they are fine.
>> + //
>> + // Lastly, note that we are also guaranteed that any dereferencing
>> + // CPU only sees the before or after value and not an intermediate
>> + // value. This is because QemuSelectorMap[ProcessorNum] is aligned at
>> + // a natural boundary.
>> + //
>
> (4) Well, about the last paragraph -- when I saw that you used UINT64 in
> QemuSelectorMap, to allow for a sentinel value, I immediately thought of
> IA32. This module is built for IA32 too, and there I'm not so sure about
> the atomicity of UINT64 accesses.
>
> I didn't raise it at that point because I wasn't sure if we were
> actually going to rely on the atomicity. And, I don't think we are.
> Again, let me quote Paolo's example from <https://lwn.net/Articles/844224/>:
>
> thread 1 thread 2
> -------------------------------- ------------------------
> a.x = 1;
> smp_wmb();
> WRITE_ONCE(message, &a); datum = READ_ONCE(message);
> smp_rmb();
> if (datum != NULL)
> printk("%x\n", datum->x);
>
> The access to object "x" is not restricted in any particular way. In
> thread#1, we have an smp_wmb() which enforces both a compiler barrier
> and a store-release fence, and then we have an atomic access to
> "message". But the "a.x" assignment need not be atomic.
I concur. I think I conflated atomic access to "message" with also
needing atomic access for the variables we need to ensure transitivity
on.
As you say below, this means that a bunch of this logic (and comments)
can be simplified.
> In our case, "a.x = 1" maps to (a) populating "QemuSelectorMap", and (b)
> setting up the "Handler" function pointer, in UnplugCpus(). The
> smp_wmb() is the "ReleaseMemoryFence" in UnplugCpus(). The WRITE_ONCE is
> the final "AllCpusInSync = FALSE" assignment in BSPHandler()
> [UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c].
>
> Regarding thread#2, the READ_ONCE is matched by the "AllCpusInSync"
> loop. smp_rmb() is the "AcquireMemoryFence" I called for, under PATCH v8
> 07/10, "OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler". And the
> "datum" access happens (a) in SmmCpuFeaturesRendezvousExit(), where we
> fetch/call Handler, and (b) here, where we read/modify "QemuSelectorMap".
>
> So this seems to imply we can:
>
> - drop the natural-alignment padding of "QemuSelectorMap", from PATCH v8
> 06/10, "OvmfPkg/SmmCpuFeaturesLib: init CPU ejection state",
>
> - drop the last paragraph of the comment above.
Will do.
>
>
>> QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
>> - if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
>> + if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID && !IsBsp) {
>> + return;
>> + }
>> +
>> + if (IsBsp) {
>
> (5) Can you reorder the high-level flow here? Such as:
>
> if (CheckIfBsp ()) {
> //
> // send the eject requests to QEMU here
> //
> return;
> }
>
> //
> // Reached only on APs:
> //
> QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
> if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
> return;
> }
>
> //
> // AP being hot-ejected; pen it here.
> //
Yeah, this is clearer.
>
>
>> + UINT32 Idx;
>> +
>> + for (Idx = 0; Idx < mCpuHotEjectData->ArrayLength; Idx++) {
>> + UINT64 QemuSelector;
>> +
>> + QemuSelector = mCpuHotEjectData->QemuSelectorMap[Idx];
>> +
>> + if (QemuSelector != CPU_EJECT_QEMU_SELECTOR_INVALID) {
>> + //
>> + // This to-be-ejected-CPU has already received the BSP's SMI exit
>> + // signal and, will execute SmmCpuFeaturesRendezvousExit()
>> + // followed by this callback or is already waiting in the
>> + // CpuSleep() loop below.
>> + //
>> + // Tell QEMU to context-switch it out.
>> + //
>> + QemuCpuhpWriteCpuSelector (mMmCpuIo, (UINT32) QemuSelector);
>> + QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECT);
>> +
>> + //
>> + // We need a compiler barrier here to ensure that the compiler
>> + // does not reorder the CpuStatus and QemuSelectorMap[Idx] stores.
>> + //
>> + // A store fence is not strictly necessary on x86 which has
>> + // TSO; however, both of these stores are in different address spaces
>> + // so also add a Store Fence here.
>> + //
>> + MemoryFence ();
>
> (6) I wonder if this compiler barrier + comment block are helpful.
> Paraphrasing your (ex-)colleague Liran, if MMIO and IO Port accessors
> didn't contain built-in fences, all hell would break lose. We're using
> EFI_MM_CPU_IO_PROTOCOL for IO Port accesses. I think we should be safe
> ordering-wise, even without an explicit compiler barrier here.
>
> To me personally, this particular fence only muddies the picture --
> where we already have an acquire memory fence and a store memory fence
> to couple with each other.
>
> I'd recommend removing this. (If you disagree, I'm willing to listen to
> arguments, of course!)
You are right that we don't need a memory fence here -- given that there
is an implicit fence due to the MMIO.
As for the compiler fence, I'm just now re-looking at handlers in
EFI_MM_CPU_IO_PROTOCOL and they do seem to include a compiler barrier.
So I agree with you that we have all the fences that we need. However,
I do think it's a good idea to document both of these here.
>
>> +
>> + //
>> + // Clear the eject status for this CPU Idx to ensure that an invalid
>> + // SMI later does not end up trying to eject it or a newly
>> + // hotplugged CPU Idx does not go into the dead loop.
>> + //
>> + mCpuHotEjectData->QemuSelectorMap[Idx] =
>> + CPU_EJECT_QEMU_SELECTOR_INVALID;
>> +
>> + DEBUG ((DEBUG_INFO, "%a: Unplugged ProcessorNum %u, "
>> + "QemuSelector 0x%Lx\n", __FUNCTION__, Idx, QemuSelector));
>
> (7) Please log QemuSelector with "%Lu".
Ack.
>
>
>> + }
>> + }
>> +
>> + //
>> + // We are done until the next hot-unplug; clear the handler.
>> + //
>> + // By virtue of the MemoryFence() in the ejection loop above, the
>> + // following store is ordered-after all the ejections are done.
>> + // (We know that there is at least one CPU hot-eject handler if this
>> + // handler was installed.)
>> + //
>> + // As described in OvmfPkg::SmmCpuFeaturesRendezvousExit() this
>> + // means that the only CPUs which might dereference
>> + // mCpuHotEjectData->Handler are not under ejection, so we can
>> + // safely reset.
>> + //
>> + mCpuHotEjectData->Handler = NULL;
>> return;
>> }
>>
>> @@ -496,11 +614,6 @@ CpuHotplugMmi (
>> if (EFI_ERROR (Status)) {
>> goto Fatal;
>> }
>> - if (ToUnplugCount > 0) {
>> - DEBUG ((DEBUG_ERROR, "%a: hot-unplug is not supported yet\n",
>> - __FUNCTION__));
>> - goto Fatal;
>> - }
>>
>> if (PluggedCount > 0) {
>> Status = ProcessHotAddedCpus (mPluggedApicIds, PluggedCount);
>> diff --git a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>> index 99988285b6a2..ddfef05ee6cf 100644
>> --- a/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>> +++ b/OvmfPkg/Library/SmmCpuFeaturesLib/SmmCpuFeaturesLib.c
>> @@ -472,6 +472,37 @@ SmmCpuFeaturesRendezvousExit (
>> // (PcdCpuMaxLogicalProcessorNumber > 1), and hot-eject is needed
>> // in this SMI exit (otherwise mCpuHotEjectData->Handler is not armed.)
>> //
>> + // mCpuHotEjectData itself is stable once setup so it can be
>> + // dereferenced without needing any synchronization,
>> + // but, mCpuHotEjectData->Handler is updated on the BSP in the
>> + // ongoing SMI iteration at two places:
>> + //
>> + // - UnplugCpus() where the BSP determines if a CPU is under ejection
>> + // or not. As the comment where mCpuHotEjectData->Handler is set-up
>> + // describes any such updates are guaranteed to be ordered-before the
>> + // dereference below.
>> + //
>> + // - EjectCpu() (which is called via the Handler below), on the BSP
>> + // updates mCpuHotEjectData->Handler once it is done with all ejections.
>> + //
>> + // The CPU under ejection: might be executing anywhere between the
>> + // "AllCpusInSync" exit loop in SmiRendezvous() to about to
>> + // dereference the Handler field.
>> + // Given that the BSP ensures that this store only happens after all
>> + // CPUs under ejection have been ejected, this CPU would never see
>> + // the after value.
>> + // (Note that any CPU that is already executing the CpuSleep() loop
>> + // below never raced any updates and always saw the before value.)
>> + //
>> + // CPUs not-under ejection: might see either value of the Handler
>> + // which is fine, because the Handler is a NOP for CPUs not-under
>> + // ejection.
>> + //
>> + // Lastly, note that we are also guaranteed that any dereferencing
>> + // CPU only sees the before or after value and not an intermediate
>> + // value. This is because mCpuHotEjectData->Handler is aligned at a
>> + // natural boundary.
>> + //
>>
>> if (mCpuHotEjectData != NULL) {
>> CPU_HOT_EJECT_HANDLER Handler;
>>
>
> (8) I can't really put my finger on it, I just feel that repeating
> (open-coding) this wall of text here is not really productive.
Part of the reason I wanted to document this here was to get your
opinion on it and figure out how much of it is useful and how
much might be overkill.
>
> Do you think that, after you add the "acquire memory fence" comment in
> patch #7, we could avoid most of the text here? I think we should only
> point out (in patch #7) the "release fence" that the logic here pairs with.>
> If you really want to present it all from both perspectives, I guess I'm
> OK with that, but then I believe we should drop the last paragraph at
> least (see point (4)).
Rereading it after a gap of a few days and given that most of this is
just a repeat, I'm also tending towards overkill. I think a comment
talking about acquire/release pairing is useful. Rest of it can probably
be met with just a pointer towards the comment in EjectCpus(). Does that
make sense?
Thanks
Ankur
>
> Thanks
> Laszlo
>
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#72144): https://edk2.groups.io/g/devel/message/72144
Mute This Topic: https://groups.io/mt/80819864/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-
More information about the edk2-devel-archive
mailing list