[edk2-devel] [PATCH v6 8/9] OvmfPkg/CpuHotplugSmm: add worker to do CPU ejection

Wed Feb 3 05:41:03 UTC 2021

On 2021-02-02 5:23 a.m., Laszlo Ersek wrote:
> On 02/01/21 20:21, Ankur Arora wrote:
>> On 2021-02-01 9:22 a.m., Laszlo Ersek wrote:
>>> On 01/29/21 01:59, Ankur Arora wrote:
>>>> Designate a worker CPU (we use the one executing the root MMI
>>>> handler), which will do the actual ejection via QEMU in CpuEject().
>>>>
>>>> CpuEject(), on the worker CPU, ejects each marked CPU by first
>>>> selecting its APIC ID and then sending the QEMU "eject" command.
>>>> QEMU in-turn signals the remote VCPU thread which context-switches
>>>> it out of the SMI.
>>>>
>>>> CpuEject(), on the CPU being ejected, spins around in its holding
>>>> area until this final context-switch. This does mean that there is
>>>> some CPU state that would ordinarily be restored (in SmiRendezvous()
>>>> and in SmiEntry.nasm::CommonHandler), but will not be anymore.
>>>> This unrestored state includes FPU state, CET enable, stuffing of
>>>> RSB and the final RSM. Since the CPU state is destroyed by QEMU,
>>>> this should be okay.
>>>>
>>>> Cc: Laszlo Ersek <lersek at redhat.com>
>>>> Cc: Jordan Justen <jordan.l.justen at intel.com>
>>>> Cc: Ard Biesheuvel <ard.biesheuvel at arm.com>
>>>> Cc: Igor Mammedov <imammedo at redhat.com>
>>>> Cc: Boris Ostrovsky <boris.ostrovsky at oracle.com>
>>>> Cc: Aaron Young <aaron.young at oracle.com>
>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
>>>> Signed-off-by: Ankur Arora <ankur.a.arora at oracle.com>
>>>> ---
>>>>    OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 73
>>>> ++++++++++++++++++++++++++++++++++----
>>>>    1 file changed, 67 insertions(+), 6 deletions(-)
>>>
>>> (1) s/CpuEject/EjectCpu/g, per previous request (affects commit message
>>> and code too).
>>>
>>>
>>>> diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> index 526f51faf070..bf91344eef9c 100644
>>>> --- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> +++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
>>>> @@ -193,9 +193,12 @@ RevokeNewSlot:
>>>>      CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit(),
>>>>      on each CPU at exit from SMM.
>>>>
>>>> -  If, the executing CPU is not being ejected, nothing to be done.
>>>> +  If, the executing CPU is neither a worker, nor being ejected, nothing
>>>> +  to be done.
>>>>      If, the executing CPU is being ejected, wait in a CpuDeadLoop()
>>>>      until ejected.
>>>> +  If, the executing CPU is a worker CPU, set QEMU CPU status to eject
>>>> +  for CPUs being ejected.
>>>>
>>>>      @param[in] ProcessorNum      Index of executing CPU.
>>>>
>>>> @@ -217,6 +220,56 @@ CpuEject (
>>>>        return;
>>>>      }
>>>>
>>>> +  if (ApicId == CPU_EJECT_WORKER) {
>>>
>>> (2) The CPU_EJECT_WORKER approach is needlessly complicated (speculative
>>> generality). I wish I understood this idea earlier in the patch set.
>>>
>>> (2a) In patch #5 (subject "OvmfPkg/CpuHotplugSmm: define
>>> CPU_HOT_EJECT_DATA"), the CPU_EJECT_WORKER macro definition should be
>>> dropped.
>>>
>>> (2b) In this patch, the question whether the executing CPU is the BSP or
>>> not, should be decided with the same logic that is visible in
>>> PlatformSmmBspElection()
>>> [OvmfPkg/Library/SmmCpuPlatformHookLibQemu/SmmCpuPlatformHookLibQemu.c]:
>>>
>>>     MSR_IA32_APIC_BASE_REGISTER ApicBaseMsr;
>>>     BOOLEAN                     IsBsp;
>>>
>>>     ApicBaseMsr.Uint64 = AsmReadMsr64 (MSR_IA32_APIC_BASE);
>>>     IsBsp = (BOOLEAN)(ApicBaseMsr.Bits.BSP == 1);
>>>
>>> (2c) Point (2b) obviates the explicit "mark as worker" logic entirely,
>>> in UnplugCpus() below.
>>>
>>> (2d) The "is worker" language (in comments etc) should be replaced with
>>> direct "is BSP" language.
>>>
>>>
>>>> +    UINT32 CpuIndex;
>>>> +
>>>> +    for (CpuIndex = 0; CpuIndex < mCpuHotEjectData->ArrayLength;
>>>> CpuIndex++) {
>>>> +      UINT64 RemoveApicId;
>>>> +
>>>> +      RemoveApicId = mCpuHotEjectData->ApicIdMap[CpuIndex];
>>>> +
>>>> +      if ((RemoveApicId != CPU_EJECT_INVALID &&
>>>> +           RemoveApicId != CPU_EJECT_WORKER)) {
>>>> +        //
>>>> +        // This to-be-ejected-CPU has already received the BSP's SMI
>>>> exit
>>>> +        // signal and, will execute SmmCpuFeaturesSmiRendezvousExit()
>>>> +        // followed by this callback or is already waiting in the
>>>> +        // CpuDeadLoop() below.
>>>> +        //
>>>> +        // Tell QEMU to context-switch it out.
>>>> +        //
>>>> +        QemuCpuhpWriteCpuSelector (mMmCpuIo, (APIC_ID) RemoveApicId);
>>>> +        QemuCpuhpWriteCpuStatus (mMmCpuIo, QEMU_CPUHP_STAT_EJECTED);
>>>
>>> (3) While the QEMU CPU selector value *usually* matches the APIC ID,
>>> it's not an invariant. APIC IDs have an internal structure, composed of
>>> bit-fields, where each bit-field accommodates one hierarchy level in the
>>> CPU topology (thread, core, die (maybe), and socket).
>>>
>>> However, this mapping need not be surjective. QEMU lets you create
>>> "pathological" CPU topologies, for example one with:
>>> - 3 threads/core,
>>> - 5 cores/socket,
>>> - (say) 2 sockets.
>>>
>>> Under that example, the bit-field standing for the "thread number" level
>>> would have 2 bits, theoretically permitting *4* threads/core, and the
>>> bit-field standing for the "core number" level would have 3 bits,
>>> theoretically allowing for *8* cores/socket.
>>>
>>> Considering the fully populated topology, you'd see the CPU selector
>>> range from 0 to (3*5*2-1)=29, inclusive (corresponding to 30 logical
>>> processors in total). However, the APIC ID *image* of this CPU selector
>>> *domain* would not be "contiguous" -- the APIC ID space, with the
>>> above-described structure, would accommodate 4*8*2=64 logical
>>> processors. For example, each APIC ID that stood for the nonexistent
>>> "thread#3" on a particular core would be left unused (no CPU selector
>>> would map to it).
>>>
>>> All in all, you can't write the APIC ID to the CPU selector register,
>>> for ejection. You need to select the CPU whose APIC ID is the APIC ID
>>> you want to eject, and then initiate ejection.
>>
>> Yeah, this is a clear bug. Should have seen it earlier. Thanks for
>> pointing it out.
>>
>>>
>>> This requires one of two alternatives:
>>>
>>>
>>> (3a) The first option is to keep the change local to this patch.
>>>
>>> This alternative is the more CPU-hungry (and uglier) one.
>>>
>>> The idea is to perform a QEMU_CPUHP_CMD_GET_ARCH_ID loop over all
>>> possible CPUs, somewhat similarly to QemuCpuhpCollectApicIds(). At every
>>> CPU, knowing the APIC ID, try to find the APIC ID in "ApicIdMap". If
>>> there is a match, eject.
>>>
>>>
>>> (3b) The second option is much more elegant (and it's faster too), but
>>> it requires a much more intrusive update to the patch set.
>>>
>>> First, the *element type* of the arrays that QemuCpuhpCollectApicIds()
>>> operates on, has to be changed from APIC_ID to a structure type that
>>> pairs APIC_ID with the QEMU CPU selector. [*]
>>>
>>> Second, whenever QemuCpuhpCollectApicIds() outputs an APIC_ID, it should
>>> also save the "CurrentSelector" value (in the other field of the output
>>> array element structure).
>>>
>>> Third, the element type of CPU_HOT_EJECT_DATA.ApicIdMap should be
>>> replaced with a structure type similar (or identical) to the one
>>> described at [*]. The ProcessorNumber lookup in UnplugCpus() would still
>>> be based upon the APIC ID, but CPU_HOT_EJECT_DATA should remember both
>>> the QEMU selector for that processor, and the APIC ID.
>>>
>>> Fourth, the actual ejection should use the selector.
>>>
>>> Fifth, the debug message (below) should continue logging the APIC ID, to
>>> mirror the DEBUG_INFO message in ProcessHotAddedCpus().
>>>
>>>
>>> Would you be willing to implement (3b)?
>>
>> 3b is clearly the better solution. However, is there enough value in
>> the print message containing APIC ID, that CPU_HOT_EJECT_DATA.ApicIdMap
>> carry both the cpu-selector and APIC ID?
>>
>> As you say, the ejection itself just needs the ProcessorNum -> QEMU
>> cpu-selector mapping.
> 
> Good question, and I'm torn.
> 
> The default DEBUG build of OVMF enables INFO messages, and more severe
> ones. It does not enable VERBOSE messages.
> 
> In such a DEBUG build (i.e., with otherwise default settings), the log
> entries that relate to hot-plug do not report selectors. Conversely, on
> hot-unplug, the log would report selectors *only* (not APIC IDs). I feel
> this makes it more difficult to get useful OVMF debug logs in bug
> reports, especially from users of distro-provided OVMF builds.
> 
> If we can ask the user to rebuild with VERBOSE enabled and re-run the
> test, that's always great, but frequently users don't know how to
> rebuild OVMF, plus usually OVMF is hidden so deep in their virt stack
> that they have no idea how to make that stack use a custom OVMF build.
> 
> ... How about this: in UnplugCpus(), we could emit an INFO message about
> the ProcessorNumber <-> APIC ID <-> Selector correspondence, and when
> the eject happens, it would suffice to log (at INFO level) only
> ProcessorNumber <-> Selector.

This makes sense to me. Will do.

Thanks
Ankur

> 
> Thanks,
> Laszlo
> 

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71092): https://edk2.groups.io/g/devel/message/71092
Mute This Topic: https://groups.io/mt/80199930/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-