[edk2-devel] [PATCH v6 7/9] OvmfPkg/CpuHotplugSmm: add CpuEject()

Laszlo Ersek lersek at redhat.com
Wed Feb 3 20:58:50 UTC 2021


On 02/03/21 07:45, Ankur Arora wrote:
> On 2021-02-02 6:15 a.m., Laszlo Ersek wrote:
>> On 02/02/21 15:00, Laszlo Ersek wrote:
>>
>>> ... I guess that volatile-qualifying both CPU_HOT_EJECT_DATA, and the
>>> array pointed-to by CPU_HOT_EJECT_DATA.ApicIdMap, should suffice. In
>>> combination with the sync-up point that you quoted. This seems to match
>>> existing practice in PiSmmCpuDxeSmm -- there are no concurrent accesses,
>>> so atomicity is not a concern, and serializing the instruction streams
>>> coarsely, with the sync-up, in combination with volatile accesses,
>>> should presumably guarantee visibility (on x86 anyway).
>>
>> To summarize, this is what I would ask for:
>>
>> - make CPU_HOT_EJECT_DATA volatile
>>
>> - make (*CPU_HOT_EJECT_DATA.ApicIdMap) volatile
>>
>> - after storing something to CPU_HOT_EJECT_DATA or
>> CPU_HOT_EJECT_DATA.ApicIdMap on the BSP, execute a MemoryFence()
>>
>> - before fetching something from CPU_HOT_EJECT_DATA or
>> CPU_HOT_EJECT_DATA.ApicIdMap on an AP, execute a MemoryFence()
>>
>>
>> Except: MemoryFence() isn't a *memory fence* in fact.
>>
>> See "MdePkg/Library/BaseLib/X64/GccInline.c".
>>
>> It's just a compiler barrier, which may not add anything beyond what
>> we'd already have from "volatile".
>>
>> Case in point: PiSmmCpuDxeSmm performs heavy multi-processing, but does
>> not contain a single invocation of MemoryFence(). It uses volatile
>> objects, and a handful of InterlockedCompareExchangeXx() calls, for
>> implementing semaphores. (NB: there is no 8-bit variant of
>> InterlockedCompareExchange(), as "volatile UINT8" is considered atomic
>> in itself, and a suitable basis for a sempahore too.) And given the
>> synchronization from those semaphores, PiSmmCpuDpxeSmm trusts that
>> updates to the *other* volatile objects are both atomic and visible.
>>
>> I'm pretty sure this only works because x86 is in-order. There are
>> instruction stream barriers in place, and compiler barriers too, but no
>> actual memory barriers.
> 
> Right and just to separate them explicitly, there are two problems:
> 
>  - compiler: where the compiler caches stuff in or looks at stale memory
> locations. Now, AFAICS, this should not happen because the ApicIdMap would
> never change once set so the compiler should reasonably be able to cache
> the address of ApicIdMap and dereference it (thus obviating the need for
> volatile.)

(CPU_HOT_EJECT_DATA.Handler does change though.)

> The compiler could, however, cache any assignments to ApicIdMap[Idx]
> (especially given LTO) and we could use a MemoryFence() (as the compiler
> barrier that it is) to force the store.
> 
>  - CPU pipeline: as you said, right now we basically depend on x86 store
> order semantics (and the CpuPause() loop around AllCpusInSync, kinda
> provides
> a barrier.)
> 
> So the BSP writes in this order:
> ApicIdMap[Idx]=x; ... ->AllCpusInSync = true
> 
> And whenever the AP sees ->AllCpusInSync == True, it has to now see
> ApicIdMap[Idx] == x.

Yes.

> 
> Maybe the thing to do is to detail this barrier in a commit note/comment?

That would be nice.

> And add the MemoryFence() but not the volatile.

Yes, that should work.

Thanks,
Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#71116): https://edk2.groups.io/g/devel/message/71116
Mute This Topic: https://groups.io/mt/80199926/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-





More information about the edk2-devel-archive mailing list