[edk2-devel] [Patch 0/2] UefiCpuPkg: Default avoid print.

Thu Aug 1 20:20:25 UTC 2019

On 7/31/19 1:58 PM, Laszlo Ersek wrote:
> On 07/31/19 18:34, Brian J. Johnson wrote:
> 
>> I do wonder if there would be a clean way to let a DebugLib instance
>> itself declare that AP_DEBUG() is safe.  That way a platform would
>> only need to override the DebugLib instance in the DSC file, rather
>> than both the instance and the PCD.  (I know, I'm nitpicking.)  A
>> library can't override PCDs in its calling modules, of course.  I
>> suppose the AP_DEBUG() macro could call a new DebugLib entry point to
>> test for AP safety before doing anything else, say
>> DebugPrintOnApIsSafe().  Or it could even be a global CONST BOOLEAN
>> defined by the library.  But that would require all DebugLib instances
>> to change, which is something you were trying to avoid.
> 
> That's right -- I tried to imagine some other approaches, but they'd
> need new DebugLib functions, and likely force platforms to write new
> code.
> 
> 
>> However, it's not always practical to track down all uses of DEBUG().
>> An AP can easily call a library routine which uses DEBUG() rather than
>> AP_DEBUG(), buried under several layers of transitive library
>> dependencies.  In other words, it's not always practical to determine
>> ahead of time if a given DEBUG() call may be done on an AP.
> 
> This problem is valid IMO, but I think its scope is a lot wider than
> just DebugLib. Assume the programmer is looking at a function that may
> be invoked on an AP, and they are about to call a function for taking
> care of a specific sub-task. If the programmer cannot prove the
> thread-safety of the *entire* call tree underneath the function call
> they are about to add, they simply must not add the call. The
> thread-safety of the DebugLib instance in use is just a part of the
> thread-safety of said call tree.
> 
> Put differently, code that runs APs must be extremely self-contained;
> I'd rule out any and all lib classes from direct use unless a specific
> library instance, advertizing thread safety, would be chosen in the
> platform DSC file. But, if we adopted this approach, we could even
> introduce a new AP-oriented library *class* for debug messages, offer a
> Null implementation in edk2, and ask platforms to bring their own.
> 
> 
>> I know that AP code runs in a very restricted environment and that
>> people who use MpServices are supposed to understand the
>> repercussions, but it gets very difficult when libraries are
>> involved.  :(
> 
> Exactly -- the first restriction people should understand is, "stay away
> from libraries as much as you can".
> 
> 
>> So would a better solution be to modify the common unsafe DebugLib
>> instances to have DebugPrintEnabled() return FALSE on APs?  That would
>> probably require a new BaseLib interface to determine if the caller is
>> running on the BSP or an AP.
> 
> I agree that "AmIAnAP()" would be a pre-requisite.
> 
> 
>> (For IA32/X64 this isn't too hard -- it just needs to check a bit in
>> the local APIC.
> 
> Still not trivial, as some DebugLib instances might want to target
> runtime drivers (or even SMM drivers). For runtime drivers the
> complication is that a runtime (virtual address) mapping for the LAPIC
> MMIO range would be needed (if I understand correctly anyway). And for
> both runtime and SMM drivers, it could be a problem that on physical
> hardware, the MMIO range of the LAPIC can be moved (reprogrammed) to a
> different base address, possibly by the OS too.
> 
> I could be quite confused about this, of course; I don't eat LAPICs for
> breakfast :) I just recall an SMM firmware vulnerability that was in
> part based on moving the LAPIC base address elsewhere. Hm... googling
> suggests the attack was called "The Memory Sinkhole".
> 

I finally looked at the documentation (imagine!), and the BSP flag is 
bit 8 in the IA32_APIC_BASE (0x1B) MSR.  So no local APIC access is 
necessary, only reading an MSR.  This MSR has been present since the 
Pentium Pro (family/model 06_01H), and is considered "architectural."

> 
>> I have no idea about other architectures.)  That wouldn't solve the
>> problem everywhere -- anyone using a custom DebugLib would have to
>> update it themselves.  But it would solve it solidly in the majority
>> of cases.
>>
>> Thoughts?
> 
> My fear might not be reasonable, but I feel quite uncomfortable about
> LAPIC accesses in DebugLib APIs. The information ("BSP or AP") is safer
> to determine at the call site, I think, even if it takes more human
> work.
> 

I understand the concern.  DebugLib is exposed in a lot of awkward end 
cases and odd environments.  Caution is warranted.  But checking a bit 
in a MSR is about as simple a test as you can do.  It sounds reasonable 
to me.  Actually reading the local APIC would, as you said, be a lot 
more problematic.

The observation remains that the problem is a lot bigger than DebugLib. 
We can't add AmIAnAP() calls on every single library interface.  Are 
there any other spots people know of which tend to cause a lot of grief 
for APs?  Maybe the memory allocation boot services?  At least if 
there's a simple test for AP-ness which doesn't have additional 
dependencies, it's easy to add checks as needed.

I don't know if this discussion will go anywhere... but it has been an 
interesting one to have.

> I could very well be biased. In OvmfPkg we have a minuscule amount of
> code that runs on APs, and even that code is written with total
> minimalism in mind.
> 
> Leaping to a different topic... Years ago I was tracking down an MTRR
> setup bug in the Xen hypervisor (as it was shipped as a part of RHEL5).
> It is necessary to setup MTRRs identically on all CPUs, plus it has to
> be done while all CPUs are in a "pen" doing nothing but setting up
> MTRRs. The bug was exactly in that part of the code (running
> simultaneously on more than a hundred CPUs). It was impossible to print
> anything to the serial console -- first because it would be unreadable
> for humans, and second because the delays would perturb the buggy
> behavior. In the end I had to introduce an array of per-CPU debug
> structures where each CPU would record its own view (snapshot) of a
> shared resource -- a shared resource that should have been protected by
> mutual exclusion between the CPUs. After the CPUs left the "pen" (with
> the invalid MTRR configuration established), I'd use the BSP to dump the
> array. That showed me that some CPUs had overlapping / inconsistent
> views of the shared resource between each other. This proved that the
> mutual exclusion primitive didn't work as expected -- it turns out that
> the semaphore (or spinlock, not sure) in question used an INT8 counter,
> which overflowed when more than 127 CPUs contended for the resource.
> 

I too have done battle with MTRR initialization, with over a thousand 
processors.  I feel your pain.

> I'm sure I'm misremembering parts of this story (from several years
> distance), the moral is that debugging in a multiprocessing environment
> may easily require its own dedicated infrastructure. In edk2, we don't
> have anything like that, I think. Could we build it, sufficiently
> generally? Like, prepare a log buffer for each CPU before calling
> StartupAllAps(), log only to that buffer during the concurrent
> execution, and finally dump the buffers? I guess if we don't *reach* the
> "finally" part,  we could still dump the RAM and investigate the log
> buffers that way... Dumping the RAM is certainly an option for virtual
> machines, but it might be viable for physical setups too (JTAG, debug
> agent... dunno).
> 
> Sorry about the wild speculation :)
> 
> Thanks
> Laszlo
> 

-- 

                                                 Brian

--------------------------------------------------------------------

   "Are we going to push the edge of the envelope, Brain?"
   "No, Pinky, but we may get to the sticky part."
                                            -- Quoted on the Net

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#44805): https://edk2.groups.io/g/devel/message/44805
Mute This Topic: https://groups.io/mt/32664465/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-