[edk2-devel] [PATCH 1/1] ArmPkg/ExceptionSupport: Support backtrace through an exception

Laszlo Ersek lersek at redhat.com
Wed Aug 30 13:31:50 UTC 2023


On 8/30/23 15:00, Ard Biesheuvel wrote:
> On Tue, 29 Aug 2023 at 16:37, Laszlo Ersek <lersek at redhat.com> wrote:
>>
>> On 8/29/23 15:29, Ard Biesheuvel wrote:
>>> Laszlo reports that the efi_gdb.py script fails to produce a full
>>> backtrace when attaching it to an ARM firmware build that has halted on
>>> an unhandled exception.
>>>
>>> The reason is that the asm code that processes the exception was not
>>> implemented with this in mind, and therefore lacks any handling of it.
>>>
>>> So let's add this: create a dummy frame record suitable for chasing the
>>> frame pointer, and add the CFI metadata to describe where the return
>>> value can be found on the stack.
>>>
>>> When using a GCC5 build, this produces a stack trace such as
>>>
>>>   (gdb) bt
>>>   #0  0x000000007fd4537c in CpuDeadLoop () at /home/ardb/build/edk2/MdePkg/Library/BaseLib/CpuDeadLoop.c:30
>>>   #1  0x000000007fd454f8 in DebugAssert (
>>>       FileName=FileName@entry=0x7fd4a8a8 <MmioWrite64Internal+3604> "/home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c",
>>>       LineNumber=LineNumber@entry=343, Description=Description@entry=0x7fd4a896 <MmioWrite64Internal+3586> "((BOOLEAN)(0==1))")
>>>       at /home/ardb/build/edk2/MdePkg/Library/BaseDebugLibSerialPort/DebugLib.c:235
>>>   #2  0x000000007fd479ec in DefaultExceptionHandler (ExceptionType=<optimized out>, SystemContext=...)
>>>       at /home/ardb/build/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c:343
>>>   #3  0x000000007fd48eb8 in ExceptionHandlersEnd ()
>>>   #4  0x000000007fcde944 in QemuLoadKernelImage (ImageHandle=<synthetic pointer>) at /home/ardb/build/edk2/OvmfPkg/Library/GenericQemuLoadImageLib/GenericQemuLoadImageLib.c:201
>>>   #5  TryRunningQemuKernel () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/QemuKernel.c:46
>>>   #6  PlatformBootManagerAfterConsole () at /home/ardb/build/edk2/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c:1139
>>>   #7  BdsEntry (This=<optimized out>) at /home/ardb/build/edk2/MdeModulePkg/Universal/BdsDxe/BdsEntry.c:931
>>>   #8  0x000000007ffd0018 in ?? ()
>>>   Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>>>
>>> when QemuLoadKernelImage() has been tweaked to trigger an exception, as
>>> is shown by GDB when walking the call stack:
>>>
>>> |    0x7fcde938 <BdsEntry+3292>      b.ne    0x7fcdf134 <BdsEntry+5336>  // b.any
>>> |    0x7fcde93c <BdsEntry+3296>      mov     x0, #0x40                       // #64
>>> |    0x7fcde940 <BdsEntry+3300>      bl      0x7fcd7aec <DebugPrint>
>>> |  > 0x7fcde944 <BdsEntry+3304>      brk     #0x4d2
>>> |    0x7fcde948 <BdsEntry+3308>      bl      0x7fce4354 <ConnectDevicesFromQemu>
>>> |    0x7fcde94c <BdsEntry+3312>      tbz     x0, #63, 0x7fcde954 <BdsEntry+3320>
>>> |    0x7fcde950 <BdsEntry+3316>      bl      0x7fcd844c <EfiBootManagerConnectAll>
>>> |    0x7fcde954 <BdsEntry+3320>      bl      0x7fcd990c <EfiBootManagerRefreshAllBootOption
>>>
>>> Unfortunately, CLANGDWARF does not seem entirely happy with this
>>> arrangement: it identifies the call frame where the exception
>>> originated, but does not show any frames above that. (This could be
>>> related to the fact that the exception code uses a separate exception
>>> stack for handling synchronous exceptions.)
>>
>> First of all, thanks for writing this patch so incredibly quickly. :)
>>
> 
> My pleasure.
> 
>> Second, something must be off with my gdb.
>>
>> Before your patch, I kept experimenting with manually resetting FP, SP,
>> and LR to the values printed in the register dump, using gdb "set"
>> commands. Strangely, that did result in complete pre-exception stack
>> traces, but *only sometimes*. Most of the time gdb complains about
>> "corrupted stack". And I just can't figure out what distinguishes the
>> broken from the functional "bt" commands -- I did walk the allegedly
>> corrupt stack manually, and there is nothing corrupt in the FP and LR
>> parts of the stack frames. They all chain nicely and point to valid
>> instructions, respectively. I don't know what it is that gdb doesn't like.
>>
> 
> I suspect that gdb is filled with heuristics and tweaks, and uses a
> combination of the frame records, the actual value of LR and the
> unwind data to figure out what the call stack looks like.

That's what I feared :/
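
For reference, the manual walk mentioned above (following the FP and LR
slots from one stack frame to the next) can be sketched in Python. This is
only an illustration under stated assumptions: it relies on the AAPCS64
frame record layout (caller's FP at [FP], saved LR at [FP + 8]), it runs
on a made-up in-memory stack image rather than a live target, and the
helper names make_reader() and walk_frame_records() are hypothetical. The
ordering check is similar in spirit to what gdb reports as "previous frame
inner to this frame"; gdb's actual logic is not reproduced here.

```python
import struct


def make_reader(base, blob):
    """Return a read_u64(addr) callback over a captured stack image (hypothetical helper)."""
    def read_u64(addr):
        off = addr - base
        if off < 0 or off + 8 > len(blob):
            raise ValueError(f"address {addr:#x} outside the captured stack")
        # AArch64 firmware runs little-endian, hence "<Q".
        return struct.unpack_from("<Q", blob, off)[0]
    return read_u64


def walk_frame_records(read_u64, fp, max_frames=64):
    """Chase AAPCS64 frame records starting at fp.

    Returns a list of (fp, saved_lr, ordered) tuples. 'ordered' is False
    when the caller's record is not at a higher address than the current
    one, which is what one would see when the chain crosses onto a
    different stack.
    """
    frames = []
    while fp != 0 and len(frames) < max_frames:
        prev_fp = read_u64(fp)        # [FP]     -> caller's frame pointer
        saved_lr = read_u64(fp + 8)   # [FP + 8] -> return address
        # The stack grows downwards, so callers live at higher addresses.
        ordered = prev_fp == 0 or prev_fp > fp
        frames.append((fp, saved_lr, ordered))
        if not ordered:
            break
        fp = prev_fp
    return frames


# Made-up stack image: three frame records, all values are illustrative.
BASE = 0x1000
words = [
    0x1020, 0xAAAA0004,  # record at 0x1000 (innermost)
    0, 0,                # locals
    0x1040, 0xBBBB0008,  # record at 0x1020
    0, 0,                # locals
    0, 0xCCCC000C,       # record at 0x1040 (outermost, FP chain ends)
]
blob = struct.pack("<%dQ" % len(words), *words)

frames = walk_frame_records(make_reader(BASE, blob), fp=0x1000)
for fp, lr, ordered in frames:
    print(f"fp={fp:#x} lr={lr:#x} ordered={ordered}")
```

On a real target the read_u64 callback would be backed by the debugger's
memory access instead of a byte string, and the starting FP would come
from the register dump printed by the exception handler.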

> 
>> Third, when I test your patch, I seem to experience precisely what you
>> describe under CLANGDWARF -- it shows the faulting frame (the frame just
>> before the exception), but nothing before it! And I'm not building with
>> clang :(
>>
> 
> Shame. Unfortunately, I don't have a lot of time to spend on this
> right now, but it is something I have been wanting to fix forever so
> hopefully I'll get back to it at some point.
> 

I'm grateful that you wrote v1! :)

Thank you!
Laszlo


