[edk2-devel] [PATCH v2 1/1] OvmfPkg/NestedInterruptTplLib: replace ASSERT() with a warning logged.

Laszlo Ersek lersek at redhat.com
Mon May 8 06:45:52 UTC 2023


On 5/6/23 01:57, Ard Biesheuvel wrote:
> On Sat, 6 May 2023 at 01:27, Michael Brown <mcb30 at ipxe.org> wrote:
>>
>> On 05/05/2023 19:56, Laszlo Ersek wrote:
>>> I don't like the patch. For two reasons:
>>>
>>> (1) It papers over the actual issue. The problem should be fixed where
>>> it is, if possible.
>>
>> Agreed, but (as you have shown in
>> https://bugzilla.redhat.com/show_bug.cgi?id=2189136) the bug lies in
>> Windows code rather than in EDK2 code.  If the goal is to allow these
>> buggy Windows builds to still be used with OVMF, then the only option is
>> to paper over the issue.  We should do this only if it can be proven
>> safe to do so, of course.
>>
>>> (2) With the patch applied, NestedInterruptRaiseTPL() can return
>>> TPL_HIGH_LEVEL (as "InterruptedTPL"). Consequently,
>>> TimerInterruptHandler() [OvmfPkg/LocalApicTimerDxe/LocalApicTimerDxe.c]
>>> may pass TPL_HIGH_LEVEL back to NestedInterruptRestoreTPL(), as
>>> "InterruptedTPL".
>>>
>>> I believe that this in turn may invalidate at least one comment in
>>> NestedInterruptRestoreTPL():
>>>
>>>      //
>>>      // Call RestoreTPL() to allow event notifications to be
>>>      // dispatched.  This will implicitly re-enable interrupts.
>>>      //
>>>      gBS->RestoreTPL (InterruptedTPL);
>>>
>>> Restoring TPL_HIGH_LEVEL does not re-enable interrupts -- nominally anyways.
>>
>> I agree that the comment is invalidated, but as far as I can tell the
>> logic remains safe.
>>
>> I will put together a patch to update the comments in
>> NestedInterruptTplLib to address the possibility of an interrupt
>> occurring (illegally) at TPL_HIGH_LEVEL.
>>
>>> (a) Make LocalApicTimerDxe Xen-specific again. It's only the OVMF Xen
>>> platform that really *needs* NestedInterruptTplLib. (Don't get me wrong:
>>> NestedInterruptTplLib is technically correct in all circumstances, but
>>> in practice it happens to be too strict.)
>>>
>>> (b) For the non-Xen OVMF platforms, re-create a LocalApicTimerDxe
>>> variant that effectively has commits a086f4a63bc0 and a24fbd606125
>>> reverted. (We should keep 9bf473da4c1d.) This returns us to
>>> pre-239b50a86370 status -- that is, a timer interrupt handler that (a)
>>> does not try to be smart about nested interrupts, therefore one that is
>>> much simpler, and (b) is more tolerant of the Windows / cdboot.efi spec
>>> violation, (c) is vulnerable to the timer interrupt storm seen on Xen,
>>> but will never run on Xen. (Only the OVMF Xen platform is supposed to be
>>> launched on Xen.)
>>
>> I'm less keen on this because it reduces the runtime exposure of a very
>> complex piece of code, and will effectively cause that code to become
>> unmaintained.
>>
>> It's also satisfying (to me) that NestedInterruptTplLib provides a
>> provable upper bound on stack consumption due to interrupts, which can't
>> be guaranteed by the simpler pre-239b50a86370 scheme.
>>
>> Could we defer judgement until after I've fully reasoned through (and
>> documented) how NestedInterruptTplLib will work in the presence of
>> interrupts occurring at TPL_HIGH_LEVEL?
>>
> 
> Would it be feasible for our firmware implementation to disable the
> timer interrupt at the timer end as well?
> 
> E.g.,
> 
> RaiseTPL(HIGH)::
> 
> CLI
> disarm timer
> 
> 
> RestoreTPL::
> 
> <complain if HIGH and interrupts enabled at CPU side>
> re-arm timer
> STI
> 

I can be entirely wrong here, but:

- we looked for a solution (or workaround) to the original problem that
stays within the boundaries of OvmfPkg, so sinking tweaks into the core
TPL manipulation functions isn't ideal

- regarding the TimerInterruptHandler() function(s) that do live in
OvmfPkg, there had been tweaks to signaling end-of-interrupt (which I
understand as sort of equivalent to your suggestion, as unless/until you
signal EOI, no more interrupts will be *generated*), but those had not
helped. The EOI was either too early and so we got the unbounded
nesting, or it was too late, and no interrupts were generated while (for
example) TPL_CALLBACK code would depend on timers with CheckEvent. See
bug 4162 -- that was what prompted Michael to revert the EOI placement
tweak and to implement NestedInterruptLib.

Apologies if there are further interpretations of disarming the timer
that I'm missing!

Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#104252): https://edk2.groups.io/g/devel/message/104252
Mute This Topic: https://groups.io/mt/98656860/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/leave/3943202/1813853/130120423/xyzzy [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-




More information about the edk2-devel-archive mailing list