[edk2-devel] [PATCH V7 36/37] UefiCpuPkg: Setting initial-count register as the last step

Jeff Fan fanjianfeng at byosoft.com.cn
Fri May 20 03:50:56 UTC 2022


Ray,

Could we add one API (like InitializeApicTimerEx) to intialize CPU local APCI with additional parameter to indicate if interrupt is to be enabled or not? Just like HPE's solution. 

Jeff


fanjianfeng at byosoft.com.cn
 
From: Henz, Patrick
Date: 2022-05-20 05:54
To: devel at edk2.groups.io; Lendacky, Thomas; min.m.xu at intel.com; Ni, Ray
CC: Yao, Jiewen; Gerd Hoffmann; Anthony Perard; Julien Grall; Dong, Eric
Subject: Re: [edk2-devel] [PATCH V7 36/37] UefiCpuPkg: Setting initial-count register as the last step
Hi all,
 
We (Hewlett Packard Enterprise) are also running into a race condition due to how InitializeApicTimer initializes the APIC timers, we figured this might be a good place to report our findings. On the occasion we notice that APs get stuck in the timer interrupt handling code after getting woken up by the BSP. It appears that if the CurrentCount timer value provided by the BSP is sufficiently small, the brief amount of time between an AP calling InitializeApicTimer and calling DisableApicTimerInterrupt (see SyncLocalApicTimerSetting as an example) leaves enough room for an APIC timer interrupt to occur. This seems to become more frequent on larger systems with higher processor counts, from what we've gathered the increase in the number of locking sequence invocations appears to be making this condition far more likely to occur. We work on scaled systems with node controllers and we've really only seen this on larger systems, but it seems to us this could feasibly happen on smaller systems too. Our current solution is to add an additional argument to InitializeApicTimer, allowing the caller to specify whether or not APIC timer interrupts are to be enabled for the current thread.
 
Thanks,
Patrick Henz
 
Enterprise X86 Labs
Hewlett Packard Enterprise
patrick.henz at hpe.com
 
-----Original Message-----
From: devel at edk2.groups.io [mailto:devel at edk2.groups.io] On Behalf Of Lendacky, Thomas via groups.io
Sent: Friday, May 13, 2022 5:13 PM
To: devel at edk2.groups.io; min.m.xu at intel.com; Ni, Ray <ray.ni at intel.com>
Cc: Yao, Jiewen <jiewen.yao at intel.com>; Gerd Hoffmann <kraxel at redhat.com>; Anthony Perard <anthony.perard at citrix.com>; Julien Grall <julien at xen.org>; Dong, Eric <eric.dong at intel.com>
Subject: Re: [edk2-devel] [PATCH V7 36/37] UefiCpuPkg: Setting initial-count register as the last step
 
On 5/11/22 19:52, Min Xu via groups.io wrote:
> On May 11, 2022 10:06 PM, Lendacky, Thomas wrote:
>> On 5/10/22 21:00, Xu, Min M wrote:
>>> On May 11, 2022 4:30 AM, Tom Lendacky wrote:
>>>> I'm replying to this patch since I can't find patch V12 46/47 
>>>> anywhere in my email.
>>>>
>>>> I've bisected a regression in the Linux kernel to this patch when 
>>>> an SEV-SNP guest is booted. The following message is issued in the 
>>>> kernel for every AP being brought online:
>>>>
>>>> APIC: Stale IRR:
>>>>
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000
>>>> 00020 ISR:
>>>>
>> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000
>>>> 00000
>>>>
>>>> Possibly a timing issue involving the mode switch with the 
>>>> interrupt unmasked. If I leave the interrupt masked and only 
>>>> un-mask it after the programming of the init-count, then the message goes away.
>>>
>>> Do you mean in InitializeApicTimer, it should follow below steps:
>>> 1. mask LvtTimer. (set LvtTimer.Bits.Mask = 1) 2. Do other stuff, 
>>> including programing the init-count register.
>>> 3. un-mask LvtTimer (set LvtTimer.Bit.Mask = 0)
>>
>> Yes, I believe so. I'm not an expert on the APIC timer, but that 
>> seems reasonable to me.
> I tested this fix in Td guest and it has no side effect.
> I check the Intel SDM (Vol.3A Chap 10.5 Handling Local Interrupts) but it doesn't describe the actual sequence of LvtTimer.Bits.Mask and  programming of init-count register.
> @ Ni, Ray,  What's your thought about it?
 
I guess you can theoretically miss an interrupt if your initial count is expires before you unmask the interrupt, so I think your fix is correct and no changes are needed.
 
I need to double check whether I'm properly resetting the APIC when APs are booted multiple times. Since this only occurs with SNP, I think this is on my end.
 
Thanks,
Tom
 
> 
> Thanks
> Min
> 
> 
> 
> 
> 
 
 
 
 
 
 
 

 
 


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#89912): https://edk2.groups.io/g/devel/message/89912
Mute This Topic: https://groups.io/mt/89446188/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/edk2-devel-archive/attachments/20220520/919efe72/attachment-0001.htm>


More information about the edk2-devel-archive mailing list