[edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret Barkelew via groups.io bret.barkelew=microsoft.com at groups.io
Mon Oct 12 22:14:29 UTC 2020


Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to begin with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D <michael.d.kinney at intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew <Bret.Barkelew at microsoft.com>; Andrew Fish <afish at apple.com>; edk2-devel-groups-io <devel at edk2.groups.io>; Kinney, Michael D <michael.d.kinney at intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How do platform creators know, for the complete set of drivers, whether there is anything they need to worry about, why, and what they need to do to address the concern?  This is about the order in which events are signaled for a given event trigger.  When a platform adds more drivers that may use the same event triggers, how do they know whether there is a potential for a race condition?  If event notification functions are designed to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and, preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?
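One way such a check could look is sketched below. This is only an illustration of the manipulator/protector pattern under discussion, not code from edk2: LOCK_STATE, ProtectorNotify and ManipulatorNotify are hypothetical names, and the "resource" is reduced to a flag. The manipulator reports when it has been signaled after the protector, so a caller can ASSERT on the result in debug builds and still fall back to a defined action otherwise.

```c
#include <stdbool.h>

/* Hypothetical shared state; none of these names come from edk2. */
typedef struct {
  bool     Locked;        /* set once the protector has run */
  unsigned LateAccesses;  /* manipulators that arrived after the lock */
} LOCK_STATE;

/* Protector: locks the resource when the event fires. */
static void ProtectorNotify(LOCK_STATE *State)
{
  State->Locked = true;
}

/*
 * Manipulator: needs to touch the resource before it is locked.
 * Returns false if it was signaled in the wrong order. A real driver
 * could wrap the call in ASSERT() for debug builds and still take the
 * fallback action (here: record the miss and skip the access).
 */
static bool ManipulatorNotify(LOCK_STATE *State)
{
  if (State->Locked) {
    State->LateAccesses++;
    return false;
  }
  /* ... modify the resource here ... */
  return true;
}
```

The check only detects a wrong order after the fact; it does nothing to make the order deterministic.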

Mike

From: Bret Barkelew <Bret.Barkelew at microsoft.com>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney at intel.com>; Andrew Fish <afish at apple.com>; edk2-devel-groups-io <devel at edk2.groups.io>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D <michael.d.kinney at intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish <afish at apple.com>; edk2-devel-groups-io <devel at edk2.groups.io>; Kinney, Michael D <michael.d.kinney at intel.com>
Cc: Bret Barkelew <Bret.Barkelew at microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  It would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.
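As a rough illustration of the collision concern, the sketch below scans a table of per-driver TPL assignments (for example, values a platform might supply through PCDs) and counts the pairs that landed on the same level. TPL_ASSIGNMENT and CountTplCollisions are hypothetical names made up for this example; nothing like them is assumed to exist in edk2.

```c
#include <stddef.h>

/* Hypothetical record: which driver was given which extra TPL level. */
typedef struct {
  const char *Driver;
  unsigned    Tpl;
} TPL_ASSIGNMENT;

/* Count the pairs of drivers that share a TPL level. */
static size_t CountTplCollisions(const TPL_ASSIGNMENT *Table, size_t Count)
{
  size_t Collisions = 0;

  for (size_t i = 0; i < Count; i++) {
    for (size_t j = i + 1; j < Count; j++) {
      if (Table[i].Tpl == Table[j].Tpl) {
        Collisions++;
      }
    }
  }
  return Collisions;
}
```

A check like this can flag shared levels, but it cannot tell whether the relative priorities that were assigned are the right ones; that still requires the platform integrator to understand every driver involved.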

Mike

From: Andrew Fish <afish at apple.com>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel at edk2.groups.io>; Kinney, Michael D <michael.d.kinney at intel.com>
Cc: bret.barkelew at microsoft.com
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Mike,

When I’ve done things like this in the past, I’ve thought of making them configurable, e.g. via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.
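The "<UEFI Spec TPL> [+ <extra>]" notation can be sketched in plain C as below. This is not the lldb formatter mentioned above, only an illustration of the same idea; FormatTpl is a hypothetical helper, and the four TPL values are restated locally from the UEFI spec so the example is self-contained.

```c
#include <stddef.h>
#include <stdio.h>

/* UEFI-spec TPL values, restated locally for a self-contained sketch. */
#define TPL_APPLICATION  4
#define TPL_CALLBACK     8
#define TPL_NOTIFY       16
#define TPL_HIGH_LEVEL   31

/*
 * Hypothetical helper: describe Tpl as the nearest spec-defined level at
 * or below it, plus an offset. Values below TPL_APPLICATION are printed
 * as a bare number. Returns Buf for convenience.
 */
static const char *FormatTpl(unsigned Tpl, char *Buf, size_t Len)
{
  static const struct { unsigned Value; const char *Name; } Bases[] = {
    { TPL_HIGH_LEVEL,  "TPL_HIGH_LEVEL"  },
    { TPL_NOTIFY,      "TPL_NOTIFY"      },
    { TPL_CALLBACK,    "TPL_CALLBACK"    },
    { TPL_APPLICATION, "TPL_APPLICATION" },
  };

  for (size_t i = 0; i < sizeof Bases / sizeof Bases[0]; i++) {
    if (Tpl >= Bases[i].Value) {
      unsigned Extra = Tpl - Bases[i].Value;
      if (Extra == 0) {
        snprintf(Buf, Len, "%s", Bases[i].Name);
      } else {
        snprintf(Buf, Len, "%s + %u", Bases[i].Name, Extra);
      }
      return Buf;
    }
  }
  snprintf(Buf, Len, "%u", Tpl);
  return Buf;
}
```

This sketch only counts upward from a base; expressing a level as TPL_CALLBACK - 1 would need a different rule for choosing the base.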

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney at intel.com> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just move the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that needs something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.
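A minimal sketch of what such an allocation service might do is shown below. Everything here is an assumption made for illustration: AllocateTplBetween, the bitmap and the "highest free level first" policy are hypothetical, and only the four spec TPL values and the reserved 17..30 range come from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* UEFI-spec TPL values, restated locally for a self-contained sketch. */
#define TPL_APPLICATION  4
#define TPL_CALLBACK     8
#define TPL_NOTIFY       16
#define TPL_HIGH_LEVEL   31

/* Bitmap of levels 0..31 already in use; the spec levels start claimed. */
static uint32_t mClaimedTpls =
  (1u << TPL_APPLICATION) | (1u << TPL_CALLBACK) |
  (1u << TPL_NOTIFY)      | (1u << TPL_HIGH_LEVEL);

/* TPL 17..30 are reserved by the UEFI spec for firmware interrupts. */
static bool IsReservedForFirmwareInterrupts(unsigned Tpl)
{
  return Tpl >= 17 && Tpl <= 30;
}

/*
 * Hypothetical service: claim the highest free level strictly between
 * Low and High. Returns 0 (never a valid TPL here) if none is available.
 */
static unsigned AllocateTplBetween(unsigned Low, unsigned High)
{
  if (High > TPL_HIGH_LEVEL || Low >= High) {
    return 0;
  }
  for (unsigned Tpl = High - 1; Tpl > Low; Tpl--) {
    if (IsReservedForFirmwareInterrupts(Tpl)) {
      continue;
    }
    if (mClaimedTpls & (1u << Tpl)) {
      continue;
    }
    mClaimedTpls |= 1u << Tpl;
    return Tpl;
  }
  return 0;
}
```

Two drivers that both ask for a level between TPL_CALLBACK and TPL_NOTIFY would receive 15 and 14 in request order, so their relative priority is decided by which one asked first. That is the same ordering problem in a different place, which is the point made above.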

Best regards,

Mike

From: devel at edk2.groups.io <devel at edk2.groups.io> On Behalf Of Bret Barkelew via groups.io
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel at edk2.groups.io
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, moving the “Lock” event back to EndOfDxe exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I currently have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2 at e7d0164 (github.com) <https://github.com/corthon/edk2/commit/e7d0164c8263b1fbfb8b4e289851fbedaa8997f1>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.
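The split described here, a strict check on the UEFI system table path and a looser one on a DXE Services path, could be sketched as follows. Both function names are hypothetical and the exact acceptance rules are assumptions made for illustration; in particular, the sketch keeps the 17..30 range off limits and does not claim to match what DxeCore checks today.

```c
#include <stdbool.h>

/* UEFI-spec TPL values, restated locally for a self-contained sketch. */
#define TPL_APPLICATION  4
#define TPL_CALLBACK     8
#define TPL_NOTIFY       16
#define TPL_HIGH_LEVEL   31

/* System table path (assumed rule): only the spec notify/callback levels. */
static bool SystemTableNotifyTplAllowed(unsigned Tpl)
{
  return Tpl == TPL_CALLBACK || Tpl == TPL_NOTIFY;
}

/*
 * Hypothetical DXE Services path: any level strictly between
 * TPL_APPLICATION and TPL_HIGH_LEVEL, except the 17..30 range that is
 * assumed to stay set aside for firmware interrupts.
 */
static bool DxeServicesNotifyTplAllowed(unsigned Tpl)
{
  if (Tpl <= TPL_APPLICATION || Tpl >= TPL_HIGH_LEVEL) {
    return false;
  }
  return Tpl <= TPL_NOTIFY;
}
```

Under these assumed rules, TPL_CALLBACK + 1 would be rejected through the system table interface but accepted through the DXE Services one, while a level of 17 would be rejected by both.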

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret






View/Reply Online (#66138): https://edk2.groups.io/g/devel/message/66138

