[edk2-devel] [PATCH v2 2/4] OvmfPkg/PlatformInitLib: Add PlatformGetLowMemoryCB

Laszlo Ersek lersek at redhat.com
Wed Jan 11 13:56:05 UTC 2023


On 1/11/23 08:29, Gerd Hoffmann wrote:

>> (9) BTW, still regarding commit 2a0bd3bffc80 ("OvmfPkg/PlatformInitLib:
>> q35 mtrr setup fix", 2022-09-28) -- does the following code comment
>> still apply?
>>
>>     //
>>     // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
>>     // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
>>     //
>>
>> Because, in case the new branch introduced by commit 2a0bd3bffc80 takes
>> effect (namely, "Uc32Base = BASE_2GB"), I'm not sure where the PCIEXBAR
>> starts, and then the above comment may no longer hold.
>
> PCIEXBAR location doesn't change, it's still at 0xb0000000.

>>> @@ -158,11 +157,11 @@ PlatformMemMapInitialization (
>>>      // the base of the 32-bit PCI host aperture.
>>>      //
>>>      PciExBarBase = PcdGet64 (PcdPciExpressBaseAddress);
>>> -    ASSERT (TopOfLowRam <= PciExBarBase);
>>> +    ASSERT (PlatformInfoHob->LowMemory <= PciExBarBase);
>>>      ASSERT (PciExBarBase <= MAX_UINT32 - SIZE_256MB);
>>>      PciBase = (UINT32)(PciExBarBase + SIZE_256MB);
>>
>> This change looks OK, but, similarly to my question (9):
>>
>> (11) Is the following comment still up to date:
>>
>>     //
>>     // The MMCONFIG area is expected to fall between the top of low RAM and
>>     // the base of the 32-bit PCI host aperture.
>>     //
>>
>> with regard to the new branch introduced by commit 2a0bd3bffc80
>> ("OvmfPkg/PlatformInitLib: q35 mtrr setup fix", 2022-09-28)?
>
> root at fedora ~# cat /proc/iomem
> [ ... ]
> 7ebfe000-7effffff : System RAM
> 7f000000-7fffffff : Reserved
> 80000000-afffffff : PCI Bus 0000:00
> b0000000-bfffffff : PCI MMCONFIG 0000 [bus 00-ff]
>   b0000000-bfffffff : Reserved
>     b0000000-bfffffff : pnp 00:04
> c0000000-febfffff : PCI Bus 0000:00
> [ ... ]
> root at fedora ~# cat /proc/mtrr
> reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> reg01: base=0x800000000 (32768MB), size=32768MB, count=1: uncachable

Ugh, what? :)

I was about to point out a contradiction between (a) the following from
commit 2a0bd3bffc80:

+      // Newer qemu with gigabyte aligned memory,
+      // 32-bit pci mmio window is 2G -> 4G then.

and (b) your confirmation that the PCIEXBAR location does not change.
Namely, I was about to point out that PCIEXBAR -- *config space*
expressed as MMIO -- would then overlap the 32-bit MMIO aperture, the
one that's assignable to BARs as MMIO space.

But then your /proc/iomem quote actually confirms this is what happens
in practice -- and apparently works??? In Linux anyways?

FWIW I don't see how this is safe with regard to the firmware. Even if
QEMU is capable of generating a set of discontiguous resource
descriptors in the DSDT / _CRS, and Linux is capable of dealing with
that, I don't understand how the firmware does it.

Has it only been working by chance, perhaps? PciHostBridgeDxe uses this
intersection-based MMIO addition that we discussed in the BZ, so I
certainly don't expect a conflict there; after all, the MMIO space used
for MMCONFIG and the MMIO space used for BAR assignments should mostly
be the same; IOW there should be no conflict or compatibility problem at
the GCD level that PciHostBridgeDxe chokes on.

But when the actual BARs are assigned (allocated), what prevents them
from being allocated from the overlapping MMCONFIG "sub" interval? I
think there's a problem there, we just have not hit it yet.

OK, OK, let me review what the code actually does. Here's the relevant
part of the call tree:

    InitializePlatform()                   [OvmfPkg/PlatformPei/Platform.c]
[1]   PlatformQemuUc32BaseInitialization() [OvmfPkg/Library/PlatformInitLib/MemDetect.c]
      InitializeRamRegions()               [OvmfPkg/PlatformPei/MemDetect.c]
[2]     PlatformQemuInitializeRam()        [OvmfPkg/Library/PlatformInitLib/MemDetect.c]
      MemMapInitialization()               [OvmfPkg/PlatformPei/Platform.c]
[3]     PlatformMemMapInitialization()     [OvmfPkg/Library/PlatformInitLib/Platform.c]
      MiscInitialization()                 [OvmfPkg/PlatformPei/Platform.c]
        PlatformMiscInitialization()       [OvmfPkg/Library/PlatformInitLib/Platform.c]
[4]       PciExBarInitialization()         [OvmfPkg/Library/PlatformInitLib/Platform.c]

[5] AmdSevDxeEntryPoint()                  [OvmfPkg/AmdSevDxe/AmdSevDxe.c]

- At [1], the most recent state is from commit 2a0bd3bffc80
  ("OvmfPkg/PlatformInitLib: q35 mtrr setup fix", 2022-09-28). What this
  commit does is, it lowers (conditionally) the base of the area that
  we'll mark as UC ("Uc32Base"), from 0xB0000000 (2816 MB, the start of
  the EXBAR) to 2048 MB.

  It does not change the location of MMCONFIG *or* the 32-bit MMIO
  aperture. It only adds a *comment* like this:

  +      // Newer qemu with gigabyte aligned memory,
  +      // 32-bit pci mmio window is 2G -> 4G then.

  which is a *natural language statement* about the 32-bit MMIO
  aperture, but no code change.

- At [2], we mark the area [Uc32Base, 4GB) as uncacheable in the MTRRs.
  There is a comment saying:

    //
    // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
    // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
    //

  And this comment *conflicts* with the one from [1]. Because on Q35,
  the new Uc32Base (2GB) may not actually match the PCIEXBAR start (2816
  MB). Therefore commit 2a0bd3bffc80 made the comment at [2] stale. This
  is issue {1}.

  Still, this is not a functional error, as we're only marking a
  *larger* (and simpler-to-express) area as UC than before. When this
  new logic fires, we have *nothing* in the space between the 32-bit RAM
  and the EXBAR base -- thus far, anyway!.

- At [3], we have a comment saying

    //
    // The MMCONFIG area is expected to fall between the top of low RAM and
    // the base of the 32-bit PCI host aperture.
    //

  plus code that actually *implements* this. In spite of the *comment*
  added in commit 2a0bd3bffc80, at [1], the logic at [3] *still* --
  correctly -- places the 32-bit MMIO aperture *above* the PCIEXBAR.

  This is what PciHostBridgeDxe will expose as aperture, and what
  PciBusDxe will assign BARs from. No conflict is possible with the
  MMCONFIG area.

  This is in fact confirmed by the following log snippet, when booting a
  pc-q35-7.2 guest with 1792 MB of RAM:

> PciHostBridgeUtilityInitRootBridge: populated root bus 0, with room for 255 subordinate bus(es)
> RootBridge: PciRoot(0x0)
>   Support/Attr: 70069 / 70069
>     DmaAbove4G: No
> NoExtConfSpace: No
>      AllocAttr: 3 (CombineMemPMem Mem64Decode)
>            Bus: 0 - FF Translation=0
>             Io: 6000 - FFFF Translation=0
>            Mem: C0000000 - FBFFFFFF Translation=0
>     MemAbove4G: 800000000 - FFFFFFFFF Translation=0
>           PMem: FFFFFFFFFFFFFFFF - 0 Translation=0
>    PMemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0

  Note "Mem: C0000000 - FBFFFFFF Translation=0".

- Still at [3], we cover the MMCONFIG area with a reserved memory HOB
  (not MMIO HOB). This is alright; and again the reason we have no
  conflict in PciHostBridgeDxe is that the 32-bit MMIO aperture *still*
  starts above the EXBAR; it does not "surround" it.

- At [4], we actually program the EXBAR.

- At [5], early in the DXE phase, knowing that we marked the EXBAR as
  reserved memory and not as MMIO, we explicitly decrypt (when on SEV)
  the EXBAR.

OK, so what do we learn from the above?

- issue {1}: the statement in [2] PlatformQemuInitializeRam(), in the
  following comment:

    //
    // Set the memory range from the start of the 32-bit MMIO area (32-bit PCI
    // MMIO aperture on i440fx, PCIEXBAR on q35) to 4GB as uncacheable.
    //

   was broken by commit 2a0bd3bffc80.

- issue {2}: the statement from commit 2a0bd3bffc80

  +      // 32-bit pci mmio window is 2G -> 4G then.

  is not factual, as far as the firmware is concerned. I don't know
  *how* Linux ends up extending the MMIO aperture so far down that it
  streddles / embeds MMCONFIG, but AFAICT it has nothing to do with the
  firmware.

  Therefore, I also don't understand where the requirement comes (from
  Linux? where?) that the firmware mark the "gap" between 2048 MB and
  2816 MB as uncached. The firmware does not use it for anything, so why
  does the Linux kernel do? And if the Linux kernel does, then why does
  it not reprogram the MTRRs as well?

  The commit message from commit 2a0bd3bffc80 states, "Which effectively
  makes the 32-bit pci mmio window start at 0x80000000".

  I'm precisely after that "effectively" adverb here: placing the 32-bit
  MMIO aperture at 2048 MB is not *at all* what the firmware does.

... OK, I think I've found it! It's the following QEMU commit:
4a4418369d6d ("q35: fix mmconfig and PCI0._CRS", 2019-06-16).

I've even found the original discussion:

- v1: http://mid.mail-archive.com/20190528204331.5280-1-kraxel@redhat.com
- v2: http://mid.mail-archive.com/20190607073429.3436-1-kraxel@redhat.com

So, this seems to happen by QEMU moving the hole as low as possible in
the \SB.PCI0 _CRS, in the DSDT, punching gaps as neeed. And Linux
adheres to that.

I've tested it with said pc-q35-7.2 guest with 1792 MB of RAM, running
RHEL-7.9 (that's what I've had handy). The MTRR update from edk2 commit
2a0bd3bffc80 takes effect:

> [    0.000000] MTRR variable ranges enabled:
> [    0.000000]   0 base 0080000000 mask FF80000000 uncachable

*but* the start of the 32-bit MMIO range (which is discontiguous) appears
even lower, per QEMU commit 4a4418369d6d:

> [    0.000000] e820: [mem 0x70000000-0xafffffff] available for PCI devices
> ...
> [    0.281120] pci_bus 0000:00: root bus resource [mem 0x70000000-0xafffffff window]
> ...
> [    0.529741] pci 0000:00:02.0: BAR 6: assigned [mem 0x70000000-0x7000ffff pref]
> ...
> [    0.565903] pci_bus 0000:00: resource 7 [mem 0x70000000-0xafffffff window]

and

> 70000000-afffffff : PCI Bus 0000:00
>   70000000-7000ffff : 0000:00:02.0
> b0000000-bfffffff : PCI MMCONFIG 0000 [bus 00-ff]
>   b0000000-bfffffff : reserved
>     b0000000-bfffffff : pnp 00:04
> c0000000-febfffff : PCI Bus 0000:00
>   c0000000-c0ffffff : 0000:00:02.0
>     c0000000-c0ffffff : bochs-drm

There is *no sign* of 0x70000000 in the firmware log. So Linux
effectively ressigns some PCI resources, and utilizes the
(discontiguous) area that QEMU exposes in the DSDT, with the firmware
knowing nothing about it.

Note, again, that our UC settings in the MTRR start *higher*, at 2GB.

So not only am I unconvinced (or at least confused!) that edk2 commit
2a0bd3bffc80 was necessary, it even *seems* insufficient, for
UC-coverage of the 32-bit MMIO aperture.

In practice though, it doesn't seem to cause issues! (Which again
questions if the original change in 2a0bd3bffc80 was really necessary.)

I've filed a new TianoCore BZ about clarifying the comments please:

  https://bugzilla.tianocore.org/show_bug.cgi?id=4289

> With gigabyte-alignment being the common case these days it might make
> sense to place the MMCONFIG area at 0xe0000000 instead ...

I feel really unsafe about complicating this code even further.

Laszlo



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#98295): https://edk2.groups.io/g/devel/message/98295
Mute This Topic: https://groups.io/mt/96173191/1813853
Group Owner: devel+owner at edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [edk2-devel-archive at redhat.com]
-=-=-=-=-=-=-=-=-=-=-=-




More information about the edk2-devel-archive mailing list