[edk2-devel] [PATCH] UefiPayloadPkg: Always split page table entry to 4K if it covers stack.

Ni, Ray ray.ni at intel.com
Tue May 31 13:24:56 UTC 2022


> > I am not quite sure how Linux handles such case?
> 
> Oh, lovely.  CPU bugs lurking indeed.  linux has this longish comment
> (see mm/huge_memory.c, in the middle of the __split_huge_pmd_locked()
> function):
> 
>         /*
>          * Up to this point the pmd is present and huge and userland has the
>          * whole access to the hugepage during the split (which happens in
>          * place). If we overwrite the pmd with the not-huge version pointing
>          * to the pte here (which of course we could if all CPUs were bug
>          * free), userland could trigger a small page size TLB miss on the
>          * small sized TLB while the hugepage TLB entry is still established in
>          * the huge TLB. Some CPU doesn't like that.
>          * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
>          * 383 on page 105. Intel should be safe but is also warns that it's
>          * only safe if the permission and cache attributes of the two entries
>          * loaded in the two TLB is identical (which should be the case here).
>          * But it is generally safer to never allow small and huge TLB entries
>          * for the same virtual address to be loaded simultaneously. So instead
>          * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
>          * current pmd notpresent (atomically because here the pmd_trans_huge
>          * must remain set at all times on the pmd until the split is complete
>          * for this pmd), then we flush the SMP TLB and finally we write the
>          * non-huge version of the pmd entry with pmd_populate.
>          */
> 
> So linux goes 2M -> not present -> 4K instead of direct 2M -> 4K (and
> does the tlb flush in the not present state), which apparently is needed
> on some CPUs to avoid confusing the tlb cache.
> 
> > Before that's fully understood, we think the page table split for
> > stack does no harm to the functionality and code complexity. That's
> > why we choose this fix first.
> 
> So this basically splits the page right from the start instead of doing
> it later when page attributes are changed.  Which probably avoids the
> huge page landing in the tlb cache, which in turn avoids triggering the
> issues outlined above.

Yes :) Actually, there is no split at all. The 4K page table is created at the very beginning (before CR3 is set),
so there is no TLB issue at all.
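
To illustrate the idea, here is a minimal sketch in C of picking the page size while the table is first built, before it is ever loaded into CR3. It is not the UefiPayloadPkg code: build_pde(), covers_stack() and the PG_*/SIZE_* macros are hypothetical names, the mapping is assumed to be identity-mapped (so a pointer value equals its physical address), and only the P and R/W bits are modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define PG_P     (1ULL << 0)              /* present */
#define PG_RW    (1ULL << 1)              /* read/write */
#define PG_PS    (1ULL << 7)              /* page size: PDE maps a 2M page */
#define SIZE_4K  0x1000ULL
#define SIZE_2M  0x200000ULL
#define ADDR_4K  0x000FFFFFFFFFF000ULL    /* address bits 51:12 */

/* Hypothetical helper: does the 2M slot starting at base overlap the stack? */
static int covers_stack(uint64_t base, uint64_t stack_base, uint64_t stack_size)
{
    return base < stack_base + stack_size && stack_base < base + SIZE_2M;
}

/*
 * Build the PDE for the identity-mapped 2M slot at base.
 * pt must be a 4K-aligned table of 512 entries; it is filled only when
 * the slot covers the stack. Nothing is split later, so the 2M
 * translation for that slot never exists in the first place.
 */
static uint64_t build_pde(uint64_t base, uint64_t stack_base,
                          uint64_t stack_size, uint64_t *pt)
{
    size_t i;

    if (!covers_stack(base, stack_base, stack_size)) {
        return base | PG_PS | PG_RW | PG_P;       /* plain 2M page */
    }
    for (i = 0; i < 512; i++) {
        pt[i] = (base + i * SIZE_4K) | PG_RW | PG_P;
    }
    /* Assumption: identity mapping, so the pointer is the physical address. */
    return ((uint64_t)(uintptr_t)pt & ADDR_4K) | PG_RW | PG_P;
}
```

Since the table is not yet active when build_pde() runs, no TLB entry can refer to it and no invalidation is needed at this point.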

> 
> I think doing a linux-style page split will be the more robust solution.

Thanks for explaining the Linux behavior.

Intel's SDM also contains the following wording:
* As noted in Section 4.10.2, the TLBs may subsequently contain multiple translations for the address range if
* software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses
* changes. A reference to a linear address in the address range may use any of these translations.
* Software wishing to prevent this uncertainty should not write to a paging-structure entry in a way that would
* change, for any linear address, both the page size and either the page frame, access rights, or other attributes.
* It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (e.g.,
* PDE); then invalidate any translations for the affected linear addresses (see above); and then modify the
* relevant paging-structure entry to set the P flag and establish modified translation(s) for the new page size.
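
The three steps in that algorithm could be sketched in C roughly as below. This is an illustration only, not edk2 or Linux code: split_2m_pde(), flush_range_stub() and the PG_*/ADDR_* macros are hypothetical names, the flush is a counting placeholder for what would really be INVLPG over the range or a CR3 reload (plus cross-CPU invalidation on SMP), identity mapping is assumed, and only the R/W attribute is carried over. A complete version would also have to translate the remaining attributes (for example, the PAT bit is bit 12 in a 2M PDE but bit 7 in a 4K PTE).

```c
#include <stddef.h>
#include <stdint.h>

#define PG_P     (1ULL << 0)              /* present */
#define PG_RW    (1ULL << 1)              /* read/write */
#define PG_PS    (1ULL << 7)              /* page size: PDE maps a 2M page */
#define SIZE_4K  0x1000ULL
#define SIZE_2M  0x200000ULL
#define ADDR_4K  0x000FFFFFFFFFF000ULL    /* address bits 51:12 */
#define ADDR_2M  0x000FFFFFFFE00000ULL    /* address bits 51:21 */

static unsigned flush_calls;

/* Placeholder for the real TLB invalidation; it only counts calls. */
static void flush_range_stub(uint64_t base, uint64_t size)
{
    (void)base;
    (void)size;
    flush_calls++;
}

/*
 * Replace a present 2M PDE with a pointer to a 4K page table.
 * pt must be a 4K-aligned table of 512 entries.
 */
static void split_2m_pde(volatile uint64_t *pde, uint64_t *pt,
                         void (*flush)(uint64_t base, uint64_t size))
{
    uint64_t old  = *pde;
    uint64_t base = old & ADDR_2M;
    uint64_t attr = old & PG_RW;          /* only R/W is modeled here */
    size_t i;

    /* Prepare the 4K entries while the 2M mapping is still in place. */
    for (i = 0; i < 512; i++) {
        pt[i] = (base + i * SIZE_4K) | attr | PG_P;
    }

    *pde = old & ~PG_P;                   /* 1. clear the P flag */
    flush(base, SIZE_2M);                 /* 2. invalidate old translations */
    /* 3. set P again with the new page size (PS clear, points to pt) */
    *pde = ((uint64_t)(uintptr_t)pt & ADDR_4K) | attr | PG_P;
}
```

Between steps 1 and 3 the whole 2M range is unmapped, so the sketch is only safe if nothing executing at that moment touches the range.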

But I still have some doubts about using the Linux-style page split.
While the entry is marked as not present:
1. Active code must not access data in the 2M region (the stack is in the 2M region in our case).
2. Active code must not reside in the 2M region (how can that be guaranteed?).

How does Linux guarantee the above two points?

Thanks,
Ray

