[Crash-utility] increase __PHYSICAL_MASK_SHIFT_XEN?

Petr Tesarik ptesarik at suse.cz
Fri Jan 22 07:48:55 UTC 2021


Hi Jiri,

I'm not an expert on Xen internals, but there's quite likely no better
person to answer. So, here we go...

On Thu, 21 Jan 2021 17:18:29 +0100
Jiri Bohac <jbohac at suse.cz> wrote:

> Hi,
> 
> I've just come across a situation where crash failed to open a
> dump generated by a 4.12 XEN PV dom0 kernel, terminating with
> this message:
> 
> crash: read error: physical address: ffffffffffffffff  type: "pud page"
> 
> The problem is a failed machine-to-physical translation.
> xen_m2p() returns an error (-1UL) and x86_64_pud_offset() than
> uses that value as a physical address.
> 
> I debugged the problem by running crash inside gdb. The backtrace was:
>  #0  xen_m2p (machine=973135872) at kernel.c:9714
>  #1  0x000000000053ed4a in x86_64_pud_offset (pgd_pte=3299508019303, vaddr=18446744072642223552, verbose=0, IS_XEN=1)
>      at x86_64.c:1889
>  #2  0x0000000000540c8a in x86_64_kvtop_xen_wpt (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0)
>      at x86_64.c:2523
>  #3  0x00000000005407d0 in x86_64_kvtop (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0)
>      at x86_64.c:2413
>  #4  0x0000000000491a97 in kvtop (tc=0x0, kvaddr=18446744072642223552, paddr=0x7fffffffdb30, verbose=0) at memory.c:3062
>  #5  0x000000000048f3f0 in readmem (addr=18446744072642223552, memtype=1, buffer=0xba63a0 <shared_bufs>, size=832,
>      type=0x92e1f2 "module struct", error_handle=6) at memory.c:2314
>  #6  0x00000000005071e2 in module_init () at kernel.c:3699
>  ....
> 
> I tracked the problem to a wrong value of
> __PHYSICAL_MASK_SHIFT_XEN. The current value of 40 does not
> correspond to the current kernel value of 52 since kernel commit
> 6f0e8bf16730a36ff6773802d8c8df56d10e60cd (xen: support 52 bit
> physical addresses in pv guests).
> 
> The result is visible in the above backtrace:
> x86_64_pud_offset() is called with pgd_pte=0x3003a00e067 and that
> value is wrongly masked by "pud_paddr = pgd_pte &
> PHYSICAL_PAGE_MASK" and passed to xen_m2p() as 0x3a00e000 instead
> of 0x3003a00e000, causing the m2p translation to fail.

Thank you for this detailed analysis! AFAICS it is totally correct.

> Setting __PHYSICAL_MASK_SHIFT_XEN to 52 fixes the problem with
> this dump.
> 
> But I am not confident it's a safe change. My understanding is
> that it should be safe, as all the unused bits of the physical
> address inside the PTEs should be 0 and thus having the mask wider
> than necessary should not hurt. But I am suspicious if my
> reasoning is correct. Why does crash go into such trouble
> differentiating between different kernels and sets
> machdep->machspec->physical_mask_shift dynamically to one of
> __PHYSICAL_MASK_SHIFT_XEN (40), __PHYSICAL_MASK_SHIFT_2_6 (46),
> and __PHYSICAL_MASK_SHIFT_5LEVEL (52)? Would something break if
> it were always set to 52? The commit adding the logic is
> 307e7f35.

I have asked myself the same question when writing libkdumpfile. ;-)
FWIW libkdumpfile does not go into this trouble, using 52 for all
kernels, and it has been tested on many current and historical crash
dumps with no issues. So, I believe it _is_ in fact safe to use 52
unconditionally as an architectural limit.

IIUC the only reason for having a physical mask at all is that all page
table bits beyond the current architectural limit are marked as
reserved by Intel, so they could theoretically be used for any purpose
in a future revision of the architecture. Intel has always recommended
to initialize reserved bits to zero, but I think that crash utility
developers are quite conservative and afraid of introducing regressions
for cases that are known to work, so whenever they introduce a new
feature (such as 5-level paging), they try to preserve the existing
working cases with absolutely no change.

My two cents,
Petr T
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 488 bytes
Desc: Digit??ln?? podpis OpenPGP
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20210122/35ee4efb/attachment.sig>


More information about the Crash-utility mailing list