[edk2-devel] [PATCH v2 2/6] OvmfPkg/AmdSev: add Grub Firmware Volume Package

Laszlo Ersek lersek at redhat.com
Wed Nov 25 14:01:41 UTC 2020


Adding Ard and Liming; and I see you added Bret already.

Comments below.

On 11/25/20 02:27, James Bottomley wrote:
> On Tue, 2020-11-24 at 15:42 -0800, James Bottomley wrote:
>> On Wed, 2020-11-25 at 00:22 +0100, Laszlo Ersek wrote:
> [...]
>>> There are some others that should be possible to remove (pls. refer
>>> to the rest of that email).
>>
>> Heh, well, I spoke too soon.  Even though the OVMF this produces
>> boots to grub and decrypts an encrypted volume, the kernel boot
>> panics because of something missing in the runtime ... it looks like
>> it's tripping over gRT->SetVariable, so I'm going to have to start
>> putting some stuff back again ...
>
> Actually, this isn't me.  I can't get the vanilla OVMF package to boot
> either.  It looks to be a problem with the variable policy stuff since
> the last known good boot was before they were added.
>
> I've attached the boot log with vanilla OVMF.  There's rather a lot of
> policy commits to try reverting.  What's the best way to debug this?

I encountered the exact same crash yesterday, with an old Fedora 30
virtual machine (running a 5.0.9-based Linux kernel).

I found the crash independently of SEV-ES -- I was simply performing
some regression tests due to the upcoming edk2-stable202011 tag. We're
approaching the release, and I thought this would be about the latest
that I could report issues with the release, or even send patches if
necessary.

So, indeed, because the backtrace below includes
"efi_enter_virtual_mode" and "virt_efi_set_variable_nonblocking", I
immediately thought of the VariablePolicy series. Notably, there is no
crash with the SMM build of OVMF, so I looked at the file

MdeModulePkg/Library/VariablePolicyLib/VariablePolicyExtraInitRuntimeDxe.c

I couldn't see anything wrong with it however (beyond the questionable C
language practice of reinterpreting a pointer-to-function as a
pointer-to-void, but I digress).

Furthermore, ArmVirtQemu also does *not* suffer from the crash, and
ArmVirtQemu uses the same non-SMM variable driver stack as the SMM-less
build of OVMF (modulo the lowest level driver, namely the flash driver).
This was another argument against suspecting the VariablePolicy series.

So my suspicion shifted from the firmware to the guest kernel. I booted
the guest with "efi=noruntime", and then upgraded it to the latest
Fedora 30 packages (dnf upgrade --refresh).

That brought me "kernel-5.6.13-100.fc30" -- which would *still* crash,
when I removed "efi=noruntime". So, as next step, I kept
"efi=noruntime", and upgraded the guest to Fedora 31
<https://docs.fedoraproject.org/en-US/quick-docs/dnf-system-upgrade/>.
Coincidentally, Fedora 31 is now the oldest Fedora release that's still
supported.

This upgrade gave me kernel 5.8.18-100.fc31.x86_64 in the guest -- and
this one does *not* crash. From your boot log below, I see your guest
kernel is 5.5.0; I suggest upgrading it.

So in the end I didn't report the crash on edk2-devel; I decided it was
a Linux kernel bug that only got tickled (unmasked), but not *caused*,
by the VariablePolicy work. I guess the kernel fix in question could be
determined with a reverse bisection, but I don't have time for that now.

For a semi-random git-log command,

$ git log --reverse --oneline v5.6.13..v5.8.18 -- \
    arch/x86/platform/efi/efi.c

14b60cc8e0ea efi/x86: Reindent struct initializer for legibility
a570b0624b3f efi/x86: Replace #ifdefs with IS_ENABLED() checks
50d53c58dd77 efi: Drop handling of 'boot_info' configuration table
120540f230d5 efi/ia64: Move HCDP and MPS table handling into IA64 arch code
fd506e0cf9fd efi: Move UGA and PROP table handling to x86 code
a17e809ea573 efi: Move mem_attr_table out of struct efi
14fb42090943 efi: Merge EFI system table revision and vendor checks
3a0701dc7ff8 efi: Make efi_config_init() x86 only
06c0bd93434c efi: Clean up config_parse_tables()
0a67361dcdaa efi/x86: Remove runtime table address from kexec EFI setup data
9cd437ac0ef4 efi/x86: Make fw_vendor, config_table and runtime sysfs nodes x86 specific
09308012d854 efi/x86: Merge assignments of efi.runtime_version
59f2a619a2db efi: Add 'runtime' pointer to struct efi
fd26830423e5 efi/x86: Drop 'systab' member from struct efi
f10e80a19b07 efi/x86: Add TPM related EFI tables to unencrypted mapping checks
badc61982adb efi/x86: Add RNG seed EFI table to unencrypted mapping check
0e72a6a3cfc3 efi: Export boot-services code and data as debugfs-blobs
f0df68d5bae8 efi: Add embedded peripheral firmware support
3be5f0d286dc Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core
4e9a0f73f030 efi: Clean up config table description arrays

Nothing strikes me (from the subjects) as immediately relevant, but it's
totally possible that the log is too narrow
("arch/x86/platform/efi/efi.c" only).

Ard, do you have an idea which commit in recent Linux history could be
the fix?

(BTW: the ArmVirtQemu guest that I used for testing (successfully!) runs
Fedora 33, meaning its kernel is even more recent:
5.9.9-200.fc33.aarch64.)

I'll make one other comment below:


> Loading Linux 5.5.0-2-amd64 ...
> Loading initial ramdisk ...
> [    0.000000] Linux version 5.5.0-2-amd64 (debian-kernel at lists.debian.org) (gcc version 9.3.0 (Debian 9.3.0-10)) #1 SMP Debian 5.5.17-1 (2020-04-15)
> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.5.0-2-amd64 root=UUID=8ebba08b-ff72-4030-ba43-8ce252e2e5a4 ro console=ttyS0,115200n8
> [    0.000000] x86/fpu: x87 FPU will use FXSAVE
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000007fffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000000800000-0x0000000000807fff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x0000000000808000-0x000000000080ffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x0000000000900000-0x000000007f8eefff] usable
> [    0.000000] BIOS-e820: [mem 0x000000007f8ef000-0x000000007f9eefff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000007f9ef000-0x000000007faeefff] type 20
> [    0.000000] BIOS-e820: [mem 0x000000007faef000-0x000000007fb6efff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000007fb6f000-0x000000007fb7efff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x000000007fb7f000-0x000000007fbfefff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x000000007fbff000-0x000000007fef3fff] usable
> [    0.000000] BIOS-e820: [mem 0x000000007fef4000-0x000000007ff77fff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000007ff78000-0x000000007fffffff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] efi: EFI v2.70 by EDK II
> [    0.000000] efi:  SMBIOS=0x7f942000  ACPI=0x7fb7e000  ACPI 2.0=0x7fb7e014  MEMATTR=0x7ebe9018
> [    0.000000] secureboot: Secure boot could not be determined (mode 0)
> [    0.000000] SMBIOS 2.8 present.
> [    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> [    0.000000] Hypervisor detected: KVM
> [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    0.000000] kvm-clock: cpu 0, msr 28031001, primary cpu clock
> [    0.000000] kvm-clock: using sched offset of 16414477727 cycles
> [    0.000006] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    0.000018] tsc: Detected 2400.000 MHz processor
> [    0.000169] last_pfn = 0x7fef4 max_arch_pfn = 0x400000000
> [    0.000220] x86/PAT: PAT not supported by CPU.
> [    0.000227] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> [    0.015576] RAMDISK: [mem 0x34a09000-0x364fbfff]
> [    0.015601] ACPI: Early table checksum verification disabled
> [    0.015622] ACPI: RSDP 0x000000007FB7E014 000024 (v02 BOCHS )
> [    0.015627] ACPI: XSDT 0x000000007FB7D0E8 000044 (v01 BOCHS  BXPCFACP 00000001      01000013)
> [    0.015639] ACPI: FACP 0x000000007FB7A000 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
> [    0.015645] ACPI: DSDT 0x000000007FB7B000 00140B (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
> [    0.015656] ACPI: FACS 0x000000007FBDD000 000040
> [    0.015659] ACPI: APIC 0x000000007FB79000 000078 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
> [    0.015663] ACPI: HPET 0x000000007FB78000 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
> [    0.015669] ACPI: BGRT 0x000000007FB77000 000038 (v01 INTEL  EDK2     00000002      01000013)
> [    0.016057] No NUMA configuration found
> [    0.016058] Faking a node at [mem 0x0000000000000000-0x000000007fef3fff]
> [    0.016066] NODE_DATA(0) allocated [mem 0x7fe7f000-0x7fe83fff]
> [    0.016095] Zone ranges:
> [    0.016100]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> [    0.016101]   DMA32    [mem 0x0000000001000000-0x000000007fef3fff]
> [    0.016102]   Normal   empty
> [    0.016103]   Device   empty
> [    0.016104] Movable zone start for each node
> [    0.016105] Early memory node ranges
> [    0.016106]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
> [    0.016107]   node   0: [mem 0x0000000000100000-0x00000000007fffff]
> [    0.016108]   node   0: [mem 0x0000000000808000-0x000000000080ffff]
> [    0.016109]   node   0: [mem 0x0000000000900000-0x000000007f8eefff]
> [    0.016110]   node   0: [mem 0x000000007fbff000-0x000000007fef3fff]
> [    0.016459] Zeroed struct page in unavailable ranges: 1397 pages
> [    0.016461] Initmem setup node 0 [mem 0x0000000000001000-0x000000007fef3fff]
> [    0.019894] ACPI: PM-Timer IO Port: 0xb008
> [    0.019926] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> [    0.019972] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> [    0.019975] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.019981] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> [    0.019982] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [    0.019987] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> [    0.019988] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> [    0.019993] Using ACPI (MADT) for SMP configuration information
> [    0.019995] ACPI: HPET id: 0x8086a201 base: 0xfed00000
> [    0.020024] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
> [    0.020051] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
> [    0.020052] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff]
> [    0.020054] PM: Registered nosave memory: [mem 0x00800000-0x00807fff]
> [    0.020055] PM: Registered nosave memory: [mem 0x00810000-0x008fffff]
> [    0.020056] PM: Registered nosave memory: [mem 0x7e7e7000-0x7e7effff]
> [    0.020057] PM: Registered nosave memory: [mem 0x7f8ef000-0x7f9eefff]
> [    0.020058] PM: Registered nosave memory: [mem 0x7f9ef000-0x7faeefff]
> [    0.020059] PM: Registered nosave memory: [mem 0x7faef000-0x7fb6efff]
> [    0.020059] PM: Registered nosave memory: [mem 0x7fb6f000-0x7fb7efff]
> [    0.020060] PM: Registered nosave memory: [mem 0x7fb7f000-0x7fbfefff]
> [    0.020063] [mem 0x80000000-0xffbfffff] available for PCI devices
> [    0.020064] Booting paravirtualized kernel on KVM
> [    0.020069] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> [    0.108622] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1
> [    0.109877] percpu: Embedded 55 pages/cpu s187800 r8192 d29288 u2097152
> [    0.109918] KVM setup async PF for cpu 0
> [    0.109923] kvm-stealtime: cpu 0, msr 7c419500
> [    0.109931] Built 1 zonelists, mobility grouping on.  Total pages: 512892
> [    0.109932] Policy zone: DMA32
> [    0.109934] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.5.0-2-amd64 root=UUID=8ebba08b-ff72-4030-ba43-8ce252e2e5a4 ro console=ttyS0,115200n8
> [    0.111356] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
> [    0.111796] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
> [    0.111837] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.114998] Memory: 248380K/2091564K available (10243K kernel code, 1221K rwdata, 3972K rodata, 1672K init, 1980K bss, 120832K reserved, 0K cma-reserved)
> [    0.115028] random: get_random_u64 called from __kmem_cache_create+0x3e/0x530 with crng_init=0
> [    0.115547] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> [    0.115575] Kernel/User page tables isolation: enabled
> [    0.115617] ftrace: allocating 34294 entries in 134 pages
> [    0.130489] ftrace: allocated 134 pages with 3 groups
> [    0.130821] rcu: Hierarchical RCU implementation.
> [    0.130823] rcu: 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=1.
> [    0.130826] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> [    0.130827] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
> [    0.133774] NR_IRQS: 33024, nr_irqs: 256, preallocated irqs: 16
> [    0.134031] Console: colour dummy device 80x25
> [    0.242024] printk: console [ttyS0] enabled
> [    0.242695] ACPI: Core revision 20191018
> [    0.243470] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
> [    0.244954] APIC: Switch to symmetric I/O mode setup
> [    0.246031] x2apic enabled
> [    0.246800] Switched APIC routing to physical x2apic.
> [    0.248946] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.249892] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x22983777dd9, max_idle_ns: 440795300422 ns
> [    0.251455] Calibrating delay loop (skipped) preset value.. 4800.00 BogoMIPS (lpj=9600000)
> [    0.252873] pid_max: default: 32768 minimum: 301
> [    0.256847] BUG: unable to handle page fault for address: 000000007ed03020
> [    0.258328] #PF: supervisor read access in kernel mode
> [    0.259304] #PF: error_code(0x0000) - not-present page
> [    0.259452] PGD fd2d063 P4D fd2d063 PUD fd30063 PMD fd3f063 PTE fffff812fc060
> [    0.259452] Oops: 0000 [#1] SMP PTI
> [    0.259452] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-2-amd64 #1 Debian 5.5.17-1
> [    0.259452] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> [    0.259452] RIP: 0010:0xfffffffeff6c6648
> [    0.259452] Code: 48 83 ec 20 e8 41 fd ff ff 48 85 c9 75 1c 84 c0 74 18 4c 8d 05 15 a1 00 00 48 8d 0d ec 93 00 00 ba ba 00 00 00 e8 c0 fe ff ff <48> 8b 03 48 83 c4 20 5b c3 55 57 56 53 48 89 d3 48 89 ce 48 83 ec
> [    0.259452] RSP: 0000:ffffffffabc03b10 EFLAGS: 00010202
> [    0.259452] RAX: 0000000000000001 RBX: 000000007ed03020 RCX: 000000007ed03020
> [    0.259452] RDX: ffffffffabc03eb0 RSI: 000000007ed03020 RDI: ffffffffab809670
> [    0.259452] RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000000
> [    0.259452] R10: 0000000000000000 R11: 800000007ff77063 R12: ffffffffabc03eb0
> [    0.259452] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    0.259452] FS:  0000000000000000(0000) GS:ffff9becbc400000(0000) knlGS:0000000000000000
> [    0.259452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.259452] CR2: 000000007ed03020 CR3: 000000000fd38000 CR4: 00000000000006b0
> [    0.259452] Call Trace:
> [    0.259452]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
> [    0.259452]  ? prep_new_page+0x3e/0x150
> [    0.259452]  ? get_page_from_freelist+0xfc1/0x1200
> [    0.259452]  ? native_flush_tlb_global+0x97/0xa0
> [    0.259452]  ? __flush_tlb_all+0x13/0x20
> [    0.259452]  ? efi_call+0x58/0x90
> [    0.259452]  ? virt_efi_set_variable_nonblocking+0xa0/0x120
> [    0.259452]  ? efi_delete_dummy_variable+0x5e/0x80
> [    0.259452]  ? efi_enter_virtual_mode+0x4f7/0x515
> [    0.259452]  ? start_kernel+0x4cd/0x562
> [    0.259452]  ? secondary_startup_64+0xa4/0xb0
> [    0.259452] Modules linked in:
> [    0.259452] CR2: 000000007ed03020
> [    0.259452] ---[ end trace d144de23fcdf159d ]---
> [    0.259452] RIP: 0010:0xfffffffeff6c6648
> [    0.259452] Code: 48 83 ec 20 e8 41 fd ff ff 48 85 c9 75 1c 84 c0 74 18 4c 8d 05 15 a1 00 00 48 8d 0d ec 93 00 00 ba ba 00 00 00 e8 c0 fe ff ff <48> 8b 03 48 83 c4 20 5b c3 55 57 56 53 48 89 d3 48 89 ce 48 83 ec
> [    0.259452] RSP: 0000:ffffffffabc03b10 EFLAGS: 00010202
> [    0.259452] RAX: 0000000000000001 RBX: 000000007ed03020 RCX: 000000007ed03020
> [    0.259452] RDX: ffffffffabc03eb0 RSI: 000000007ed03020 RDI: ffffffffab809670
> [    0.259452] RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000000
> [    0.259452] R10: 0000000000000000 R11: 800000007ff77063 R12: ffffffffabc03eb0
> [    0.259452] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    0.259452] FS:  0000000000000000(0000) GS:ffff9becbc400000(0000) knlGS:0000000000000000
> [    0.259452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.259452] CR2: 000000007ed03020 CR3: 000000000fd38000 CR4: 00000000000006b0
> [    0.259452] Kernel panic - not syncing: Attempted to kill the idle task!
> [    0.259452] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

In my case, the final frame in the stack dump was
get_page_from_freelist() (the above backtrace contains two more frames).

I booted the crashing guest with "efi=debug", and compared the crashing
address (CR2) with the UEFI memmap. In my case, the CR2 address pointed
into a "Boot Data" area.
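
A minimal sketch of this kind of address lookup follows. It is
illustrative only, not the exact procedure used above: the helper name
find_range is hypothetical, and the sample input is the BIOS-e820 map
from the quoted boot log rather than the "efi=debug" memmap, which is not
reproduced in this thread. The e820 map does not carry UEFI memory types,
so this only shows which reported range the CR2 value falls into.

```shell
#!/bin/sh
# find_range ADDR: read log lines on stdin and print every line whose
# "0xSTART-0xEND" range contains ADDR. Hypothetical helper, not part of
# any kernel or edk2 tooling. Assumes addresses fit in a signed 64-bit
# integer (true for the values in this thread).
find_range() {
    addr=$(( $1 ))
    while IFS= read -r line; do
        # Extract "START END" from the hex range on the line
        # (the last one, if a line happens to contain several).
        range=$(printf '%s\n' "$line" |
            sed -n 's/.*\(0x[0-9a-fA-F]\{1,\}\)-\(0x[0-9a-fA-F]\{1,\}\).*/\1 \2/p')
        [ -n "$range" ] || continue
        start=${range% *}
        end=${range#* }
        if [ "$addr" -ge "$(( $start ))" ] && [ "$addr" -le "$(( $end ))" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# CR2 from the quoted oops, checked against three e820 lines from the
# same log.
find_range 0x7ed03020 <<'EOF'
BIOS-e820: [mem 0x0000000000100000-0x00000000007fffff] usable
BIOS-e820: [mem 0x0000000000900000-0x000000007f8eefff] usable
BIOS-e820: [mem 0x000000007f8ef000-0x000000007f9eefff] reserved
EOF
# prints: BIOS-e820: [mem 0x0000000000900000-0x000000007f8eefff] usable
```

The same function can be pointed at the "efi: memNN: ..." lines that
"efi=debug" adds to dmesg, assuming they print their range in the same
0xSTART-0xEND form.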

Now, while crashing with a CR2 that points into a "Boot Data" area
*seems* consistent with a broken UEFI runtime DXE driver (either holding
a reference to non-runtime memory, or having failed to convert a pointer
during SetVirtualAddressMap()), I abandoned this idea because (1) I
couldn't determine anything bad or missing -- from code inspection -- in
"MdeModulePkg/Library/VariablePolicyLib/VariablePolicyExtraInitRuntimeDxe.c",
and (2) the backtrace contains get_page_from_freelist(), which I
*believe* indicates it's not EFI runtime code proper that crashes, but
something in the guest kernel memory management.

I also assume Bret had successfully tested the SMM-less build of the
VariablePolicy feature against a number of Windows guests.

If we can locate the kernel fix (with reverse bisection or otherwise),
then I guess it will have to be backported to a bunch of kernels.

... Anyway, just to be safe -- do we want to extend the Hard Feature
Freeze until we track this down with 100% certainty? The release is
currently planned for Nov 27th.

Thanks
Laszlo


