[vfio-users] [OVMF+VFIO+QEMU] Large VM RAM allocation = Long Boot Times

A Z adam at zegelin.com
Fri Nov 16 21:36:05 UTC 2018


And just a quick discovery:

With 24G assigned (-m 24576)...

With ovmf-x64-0-20181105.757.g114a75ad86, reducing the number of CPU
cores to 1, 2 or 3 results in a really fast boot to the TianoCore
splash. As soon as the core count reaches 4, boot time takes a
significant hit:

1 core, 1 thread: ~16 seconds
2 cores, 1 thread: ~16 seconds
3 cores, 1 thread: ~19 seconds
4 cores, 1 thread: ~111 seconds
8 cores, 1 thread: ~150 seconds
8 cores, 2 threads: ~334 seconds

(timing is based on grepping debugcon.log for "[Bds]OsIndication:",
which seems to occur right after the splash screen is displayed)
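
One way to reproduce the measurement, as a sketch -- this assumes
moreutils' `ts` is installed, and swaps the file: debugcon backend I
normally use for stdio:

    /usr/sbin/qemu-system-x86_64 ... -debugcon stdio -global isa-debugcon.iobase=0x402 \
        | ts '%.s' | tee debugcon.log
    grep 'OsIndication' debugcon.log   # appears right after the splash; subtract the launch timestamp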

Significant pauses can be observed at a few points in the boot
process, based on the debugcon output:

First pause is here:

> Loading PEIM at 0x0007FEC1000 EntryPoint=0x0007FEC6985 CpuMpPei.efi
> Register PPI Notify: F894643D-C449-42D1-8EA8-85BDD8C65BDE
> Notify: PPI Guid: F894643D-C449-42D1-8EA8-85BDD8C65BDE, Peim notify entry point: 7FEC58C4
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000

Then here:

> Loading driver at 0x0007F96C000 EntryPoint=0x0007F9743EE CpuDxe.efi
> InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7F4AC618
> ProtectUefiImageCommon - 0x7F4AE040
>   - 0x000000007F96C000 - 0x000000000000E3C0
> Paging: added 512 pages to page table pool
> CurrentPagingContext:
>   MachineType   - 0x8664
>   PageTableBase - 0x7FA01000
>   Attributes    - 0xC0000006
> InstallProtocolInterface: 26BACCB1-6F42-11D4-BCE7-0080C73C8881 7F979E20
> MemoryProtectionCpuArchProtocolNotify:
<snip>
> ConvertPages: failed to find range A0000 - FFFFF
> ConvertPages: failed to find range 80000000 - 8FFFFFFF
> ConvertPages: failed to find range 90000000 - FBFFFFFF
> ConvertPages: failed to find range FEC00000 - FEC00FFF
> Failed to update capability: [8] 00000000FED00000 - 00000000FED003FF (C700000000000001 -> C700000000026001)
> ConvertPages: failed to find range FED1C000 - FED1FFFF
> ConvertPages: failed to find range FEE00000 - FEEFFFFF
> ConvertPages: failed to find range FFE00000 - FFFFFFFF
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 229

And the last one:

> Loading driver at 0x0007F918000 EntryPoint=0x0007F91D876 PciHostBridgeDxe.efi
> InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7F01D798
> ProtectUefiImageCommon - 0x7F01D1C0
>   - 0x000000007F918000 - 0x0000000000009F00
> Select Item: 0x0
> FW CFG Signature: 0x554D4551
> Select Item: 0x1
> FW CFG Revision: 0x3
> QemuFwCfg interface (DMA) is supported.
> Select Item: 0x19
> InitRootBridge: populated root bus 0, with room for 255 subordinate bus(es)
> RootBridge: PciRoot(0x0)
>   Support/Attr: 70069 / 70069
>     DmaAbove4G: No
> NoExtConfSpace: No
>      AllocAttr: 3 (CombineMemPMem Mem64Decode)
>            Bus: 0 - FF Translation=0
>             Io: 6000 - FFFF Translation=0
>            Mem: 90000000 - FBFFFFFF Translation=0
>     MemAbove4G: 800000000 - FFFFFFFFF Translation=0
>           PMem: FFFFFFFFFFFFFFFF - 0 Translation=0
>    PMemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0


Regards,
Adam


On Thu, Nov 15, 2018 at 10:28 PM A Z <adam at zegelin.com> wrote:
>
> On Thu, Nov 15, 2018 at 2:47 AM Laszlo Ersek <lersek at redhat.com> wrote:
> >
> > If nothing at all had changed in your configuration, then the behavior
> > wouldn't have changed either. You must certainly have upgraded some
> > components since then, between host kernel, QEMU, and OVMF (likely all
> > of them). Please state the version numbers that work and don't work, and
> > try to isolate the issue to a version change that affects a *single*
> > component.
>
> I wasn't clear on this, sorry. I meant that the VM configuration --
> the qemu command-line parameters, *except* for the OVMF firmware image
> and the amount of memory assigned -- hasn't changed since I got
> everything up and running (i.e., CPU caps, disk, network, and all the
> PCI-passthrough assignments haven't changed).
> I added more memory to the host a few months ago, and upped the VM
> allocation to match at the same time. This made the boot times even
> worse.
> I've also occasionally downloaded a newer version of OVMF to see if it
> fixes this boot time issue.
> The host OS kernel, qemu and other package versions have changed over
> this period as I've installed OS updates. So has the build of Windows
> 10.
>
> > I vaguely recall similar reports from the past. I've been under the
> > impression that this kind of issue has been fixed. Thus, I'd guess you
> > are running old(er) versions of the affected component (whichever it
> > is), or else you are running bleeding edge, and the bug has crept back
> > in (new regression).
>
> It's possibly the latter.
>
> Host kernel: 4.18.10-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 26 09:48:22
> UTC 2018 x86_64 GNU/Linux
> qemu version: QEMU emulator version 3.0.0
> OVMF build: ovmf-x64-0-20181105.757.g114a75ad86 (downloaded from www.kraxel.org)
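>
> (For anyone comparing versions, those strings are what these two
> report on the host:
>
>     uname -a
>     qemu-system-x86_64 --version
> )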
>
>
> On Thu, Nov 15, 2018 at 6:14 AM Hristo Iliev <hristo at hiliev.eu> wrote:
>
> > This sounds quite like a problem I used to have in the past with certain versions of OVMF and the Linux kernel. If I remember correctly, some memory ranges would get marked as non-cacheable, which resulted in the terrible slowdown you describe. The resolution back then was to stick to older versions of both the firmware and the kernel.
> >
> > I still keep around an older version of OVMF that I used until recently - edk2.git-ovmf-x64-0-20160714.b1992.gfc3005.noarch.rpm. You can download the RPM here and see if it works for you:
> >
> > https://exa.flopser.de/s/z3cjqFUlZ2gBPCy
>
> I will try this firmware build. I've read similar reports of the
> 2016-era builds working correctly.
> But TBH I'd also prefer to run something newer -- bug fixes (and
> regressions!), security patches, etc.
>
> > Recent QEMU versions started complaining about firmware incompatibilities, so I tried the latest RPM from Kraxel (https://www.kraxel.org/repos/jenkins/edk2/) and it works just fine. The host system is Arch Linux with the latest Arch kernel and QEMU.
>
> While I'm not running today's build, I am running one downloaded from
> Kraxel.org within the past week.
>
>
> On Thu, Nov 15, 2018 at 7:28 AM Alex Williamson
> <alex.williamson at redhat.com> wrote:
>
> > If you can confirm that your system uses a NUMA memory configuration
> > and that 24G exceeds the node local memory, then this is a known
> > regression in the Linux kernel mm.  AIUI it's being worked upstream,
> > workarounds are to either use hugepages or locked memory for the VM.
>
> I'm currently using hugepages. I'm not sure how to lock the memory,
> sorry (but see below).
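>
> (Having dug around since: QEMU 3.0 appears to support locking guest
> RAM with `-realtime mlock=on`, provided the memlock ulimit is raised
> first -- untested on my setup, but something along the lines of:
>
>     ulimit -l unlimited
>     /usr/sbin/qemu-system-x86_64 -realtime mlock=on ... )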
>
> It's a dual-CPU system. The NUMA topology:
>
> $ sudo numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
> node 0 size: 32143 MB
> node 0 free: 236 MB
> node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
> node 1 size: 32232 MB
> node 1 free: 95 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10
>
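> (If the problem is the allocation spilling across nodes, I could
> presumably also try binding the whole VM to a single node -- a
> hypothetical, untested variant of my invocation, and 24G only just
> fits within one node's 32G:
>
>     numactl --cpunodebind=0 --membind=0 /usr/sbin/qemu-system-x86_64 ... )
>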
>
>
> For reference, the qemu command line I'm exec'ing from a script is:
>
> /usr/sbin/qemu-system-x86_64 \
>         -name win10,debug-threads=on \
>         -monitor unix:win10.sock,server,nowait \
>         -debugcon file:debugcon.log -global isa-debugcon.iobase=0x402 \
>         -nodefconfig -no-user-config -nodefaults -nographic \
>         -machine q35,accel=kvm \
>         -enable-kvm \
>         -cpu host,kvm=off,hv_spinlocks=0x1fff,hv_relaxed,hv_time,hv_vendor_id=Nvidia43FIX \
>         -rtc base=localtime,clock=host,driftfix=slew \
>         \
>         -drive file=/vmstorage/firmware/ovmf-x64-0-20181105.757.g114a75ad86/OVMF-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on \
>         -drive file=/home/adam/win10-vars.fd,if=pflash,format=raw,readonly=off,unit=1 \
>         \
>         -object iothread,id=io1 \
>         \
>         -mem-path /dev/hugepages \
>         -m 24576 \
>         -smp cores=8,threads=2,sockets=1 \
>         \
>         -netdev bridge,id=netdev0,br=br0 \
>         -device virtio-net-pci,netdev=netdev0,mac=52:54:00:19:34:57 \
>         -device virtio-scsi-pci,id=scsi0,ioeventfd=on,iothread=io1,num_queues=4 \
>         \
>         -device vfio-pci,host=07:00.0,addr=0x6,multifunction=on \
>         -device vfio-pci,host=07:00.1,addr=0x6.0x1 \
>         -device vfio-pci,host=05:00.0 \
>         -device vfio-pci,host=06:00.0
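>
> (For reference, the hugepages backing -mem-path have to be reserved
> on the host beforehand; for -m 24576 with 2 MB pages that works out
> to 12288 pages, i.e. something like:
>
>     echo 12288 > /proc/sys/vm/nr_hugepages
>
> or vm.nr_hugepages=12288 via sysctl.)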
>
>
> Regards,
> Adam



