[vfio-users] 'dnf update' killed working VM

Alex Williamson alex.williamson at redhat.com
Thu Aug 10 03:45:59 UTC 2017


On Thu, 10 Aug 2017 00:29:36 +0200
Laszlo Ersek <lersek at redhat.com> wrote:

> On 08/09/17 23:37, Alex Williamson wrote:
> > On Wed, 09 Aug 2017 21:55:00 +0100
> > "Patrick O'Callaghan" <poc at usb.ve> wrote:
> >  
> >> On Wed, 2017-08-09 at 13:24 -0500, David wrote:  
> >>> Anyone else having trouble with a recent version of KVM / QEMU?
> >>> Also I am still a Linux newbie, how should I troubleshoot this?  
> >>
> >> For one thing, you could start by looking in the QEMU log file and/or
> >> the system journal. You don't give any information about the VM (e.g.
> >> I assume you're using GPU passthrough but you don't say anything
> >> about it) so it's going to be hard for anyone else to guess what the
> >> problem is. Perhaps if you post the XML file it might give someone a
> >> clue.
> >>
> >> Also, note that Fedora 24 was EOL-ed today. You should update your
> >> system to at least F25 as soon as possible. I'm on F26 and having no
> >> problems.  
> >
> > Yep, really hard to act on the limited information here.  Is it by
> > chance a GPU assigned VM running OVMF and does that OVMF come from the
> > kraxel repo rather than the base fedora repo?  Thanks,  
> 
> Yes, someone who can reproduce the problem -- from the reports, there
> are several users -- will have to bite the bullet, and bisect OVMF,
> and/or bisect the host kernel.

Done.  Like David, I find that my previously working VM just hangs
with all the vCPUs pegged.  Replacing Gerd's OVMF build with an older
one from the virt-preview repo resolves the issue.  Bisecting OVMF
lands here:

commit 3b2928b46987693caaaeefbb7b799d1e1de803c0
Author: Michael Kinney <michael.d.kinney at intel.com>
Date:   Wed May 17 12:19:16 2017 -0700

    UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues
    
    https://bugzilla.tianocore.org/show_bug.cgi?id=565
    
    Fix NASM compatibility issues with XCODE5 tool chain.
    The XCODE5 tool chain for X64 builds using PIE (Position
    Independent Executable).  For most assembly sources using
    PIE mode does not cause any issues.
    
    However, if assembly code is copied to a different address
    (such as AP startup code in the MpInitLib), then the
    X64 assembly source must be implemented to be compatible
    with PIE mode that uses RIP relative addressing.
    
    The specific changes in this patch are:
    
    * Use LEA instruction instead of MOV instruction to lookup
      the addresses of functions.
    
    * The assembly function RendezvousFunnelProc() is copied
      below 1MB so it can be executed as part of the MpInitLib
      AP startup sequence.  RendezvousFunnelProc() calls the
      external function InitializeFloatingPointUnits().  The
      absolute address of InitializeFloatingPointUnits() is
      added to the MP_CPU_EXCHANGE_INFO structure that is passed
      to RendezvousFunnelProc().
    
    Cc: Andrew Fish <afish at apple.com>
    Cc: Jeff Fan <jeff.fan at intel.com>
    Contributed-under: TianoCore Contribution Agreement 1.0
    Signed-off-by: Michael D Kinney <michael.d.kinney at intel.com>
    Reviewed-by: Jeff Fan <jeff.fan at intel.com>
    Reviewed-by: Andrew Fish <afish at apple.com>
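
To illustrate the reasoning in the commit message, here is a minimal
sketch of why a RIP-relative reference to an external function stops
resolving correctly once the code is copied elsewhere, while a
reference to a label inside the copied code keeps working.  All
addresses and offsets are hypothetical and chosen only for
illustration; this models the address arithmetic and is not taken from
MpInitLib, nor is it a diagnosis of the hang.

```python
def rip_relative_target(instr_end, disp):
    """Address a RIP-relative operand resolves to: next-instruction RIP + displacement."""
    return instr_end + disp


# Hypothetical layout, for illustration only.
LINKED_BASE = 0xBFEB0000    # where the startup stub sits in the linked image
COPIED_BASE = 0x0009F000    # where the stub is copied below 1MB
INSTR_END_OFFSET = 0x40     # offset of the byte after the referencing instruction
LOCAL_OFFSET = 0x80         # a label inside the copied stub
EXTERNAL_ADDR = 0xBFEB8000  # a function that is NOT copied with the stub


def disp_for(target):
    """Displacement fixed at build time, relative to the linked location."""
    return target - (LINKED_BASE + INSTR_END_OFFSET)


local_disp = disp_for(LINKED_BASE + LOCAL_OFFSET)
external_disp = disp_for(EXTERNAL_ADDR)

# After the copy, the same displacement is applied to a different RIP.
moved_end = COPIED_BASE + INSTR_END_OFFSET
local_after_copy = rip_relative_target(moved_end, local_disp)
external_after_copy = rip_relative_target(moved_end, external_disp)

print(hex(local_after_copy))     # still points at the label inside the copy
print(hex(external_after_copy))  # off by (COPIED_BASE - LINKED_BASE)
```

This is why copied code needs the external function's absolute address
handed to it, as the commit does via MP_CPU_EXCHANGE_INFO, rather than
computing it relative to RIP.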

Reverting this patch against current HEAD (7ef0dae092af) also gives me
a working image.  When it fails, it only gets this far:

SecCoreStartupWithStack(0xFFFCC000, 0x818000)
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000820000, size is 0x000E0000, handle is 0x820000
Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
Loading PEIM at 0x0000082B880 EntryPoint=0x0000082E8F9 PcdPeim.efi
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Loading PEIM at 0x00000830040 EntryPoint=0x00000831415 ReportStatusCodeRouterPei.efi
Install PPI: 0065D394-9951-4144-82A3-0AFC8579C251
Install PPI: 229832D3-7A30-4B36-B827-F40CB7D45436
Loading PEIM at 0x00000831F40 EntryPoint=0x0000083318A StatusCodeHandlerPei.efi
Loading PEIM at 0x00000833DC0 EntryPoint=0x00000837D0E PlatformPei.efi
Select Item: 0x0
FW CFG Signature: 0x554D4551
Select Item: 0x1
FW CFG Revision: 0x3
QemuFwCfg interface (DMA) is supported.
Platform PEIM Loaded
CMOS:
00: 17 00 30 00 21 00 04 09 08 17 26 02 10 80 00 00
10: 00 00 00 00 06 80 02 FF FF 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: FF FF 20 00 00 BF 00 20 30 00 00 00 00 12 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 05
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Select Item: 0x19
Select Item: 0x28
S3 support was detected on QEMU
Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
Select Item: 0x19
Select Item: 0x24
Select Item: 0x19
Select Item: 0x19
GetFirstNonAddress: Pci64Base=0x800000000 Pci64Size=0x800000000
Select Item: 0x5
MaxCpuCountInitialization: QEMU reports 6 processor(s)
PublishPeiMemory: mPhysMemAddressWidth=36 PeiMemoryCap=65800 KB
PeiInstallPeiMemory MemoryBegin 0xBBF0E000, MemoryLength 0x4042000
QemuInitializeRam called
Select Item: 0x19
Select Item: 0x24
Reserved variable store memory: 0xBFECC000; size: 528kb
Platform PEI Firmware Volume Initialization
Install PPI: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Notify: PPI Guid: 49EDB1C1-BF21-4761-BB12-EB0031AABB39, Peim notify entry point: 826922
The 1th FV start address is 0x00000900000, size is 0x00A00000, handle is 0x900000
Select Item: 0x19
Select Item: 0x19
Select Item: 0x19
Select Item: 0x25
Register PPI Notify: EE16160A-E8BE-47A6-820A-C6900DB0250A
Temp Stack : BaseAddress=0x814000 Length=0x4000
Temp Heap  : BaseAddress=0x810000 Length=0x4000
Total temporary memory:    32768 bytes.
  temporary memory stack ever used: 16384 bytes.
  temporary memory heap used:       8000 bytes.
Old Stack size 16384, New stack size 131072
Stack Hob: BaseAddress=0xBBF0E000 Length=0x20000
Heap Offset = 0xBB71E000 Stack Offset = 0xBB716000
TemporaryRamMigration(0x810000, 0xBBF2A000, 0x8000)
Loading PEIM at 0x000BFEBF000 EntryPoint=0x000BFEC7C48 PeiCore.efi
Reinstall PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Reinstall PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
Reinstall PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: F894643D-C449-42D1-8EA8-85BDD8C65BDE
Loading PEIM at 0x000BFEBB000 EntryPoint=0x000BFEBD941 DxeIpl.efi
Install PPI: 1A36E4E7-FAB6-476A-8E75-695A0576FDD7
Install PPI: 0AE8CE5D-E448-4437-A8D7-EBF5F194F731
Loading PEIM at 0x000BFEB7000 EntryPoint=0x000BFEB9304 S3Resume2Pei.efi
Install PPI: 6D582DBC-DB85-4514-8FCC-5ADF6227B147
Loading PEIM at 0x000BFEAF000 EntryPoint=0x000BFEB3189 CpuMpPei.efi
AP Loop Mode is 1
WakeupBufferStart = 9F000, WakeupBufferSize = 1000
<hang>

Without the above patch, we continue on as:

APIC MODE is 1
MpInitLib: Find 6 processors in system.
Does not find any stored CPU BIST information from PPI!
  APICID - 0x00000000, BIST - 0x00000000
  APICID - 0x00000001, BIST - 0x00000000
  APICID - 0x00000002, BIST - 0x00000000
  APICID - 0x00000003, BIST - 0x00000000
  APICID - 0x00000004, BIST - 0x00000000
  APICID - 0x00000005, BIST - 0x00000000
Install PPI: 9E9F374B-8F16-4230-9824-5846EE766A97
Install PPI: EE16160A-E8BE-47A6-820A-C6900DB0250A
Notify: PPI Guid: EE16160A-E8BE-47A6-820A-C6900DB0250A, Peim notify entry point: 835C29
DXE IPL Entry
Loading PEIM at 0x000BFE5B000 EntryPoint=0x000BFE605E2 DxeCore.efi
Loading DXE CORE at 0x000BFE5B000 EntryPoint=0x000BFE605E2
Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
CoreInitializeMemoryServices:
  BaseAddress - 0xBBF32000 Length - 0x3EC7000 MinimalMemorySizeNeeded - 0x10F4000
...
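
For anyone comparing their own logs against the two above, a small
sketch that reports the last PEIM loaded before a log stops may help.
It assumes the OVMF debug output has been captured to text and that the
"Loading PEIM at ..." lines look like the ones shown here; the function
name last_peim is just a name picked for this example.

```python
import re

# Matches OVMF debug-log lines such as:
#   Loading PEIM at 0x000BFEAF000 EntryPoint=0x000BFEB3189 CpuMpPei.efi
PEIM_RE = re.compile(
    r"^Loading PEIM at (0x[0-9A-Fa-f]+) EntryPoint=(0x[0-9A-Fa-f]+) (\S+)$"
)


def last_peim(log_text):
    """Return (module, base, entry_point) for the last PEIM load line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PEIM_RE.match(line.strip())
        if match:
            base = int(match.group(1), 16)
            entry = int(match.group(2), 16)
            last = (match.group(3), base, entry)
    return last


# Tail of the failing log quoted in this message.
sample = """\
Loading PEIM at 0x000BFEB7000 EntryPoint=0x000BFEB9304 S3Resume2Pei.efi
Install PPI: 6D582DBC-DB85-4514-8FCC-5ADF6227B147
Loading PEIM at 0x000BFEAF000 EntryPoint=0x000BFEB3189 CpuMpPei.efi
AP Loop Mode is 1
WakeupBufferStart = 9F000, WakeupBufferSize = 1000
"""

result = last_peim(sample)
print(result[0] if result else "no PEIM load lines found")
```

If your failing log also ends shortly after CpuMpPei.efi is loaded, you
are likely seeing the same hang.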


Given the patch identified by the bisect, I'll also note that my build
environment is a recent F26 system, and I don't see any toolchain
updates available.

$ nasm --version
NASM version 2.13.01 compiled on May 22 2017

Thanks,
Alex



