[vfio-users] Fwd: failing IGD passthrough on apollo lake - BAR 2 error

Geert Coulommier g.coulommier at gmail.com
Fri Jul 28 17:30:54 UTC 2017


Hi,

Following this guide
<https://medium.com/@calerogers/gpu-virtualization-with-kvm-qemu-63ca98a6a172>
before, I had already tried the vbios-romdump route, but now, with a lot of
other things checked off, I gave it another try:

1. booted a live Linux from a USB stick on the host with CSM enabled,
making sure not to use UEFI while booting
2. dumped the vga-bios
3. ran the dump through rom-parser
4. created rom-dump
At this point I got a bit confused: if I understand correctly, rom-parser
is only for viewing and rom-fixer is for editing. As I noticed that the
device ID in the dump didn't correspond with the one reported by lspci -nnk,
I made a few versions of the dump with changed IDs and had rom-fixer correct
the checksum. If it doesn't help, it can't hurt either.

this is what lspci -nnk reported:
VGA compatible controller [0300]: Intel Corporation Device *[8086:5a85]*
(rev 0b)
Subsystem: ASRock Incorporation Device *[1849:5a85]*

So I made and tested the following dumps in the VM, taking the vendor and
device IDs from lspci -nnk:
a) original dump (not changed): vendor ID *8086* (Intel), device ID *0406*
(unknown)
b) dump with changed device ID: vendor ID *8086* (Intel), device ID
*5a85* (Intel HD500)
c) dump with changed vendor ID and device ID: vendor ID *1849* (ASRock),
device ID *5a85* (Intel HD500)

This proved to be useless, as there was no difference in output whichever
version was used.
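
For anyone who wants to double-check a dump independently of rom-parser,
here is a minimal Python sketch of how I understand the standard PCI option
ROM layout: a 0x55AA signature, a pointer at offset 0x18 to the "PCIR"
structure that holds the vendor and device IDs, and a checksum that makes
all bytes of the image sum to zero modulo 256. The function name and the
returned dictionary keys are my own and not part of rom-parser or rom-fixer,
and it only looks at the first image in the file.

```python
import struct


def inspect_option_rom(image: bytes) -> dict:
    """Report the IDs and checksum state of the first image in a ROM dump."""
    if len(image) < 0x1A or image[0:2] != b"\x55\xaa":
        raise ValueError("missing 0x55AA option ROM signature")

    # Offset 0x18 holds a little-endian pointer to the PCI data structure.
    (pcir,) = struct.unpack_from("<H", image, 0x18)
    if pcir + 0x12 > len(image) or image[pcir:pcir + 4] != b"PCIR":
        raise ValueError("missing PCIR data structure")

    # Vendor ID at PCIR+4, device ID at PCIR+6, both little-endian.
    vendor_id, device_id = struct.unpack_from("<HH", image, pcir + 4)

    # Image length at PCIR+0x10, counted in 512-byte blocks.
    (blocks,) = struct.unpack_from("<H", image, pcir + 0x10)
    length = blocks * 512

    # A valid image sums to zero modulo 256 over its declared length.
    checksum_ok = 0 < length <= len(image) and sum(image[:length]) % 256 == 0

    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "length": length,
        "checksum_ok": checksum_ok,
    }
```

Called on the contents of a dump file (read with open(path, "rb").read()),
this should show whether an edited ID was written to the right place and
whether the checksum was corrected afterwards.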

5. added the path to the vm.xml
<rom bar='on' file='/var/lib/libvirt/vbios_dump/vbios_intel_HD500.rom'/>

The rom bar='on' was added by virt-manager (as also documented here
<https://doc.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/Virtualization_Deployment_and_Administration_Guide/sub-sub-section-libvirt-dom-xml-devices-interface-ROM-BIOS-configuration.html>),
and provided some interesting results:

With rom bar='off', the results were identical to the situation before
(where rom bar was on, but no vga-bios-rom-file specified): black screen
(that powers off), and 1 of the 4 assigned virtual CPUs maxing out, as well
as the virtual memory. The messages in dmesg were also identical to before.

With rom bar='on', this time the VM refused to start, and I got the error
messages below in the virsh console:

error: Failed to start domain ubuntu16.04_desktop
error: internal error: process exited while connecting to monitor: warning:
host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds-cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:EC

Note that the final line is incomplete. Often, this last line would go on
to:

warning: host doesn't support requested feature: CPUID.01H:ECX.osxs

Even stranger, if I enter a path to a non-existent file, the result and
error message are the same. I double-checked the permissions and the path.
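
Since virsh cuts the message off in the middle of the CPUID warnings, the
real reason for the failure is probably further down in the output. As far
as I know, libvirt keeps the complete QEMU output per domain under
/var/log/libvirt/qemu/ by default, so a small Python sketch like the one
below could strip the warnings and show what is left. The function name is
my own, and the log path in the comment is an assumption about the default
libvirt setup on this host.

```python
from typing import List

# Text common to all the CPUID warnings shown in the virsh output.
CPUID_NOISE = "host doesn't support requested feature"


def non_cpuid_lines(log_text: str) -> List[str]:
    """Return the non-empty log lines that are not CPUID feature warnings."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.strip() and CPUID_NOISE not in line
    ]


# Example use (assumed default log location, typically readable by root):
#   with open("/var/log/libvirt/qemu/ubuntu16.04_desktop.log") as log:
#       for line in non_cpuid_lines(log.read())[-20:]:
#           print(line)
```

Whatever remains after the last batch of warnings should be the message
that virsh did not have room for.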

Running the .xml with only <rom bar='on'> (and no rom-file specified) is
what I was running before with the usual result (black screen, no errors).
Same when I remove all pci-passthrough-, video- and graphics devices.

If there is any other info I can provide or something I can try, I would
gladly do so. Thanks for any suggestions.

Kind regards,

Geert

On 27 July 2017 at 17:41, Alex Williamson <alex.williamson at redhat.com>
wrote:

> On Thu, 27 Jul 2017 11:32:24 +0200
> Geert Coulommier <g.coulommier at gmail.com> wrote:
>
> > Hi,
> >
> > so I've tried the 2 options you suggested:
> >
> > 1) "look in /proc/iomem and identify the driver that's still claiming
> > portions of IGD and disable it"
> >
> > from /proc/iomem:
> >
> > ...
> > 80000000-cfffffff : PCI Bus 0000:00
> >   80000000-8fffffff : 0000:00:02.0
> >     80000000-808cffff : efifb
> > ...
> >
> > which is strange as to prevent this, the part "video=efifb:off" was added
> > to grub:
> >
> > GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt
> > rd.driver.pre=vfio-pci video=vesafb:off,efifb:off"
> >
> > Because I'm running the host on uefi, and to keep things clean, I removed
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is probably the next piece of the puzzle...
>
> > the "vesafb:off"-part:
> > GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt
> > rd.driver.pre=vfio-pci video=efifb:off"
> >
> > Unexpectedly, this seemed to have an effect. Now from /proc/iomem (when
> > not running the VM; full printout of /proc/iomem below in [1]):
> >
> > ...
> > 80000000-cfffffff : PCI Bus 0000:00
> >   80000000-8fffffff : 0000:00:02.0
> >   90000000-90ffffff : 0000:00:02.0
> >   91000000-910fffff : 0000:00:0e.0
> >   91100000-911fffff : PCI Bus 0000:03
> >     91100000-911001ff : 0000:03:00.0
> >       91100000-911001ff : ahci
> > ...
> >
> > when running the VM, it goes to:
> >
> > ...
> > 80000000-cfffffff : PCI Bus 0000:00
> >   80000000-8fffffff : 0000:00:02.0
> >     80000000-8fffffff : vfio-pci
> >   90000000-90ffffff : 0000:00:02.0
> >     90000000-90ffffff : vfio-pci
> >   91000000-910fffff : 0000:00:0e.0
> >     91000000-910fffff : vfio-pci
> >   91100000-911fffff : PCI Bus 0000:03
> >     91100000-911001ff : 0000:03:00.0
> >       91100000-911001ff : ahci
> >   91200000-912fffff : PCI Bus 0000:01
> >     91200000-91203fff : 0000:01:00.0
> >       91200000-91203fff : r8169
> >     91204000-91204fff : 0000:01:00.0
> >       91204000-91204fff : r8169
> >   91300000-9130ffff : 0000:00:15.0
> >     91300000-9130ffff : xhci-hcd
> >   91310000-91313fff : 0000:00:0e.0
> >     91310000-91313fff : vfio-pci
> >   91314000-91315fff : 0000:00:12.0
> >     91314000-91315fff : ahci
> >   91316000-913160ff : 0000:00:1f.1
> >   91317000-913177ff : 0000:00:12.0
> >     91317000-913177ff : ahci
> >   91318000-913180ff : 0000:00:12.0
> >     91318000-913180ff : ahci
> >   9131b000-9131bfff : 0000:00:0f.0
> >     9131b000-9131bfff : mei_me
> > ...
> >
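To make this check repeatable, here is a minimal Python sketch that lists
which drivers hold regions below a given PCI device in /proc/iomem-style
text. It relies only on the indentation and the "start-end : name" layout
visible in the excerpts above; the function name is my own. On recent
kernels /proc/iomem usually has to be read as root to see real addresses.

```python
from typing import List, Tuple


def region_claimants(iomem_text: str, bdf: str) -> List[Tuple[str, str, str]]:
    """List (device_range, claimed_range, owner) entries nested under bdf."""
    results = []
    device_indent = None   # indentation of the device line we are inside
    device_range = ""
    for raw in iomem_text.splitlines():
        if " : " not in raw:
            continue  # skip "..." and other non-entry lines
        indent = len(raw) - len(raw.lstrip(" "))
        span, name = raw.strip().split(" : ", 1)
        if device_indent is not None and indent > device_indent:
            # Deeper indentation means this range sits inside the device BAR.
            results.append((device_range, span, name))
            continue
        device_indent = None
        if name == bdf:
            device_indent = indent
            device_range = span
    return results
```

For the first excerpt this reports efifb inside the 80000000-8fffffff
range of 0000:00:02.0, and for the listing taken while the VM runs it
reports vfio-pci for both ranges of that device.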
> >
> > and the dmesg log:
> >
> > dmesg | grep -aiE '((DMAR)|(kvm)|(drm)|(Command line)|(iommu)|(vfio))'
> > [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.3-041203-generic
> > root=/dev/mapper/granada--vg-root ro quiet splash intel_iommu=on iommu=pt
> > rd.driver.pre=vfio-pci video=efifb:off vt.handoff=7
> > [    0.000000] ACPI: DMAR 0x000000006D9D0470 0000A8 (v01 INTEL  EDK2
> > 00000003 BRXT 0100000D)
> > [    0.000000] Kernel command line:
> > BOOT_IMAGE=/boot/vmlinuz-4.12.3-041203-generic
> > root=/dev/mapper/granada--vg-root ro quiet splash intel_iommu=on iommu=pt
> > rd.driver.pre=vfio-pci video=efifb:off vt.handoff=7
> > [    0.000000] DMAR: IOMMU enabled
> > [    0.044107] DMAR: Host address width 39
> > [    0.044109] DMAR: DRHD base: 0x000000fed64000 flags: 0x0
> > [    0.044126] DMAR: dmar0: reg_base_addr fed64000 ver 1:0 cap
> > 1c0000c40660462 ecap 7e3ff0505e
> > [    0.044128] DMAR: DRHD base: 0x000000fed65000 flags: 0x1
> > [    0.044139] DMAR: dmar1: reg_base_addr fed65000 ver 1:0 cap
> > d2008c40660462 ecap f050da
> > [    0.044142] DMAR: RMRR base: 0x0000006d5af000 end: 0x0000006d5cefff
> > [    0.044145] DMAR: RMRR base: 0x0000006f800000 end: 0x0000007fffffff
> > [    0.044148] DMAR-IR: IOAPIC id 1 under DRHD base  0xfed65000 IOMMU 1
> > [    0.044150] DMAR-IR: HPET id 0 under DRHD base 0xfed65000
> > [    0.044152] DMAR-IR: Queued invalidation will be enabled to support
> > x2apic and Intr-remapping.
> > [    0.046253] DMAR-IR: Enabled IRQ remapping in x2apic mode
> > [    1.794596] DMAR: No ATSR found
> > [    1.795685] DMAR: dmar0: Using Queued invalidation
> > [    1.795694] DMAR: dmar1: Using Queued invalidation
> > [    1.795872] DMAR: Hardware identity mapping for device 0000:00:00.0
> > [    1.795882] DMAR: Hardware identity mapping for device 0000:00:02.0
> > [    1.795886] DMAR: Hardware identity mapping for device 0000:00:0e.0
> > [    1.795888] DMAR: Hardware identity mapping for device 0000:00:0f.0
> > [    1.795890] DMAR: Hardware identity mapping for device 0000:00:12.0
> > [    1.795892] DMAR: Hardware identity mapping for device 0000:00:13.0
> > [    1.795895] DMAR: Hardware identity mapping for device 0000:00:13.1
> > [    1.795897] DMAR: Hardware identity mapping for device 0000:00:13.2
> > [    1.795899] DMAR: Hardware identity mapping for device 0000:00:13.3
> > [    1.795902] DMAR: Hardware identity mapping for device 0000:00:15.0
> > [    1.795904] DMAR: Hardware identity mapping for device 0000:00:1f.0
> > [    1.795906] DMAR: Hardware identity mapping for device 0000:00:1f.1
> > [    1.795911] DMAR: Hardware identity mapping for device 0000:01:00.0
> > [    1.795916] DMAR: Hardware identity mapping for device 0000:03:00.0
> > [    1.795917] DMAR: Setting RMRR:
> > [    1.795920] DMAR: Ignoring identity map for HW passthrough device
> > 0000:00:02.0 [0x6f800000 - 0x7fffffff]
> > [    1.795922] DMAR: Ignoring identity map for HW passthrough device
> > 0000:00:15.0 [0x6d5af000 - 0x6d5cefff]
> > [    1.795924] DMAR: Prepare 0-16MiB unity mapping for LPC
> > [    1.795926] DMAR: Ignoring identity map for HW passthrough device
> > 0000:00:1f.0 [0x0 - 0xffffff]
> > [    1.795954] DMAR: Intel(R) Virtualization Technology for Directed I/O
> > [    1.796125] iommu: Adding device 0000:00:00.0 to group 0
> > [    1.796140] iommu: Adding device 0000:00:02.0 to group 1
> > [    1.796157] iommu: Adding device 0000:00:0e.0 to group 2
> > [    1.796174] iommu: Adding device 0000:00:0f.0 to group 3
> > [    1.796187] iommu: Adding device 0000:00:12.0 to group 4
> > [    1.796229] iommu: Adding device 0000:00:13.0 to group 5
> > [    1.796254] iommu: Adding device 0000:00:13.1 to group 5
> > [    1.796271] iommu: Adding device 0000:00:13.2 to group 5
> > [    1.796288] iommu: Adding device 0000:00:13.3 to group 5
> > [    1.796317] iommu: Adding device 0000:00:15.0 to group 6
> > [    1.796338] iommu: Adding device 0000:00:1f.0 to group 7
> > [    1.796350] iommu: Adding device 0000:00:1f.1 to group 7
> > [    1.796361] iommu: Adding device 0000:01:00.0 to group 5
> > [    1.796371] iommu: Adding device 0000:03:00.0 to group 5
> > [    2.512432] ata1.00: supports DRM functions and may not be fully
> > accessible
> > [    2.514160] ata1.00: supports DRM functions and may not be fully
> > accessible
> > [    3.124755] VFIO - User Level meta-driver version: 0.3
> > [    3.137417] vfio-pci 0000:00:02.0: vgaarb: changed VGA decodes:
> > olddecodes=io+mem,decodes=io+mem:owns=io+mem
> > [    3.156196] vfio_pci: add [8086:5a85[ffff:ffff]] class 0x000000/00000000
> > [    3.176202] vfio_pci: add [8086:5a98[ffff:ffff]] class 0x000000/00000000
> >
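As a side note on the "iommu: Adding device" lines in the log above, this
is a small Python sketch that collects them into groups, which makes it
easier to see which devices share a group with the one being passed
through. The function name is my own, and it only understands the message
format shown in this log.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines such as "iommu: Adding device 0000:00:02.0 to group 1".
GROUP_LINE = re.compile(r"iommu: Adding device (\S+) to group (\d+)")


def iommu_groups(dmesg_text: str) -> Dict[int, List[str]]:
    """Map each IOMMU group number to the devices the kernel put into it."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = GROUP_LINE.search(line)
        if match:
            groups[int(match.group(2))].append(match.group(1))
    return dict(groups)
```

Fed with the log above, it shows 0000:00:02.0 alone in group 1 and
0000:00:0e.0 alone in group 2, while group 5 holds 0000:00:13.0 to
0000:00:13.3 together with 0000:01:00.0 and 0000:03:00.0.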
> > with these entries added when running the VM:
> >
> > [   49.439866] vfio_cap_init: 0000:00:0e.0 pci config conflict @0x80, was
> > cap 0x9 now cap 0x10
> > [   49.439869] vfio_cap_init: 0000:00:0e.0 pci config conflict @0x81, was
> > cap 0x9 now cap 0x10
> > [   49.439871] vfio_cap_init: 0000:00:0e.0 pci config conflict @0x82, was
> > cap 0x9 now cap 0x10
> > [   49.439873] vfio_cap_init: 0000:00:0e.0 pci config conflict @0x83, was
> > cap 0x9 now cap 0x10
> > [   49.442695] DMAR: DRHD: handling fault status reg 3
> > [   49.442710] DMAR: [DMA Write] Request device [00:02.0] fault addr 0
> > [fault reason 02] Present bit in context entry is clear
> > [   49.567831] vfio_ecap_init: 0000:00:02.0 hiding ecap 0x1b at 0x100
> >
> > Passthrough still doesn't work though, and the last two lines in the
> > kernel log seem to hint at that. So from option one to option 2:
>
> Nope, those are normal.
>
> > 2) "don't blacklist i915, let the kernel boot with it, then do a 'virsh
> > nodedev-detach pci_0000_00_02_0' at boot before starting the VM so that
> > you're not binding it back to i915 after every instance of running the
> > VM."
> >
> > So I unblacklisted i915 and executed 'virsh nodedev-dettach
> > pci_0000_00_02_0':
> >
> > virsh nodedev-dettach pci_0000_00_02_0
> > Device pci_0000_00_02_0 detached
> >
> > Then ran the VM. Unfortunately, results are the same, as are the log
> > entries in the kernel log (see above).
> >
> > When running the same 'virsh nodedev-dettach pci_0000_00_02_0' command
> > while the VM is running, I get:
> >
> > virsh nodedev-dettach pci_0000_00_02_0
> > error: Failed to detach device pci_0000_00_02_0
> > error: Requested operation is not valid: PCI device 0000:00:02.0 is in use
> > by driver QEMU, domain ubuntu16.04_desktop
> >
> > So it does seem to be attached to the VM correctly.
> >
> > Maybe interesting observation: when the host boots, the screen shows grub
> > and then goes black but stays powered on. When I launch the VM, the screen
> > stays black but powers off.
>
> I think either mechanism above is equally effective and the remaining
> piece is likely that there's no VGA BIOS to initialize the graphics
> because the host is running in UEFI mode.  To solve this, burn some
> sort of Linux live CD/DVD image and temporarily set the host BIOS to
> boot into CSM/legacy mode to that live image.  Sometimes you'll be able
> to select between a UEFI or legacy mode booting the CD or you might be
> able to prioritize legacy over UEFI, it depends on the system.  Once
> you've booted the image, dump the ROM for the IGD to a file and copy it
> somewhere that you'll be able to get to it later.  Undo any settings
> for booting the image and then look at my rom-fixer utility:
>
> https://github.com/awilliam/rom-parser
>
> Run that on the ROM and then add a <rom file='/path/to/vga.rom'/> to
> the IGD hostdev entry in the xml.
>
> > Finally, until now I had ignored the errors in the kernel log on the audio
> > device (0000:00:0e.0) as I was focusing on the gpu. As requested, in [2]
> > the output of 'sudo lspci -xxxxs 0000:00:0e.0'.
> >
> Thanks, I'll take a look at this.
>
> Alex
>