[vfio-users] VFIO-PCI with AARCH64 QEMU

Laszlo Ersek lersek at redhat.com
Tue Oct 25 09:35:13 UTC 2016


CC Ard and Eric (Alex will see this anyway)

On 10/25/16 02:29, Haynal, Steve wrote:
> Hi,
> 
> I am using VFIO-PCI to pass through a PCIe endpoint on an FPGA card to
> virtual x86 and aarch64 QEMU instances. It works fine on x86 but I have
> problems with aarch64.  On aarch64, memory for both BARs on the device
> shows up as disabled, and one BAR is ignored. From this presentation
> <https://www.linux-kvm.org/images/b/b4/2012-forum-VFIO.pdf>,
> I’m under the impression that I can remap PCI devices without KVM to a
> virtual aarch64 machine on an x86 host. Is this possible? Below are
> details for my problem:
> 
> Host is running Ubuntu Xenial, 4.4 Kernel.
> 
> Both virtual machines are also Ubuntu Xenial, 4.4 Kernel, from Ubuntu
> Cloud Images. Except for architecture, they are pretty much identical.
> 
> Qemu is 2.7.50 from git.
> 
> Command line to start qemu for x86 machine:
> 
> qemu-system-x86_64 -enable-kvm -net nic -net user -hda disk.img -hdb
> my-seed.img -m 1024 -smp 2 -device
> vfio-pci,host=01:00.0,addr=09.0,multifunction=on -redir tcp:2223::22
> 
> lspci -vvv output for the remapped device in the x86 machine. Note that
> both regions are enabled.
> 
> 00:09.0 Memory controller: Xilinx Corporation Device 7022
>         Subsystem: Xilinx Corporation Device 0007
>         Physical Slot: 9
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 10
>         Region 0: Memory at e0071000 (32-bit, non-prefetchable) [size=4K]
>         Region 1: Memory at c0000000 (32-bit, non-prefetchable) [size=512M]
>         Capabilities: [80] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [c0] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
>                 DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 
> Command line to start qemu for aarch64 machine:
> 
> qemu-system-aarch64 -smp 2 -m 2048 -M virt -bios QEMU_EFI.fd -device
> virtio-blk-device,drive=image -drive if=none,id=image,file=disk.img
> -device virtio-blk-device,drive=cloud -drive
> if=none,id=cloud,file=cloud.img -netdev user,id=user0 -device
> virtio-net-device,netdev=user0 -redir tcp:2222::22 -cpu cortex-a57
> -device vfio-pci,host=01:00.0,addr=09.0,multifunction=on
> 
> lspci -vvv output for the remapped device in the aarch64 machine. Note
> that both regions are disabled and one is ignored.
> 
> 00:09.0 Memory controller: Xilinx Corporation Device 7022
>         Subsystem: Xilinx Corporation Device 0007
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 47
>         Region 0: Memory at 10000000 (32-bit, non-prefetchable) [disabled] [size=4K]
>         Region 1: Memory at <ignored> (32-bit, non-prefetchable) [disabled]
>         Capabilities: [80] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [c0] Express (v2) Root Complex Integrated Endpoint, MSI 00
>                 DevCap: MaxPayload 512 bytes, PhantFunc 0
>                         ExtTag- RBE+
>                 DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
>                 DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>                 CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> 
> On the aarch64 machine, when I rescan the PCI bus, I see the following
> in dmesg:
> 
> [  365.482929] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
> [  365.521023] pci 0000:00:09.0: [10ee:7022] type 00 class 0x058000
> [  365.522971] pci 0000:00:09.0: reg 0x10: [mem 0x10000000-0x10000fff]
> [  365.523213] pci 0000:00:09.0: reg 0x14: [mem 0x80000000-0x9fffffff]
> [  365.539233] pci 0000:00:09.0: BAR 1: no space for [mem size 0x20000000]
> [  365.539406] pci 0000:00:09.0: BAR 1: failed to assign [mem size 0x20000000]
> [  365.539853] pci 0000:00:09.0: BAR 0: assigned [mem 0x10000000-0x10000fff]
> 
> Is 512MB too much for aarch64? What is the limit? I tried remapping
> another PCI device with only a single 16kB BAR to the aarch64 machine
> and that showed as disabled too. How do I enable the memory regions on
> aarch64? There are no drivers for these devices loaded yet on either x86
> or aarch64, yet the memory shows as enabled on x86.

I have the following comments:

(1) generic -- please don't use -bios; use two explicit pflash drives
with aarch64 guests as well. Although "-bios + UEFI guest fw" is not as
broken with "qemu-system-aarch64 -M virt" as it is with
"qemu-system-x86-64 -M pc/q35", it is generally recommended to form a
good habit and to set up the guest with a working, persistent variable
store.
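
For example, something along these lines should work (only a sketch -- the
flash file names and the 64 MB padding are assumptions on my part; adjust
them to your firmware build):

    # pad the firmware image and create an empty varstore, 64 MB each
    truncate -s 64M flash0.img flash1.img
    dd if=QEMU_EFI.fd of=flash0.img conv=notrunc

    # then replace "-bios QEMU_EFI.fd" with two pflash drives
    qemu-system-aarch64 -M virt -cpu cortex-a57 ... \
        -drive if=pflash,format=raw,file=flash0.img,readonly=on \
        -drive if=pflash,format=raw,file=flash1.img

The second pflash drive backs the persistent UEFI variable store.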

(2) device assignment is different on aarch64. I'm unsure whether those
differences are restricted to KVM (that is, to running the same aarch64
guest, with the assigned device, on an aarch64 host with KVM, as opposed
to an x86_64 host with TCG). Anyway, I'll leave this link here:
here:
<http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/>,
and hope that Eric and Alex can clarify the question.

(3) You didn't say where you got your firmware binary. Recent builds of
the upstream ArmVirtQemu platform of edk2 utilize the 64-bit MMIO
aperture of the "virt" machine, for PCI BAR allocation
<https://bugzilla.tianocore.org/show_bug.cgi?id=65>.

The location of that aperture comes from QEMU, "hw/arm/virt.c":

    /* Second PCIe window, 512GB wide at the 512GB boundary */
    [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },

(4) You didn't capture the serial output of the VM while it was running
the firmware. Please do that. Unlike with OVMF (x86_64), the "virt"
(aarch64) machine type has no dedicated "QEMU debug port", so the
ArmVirtQemu fw debug log goes to the serial port.
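
For example, appending something like

    -serial file:fw-serial.log

to the aarch64 command line (the log file name is arbitrary, of course),
or simply running with "-serial stdio", should capture everything the
firmware prints.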

Seeing the fw log could be helpful to determine if the PCI Bus driver in
the firmware manages to enumerate your assigned device. If it does, then
the problem is with the aarch64 guest kernel, I'd think.

(To my knowledge, the aarch64 guest kernel differs from the x86_64 guest
kernel in that it *always* re-enumerates the PCI hierarchy -- for now
--, regardless of what the firmware has done with PCI. This means that
even if the guest firmware succeeded with the enumeration / resource
allocation, any problem in the aarch64 guest kernel could mask that. So,
let's look at the fw log too.)

Thanks
Laszlo



