[vfio-users] VFIO-PCI with AARCH64 QEMU

Alex Williamson alex.williamson at redhat.com
Tue Oct 25 16:31:16 UTC 2016


On Tue, 25 Oct 2016 11:35:13 +0200
Laszlo Ersek <lersek at redhat.com> wrote:

> CC Ard and Eric (Alex will see this anyway)
> 
> On 10/25/16 02:29, Haynal, Steve wrote:
> > Hi,
> > 
> >  
> > 
> > I am using VFIO-PCI to pass through a PCIe endpoint on an FPGA card to
> > virtual x86 and aarch64 QEMU instances. It works fine on x86 but I have
> > problems with aarch64.  On aarch64, memory for both BARs on the device
> > shows up as disabled, and one BAR is ignored. From this presentation
> > <https://www.linux-kvm.org/images/b/b4/2012-forum-VFIO.pdf>,
> > I’m under the impression that I can remap PCI devices without KVM to a
> > virtual aarch64 machine on an x86 host. Is this possible? Below are
> > details for my problem:
> > 
> >  
> > 
> > Host is running Ubuntu Xenial, 4.4 Kernel.
> > 
> > Both virtual machines are also Ubuntu Xenial, 4.4 Kernel, from Ubuntu
> > Cloud Images. Except for architecture, they are pretty much identical.
> > 
> > Qemu is 2.7.50 from git.
> > 
> >  
> > 
> > Command line to start qemu for x86 machine:
> > 
> > qemu-system-x86_64 -enable-kvm -net nic -net user -hda disk.img -hdb
> > my-seed.img -m 1024 -smp 2 -device
> > vfio-pci,host=01:00.0,addr=09.0,multifunction=on -redir tcp:2223::22
> > 
> >  
> > 
> > lspci -vvv for remapped device in x86 machine. Note both regions are
> > enabled.
> > 
> > 00:09.0 Memory controller: Xilinx Corporation Device 7022
> > 
> >                Subsystem: Xilinx Corporation Device 0007
> > 
> >                Physical Slot: 9
> > 
> >                Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV-
> > VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
> > 
> >                Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 
> >                Interrupt: pin A routed to IRQ 10
> > 
> >                Region 0: Memory at e0071000 (32-bit, non-prefetchable)
> > [size=4K]
> > 
> >                Region 1: Memory at c0000000 (32-bit, non-prefetchable)
> > [size=512M]
> > 
> >                Capabilities: [80] Power Management version 3
> > 
> >                               Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > 
> >                               Status: D0 NoSoftRst+ PME-Enable- DSel=0
> > DScale=0 PME-
> > 
> >                Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
> > 
> >                               Address: 0000000000000000  Data: 0000
> > 
> >                Capabilities: [c0] Express (v2) Endpoint, MSI 00
> > 
> >                               DevCap:               MaxPayload 512
> > bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> > 
> >                                              ExtTag- AttnBtn- AttnInd-
> > PwrInd- RBE+ FLReset-
> > 
> >                               DevCtl:  Report errors: Correctable-
> > Non-Fatal+ Fatal+ Unsupported+
> > 
> >                                              RlxdOrd+ ExtTag- PhantFunc-
> > AuxPwr- NoSnoop+
> > 
> >                                              MaxPayload 256 bytes,
> > MaxReadReq 512 bytes
> > 
> >                               DevSta: CorrErr- UncorrErr- FatalErr-
> > UnsuppReq- AuxPwr- TransPend-
> > 
> >                               LnkCap: Port #0, Speed 2.5GT/s, Width x8,
> > ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
> > 
> >                                              ClockPM- Surprise-
> > LLActRep- BwNot- ASPMOptComp+
> > 
> >                               LnkCtl:   ASPM Disabled; RCB 64 bytes
> > Disabled- CommClk+
> > 
> >                                              ExtSynch- ClockPM-
> > AutWidDis- BWInt- AutBWInt-
> > 
> >                               LnkSta:  Speed 2.5GT/s, Width x8, TrErr-
> > Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > 
> >                               DevCap2: Completion Timeout: Range B,
> > TimeoutDis+, LTR-, OBFF Not Supported
> > 
> >                               DevCtl2: Completion Timeout: 65ms to
> > 210ms, TimeoutDis-, LTR-, OBFF Disabled
> > 
> >                               LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete-, EqualizationPhase1-
> > 
> >                                              EqualizationPhase2-,
> > EqualizationPhase3-, LinkEqualizationRequest-
> > 
> >  
> > 
> >  
> > 
> > Command line to start qemu for aarch64 machine:
> > 
> > qemu-system-aarch64 -smp 2 -m 2048 -M virt -bios QEMU_EFI.fd -device
> > virtio-blk-device,drive=image -drive if=none,id=image,file=disk.img
> > -device virtio-blk-device,drive=cloud -drive
> > if=none,id=cloud,file=cloud.img -netdev user,id=user0 -device
> > virtio-net-device,netdev=user0 -redir tcp:2222::22 -cpu cortex-a57
> > -device vfio-pci,host=01:00.0,addr=09.0,multifunction=on
> > 
> >  
> > 
> > lspci -vvv for remapped device in aarch64 machine. Note both regions are
> > disabled, one is ignored.
> > 
> > 00:09.0 Memory controller: Xilinx Corporation Device 7022
> > 
> >                Subsystem: Xilinx Corporation Device 0007
> > 
> >                Control: I/O- Mem- BusMaster- SpecCycle- MemWINV-
> > VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > 
> >                Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 
> >                Interrupt: pin A routed to IRQ 47
> > 
> >                Region 0: Memory at 10000000 (32-bit, non-prefetchable)
> > [disabled] [size=4K]
> > 
> >                Region 1: Memory at <ignored> (32-bit, non-prefetchable)
> > [disabled]
> > 
> >                Capabilities: [80] Power Management version 3
> > 
> >                               Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0-,D1-,D2-,D3hot-,D3cold-)
> > 
> >                               Status: D0 NoSoftRst+ PME-Enable- DSel=0
> > DScale=0 PME-
> > 
> >                Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
> > 
> >                               Address: 0000000000000000  Data: 0000
> > 
> >                Capabilities: [c0] Express (v2) Root Complex Integrated
> > Endpoint, MSI 00
> > 
> >                               DevCap:               MaxPayload 512
> > bytes, PhantFunc 0
> > 
> >                                              ExtTag- RBE+
> > 
> >                               DevCtl:  Report errors: Correctable-
> > Non-Fatal+ Fatal+ Unsupported+
> > 
> >                                              RlxdOrd+ ExtTag- PhantFunc-
> > AuxPwr- NoSnoop+
> > 
> >                                              MaxPayload 256 bytes,
> > MaxReadReq 512 bytes
> > 
> >                               DevSta: CorrErr- UncorrErr- FatalErr-
> > UnsuppReq- AuxPwr- TransPend-
> > 
> >                               DevCap2: Completion Timeout: Range B,
> > TimeoutDis+, LTR-, OBFF Not Supported
> > 
> >                               DevCtl2: Completion Timeout: 65ms to
> > 210ms, TimeoutDis-, LTR-, OBFF Disabled
> > 
> >                Capabilities: [100 v2] Advanced Error Reporting
> > 
> >                               UESta:   DLP- SDES- TLP- FCP- CmpltTO-
> > CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 
> >                               UEMsk: DLP- SDES- TLP- FCP- CmpltTO-
> > CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > 
> >                               UESvrt:  DLP+ SDES+ TLP+ FCP+ CmpltTO+
> > CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > 
> >                               CESta:   RxErr- BadTLP- BadDLLP- Rollover-
> > Timeout- NonFatalErr-
> > 
> >                               CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+
> > Timeout+ NonFatalErr+
> > 
> >                               AERCap:               First Error Pointer:
> > 00, GenCap- CGenEn- ChkCap- ChkEn-
> > 
> >  
> > 
> >  
> > 
> > On the aarch64 machine, when I rescan the PCI bus, I see the following
> > in dmesg:
> > 
> > [  365.482929] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
> > 
> > [  365.521023] pci 0000:00:09.0: [10ee:7022] type 00 class 0x058000
> > 
> > [  365.522971] pci 0000:00:09.0: reg 0x10: [mem 0x10000000-0x10000fff]
> > 
> > [  365.523213] pci 0000:00:09.0: reg 0x14: [mem 0x80000000-0x9fffffff]
> > 
> > [  365.539233] pci 0000:00:09.0: BAR 1: no space for [mem size 0x20000000]
> > 
> > [  365.539406] pci 0000:00:09.0: BAR 1: failed to assign [mem size
> > 0x20000000]
> > 
> > [  365.539853] pci 0000:00:09.0: BAR 0: assigned [mem 0x10000000-0x10000fff]
> > 
> >  
> > 
> >  
> > 
> > Is 512MB too much for aarch64? What is the limit? I tried remapping
> > another PCI device with only a single 16kB BAR to the aarch64 machine
> > and that showed as disabled too. How do I enable the memory regions on
> > aarch64? There are no drivers for these devices loaded yet on either x86
> > or aarch64, yet the memory shows as enabled on x86.  
> 
> I have the following comments:
> 
> (1) generic -- please don't use -bios; use two explicit pflash drives
> with aarch64 guests as well. Although "-bios + UEFI guest fw" is not as
> broken with "qemu-system-aarch64 -M virt" as it is with
> "qemu-system-x86-64 -M pc/q35", it is generally recommended to form a
> good habit and to set up the guest with a working, persistent variable
> store.
> 
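For reference, the pflash setup described in (1) usually looks roughly like
the following (file names here are only placeholders, and both images are
expected to be 64MB raw files for "-M virt"):

  # unit 0: read-only firmware code, unit 1: writable variable store
  qemu-system-aarch64 -M virt ... \
    -drive if=pflash,format=raw,readonly=on,file=QEMU_EFI-pflash.raw \
    -drive if=pflash,format=raw,file=varstore.img
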
> (2) device assignment is different on aarch64. I'm unsure whether those
> differences are restricted to KVM (that is, whether they only matter when
> the aarch64 guest with the assigned device runs under KVM on an aarch64
> host, as opposed to under TCG on an x86_64 host). Anyway, I'll leave this link
> here:
> <http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/>,
> and hope that Eric and Alex can clarify the question.

That largely discusses MSI, which is relevant for an aarch64 host.
With an x86 host, MSIs are transparently mapped through the interrupt
remapper and I don't see why they shouldn't "just work" in the guest
since they take a route through QEMU, where I presume QEMU knows how to
inject MSI interrupts into a guest.

> (3) You didn't say where you got your firmware binary. Recent builds of
> the upstream ArmVirtQemu platform of edk2 utilize the 64-bit MMIO
> aperture of the "virt" machine, for PCI BAR allocation
> <https://bugzilla.tianocore.org/show_bug.cgi?id=65>.
> 
> The location of that aperture comes from QEMU, "hw/arm/virt.c":
> 
>     /* Second PCIe window, 512GB wide at the 512GB boundary */
>     [VIRT_PCIE_MMIO_HIGH] =   { 0x8000000000ULL, 0x8000000000ULL },

Note that the BARs for the device in question are 32-bit,
non-prefetchable.  512MB is a rather large BAR to force into a 32-bit
address space, but clearly an x86 guest can handle it (perhaps only one
of them, though).

Beyond that, I have no idea how aarch64 reserves MMIO space and
communicates that to the guest.  I'd expect the standard would be
ACPI with a _CRS method, but then ARM always seems to throw a curve
ball with device tree.
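
Whatever the mechanism turns out to be, the windows the guest kernel
actually ended up with should be visible from inside the guest, for
example:

  dmesg | grep -i "root bus resource"
  cat /proc/iomem

Comparing the 32-bit window reported there against the 512MB BAR would at
least show whether it simply cannot fit.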
 
> (4) You didn't capture the serial output of the VM while it was running
> the firmware. Please do that. Unlike with OVMF (x86_64), the "virt"
> (aarch64) machine type has no dedicated "QEMU debug port", so the
> ArmVirtQemu fw debug log goes to the serial port.
> 
> Seeing the fw log could be helpful to determine if the PCI Bus driver in
> the firmware manages to enumerate your assigned device. If it does, then
> the problem is with the aarch64 guest kernel, I'd think.
> 
> (To my knowledge, the aarch64 guest kernel differs from the x86_64 guest
> kernel in that it *always* re-enumerates the PCI hierarchy -- for now
> --, regardless of what the firmware has done with PCI. This means that
> even if the guest firmware succeeded with the enumeration / resource
> allocation, any problem in the aarch64 guest kernel could mask that. So,
> let's look at the fw log too.)
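
On capturing that firmware log: with "-M virt" the usual approach is to
redirect the serial port, for example by adding something like

  -serial file:fw-serial.log

to the command line, or by running with -nographic so the serial console
(and with it the ArmVirtQemu debug output) lands on stdout.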
> 
> Thanks
> Laszlo
> 
> _______________________________________________
> vfio-users mailing list
> vfio-users at redhat.com
> https://www.redhat.com/mailman/listinfo/vfio-users




