[vfio-users] Failed to return from FLR on SuperMicro A2SDi-16C-HLN4F

Maik Broemme mbroemme at libmpq.org
Mon Nov 20 22:18:36 UTC 2017


Hi,

On Nov 17, 2017, at 09:33, Maik Broemme <mbroemme at libmpq.org> wrote:
> Hi,
> 
> I have a SuperMicro A2SDi-16C-HLN4F, which uses the recently released
> Denverton SoC (Intel Atom C3955). This mainboard has one PCI-E 3.0 x4
> slot, but whatever card I install there doesn't work with VFIO.
> 
> 1. All the cards I tried work fine in another mainboard using VT-d and
>    in another mainboard using AMD-IOMMU.
> 
> 2. All the cards I tried report DPC events (AER reports the errors as
>    corrected). However, using them on the host seems to work fine (I
>    have tried it for some time now).
> 
> [29136.808030] dpc 0000:00:09.0:pcie010: DPC containment event, status:0x1f00 source:0x0000
> [29136.808045] pcieport 0000:00:09.0: AER: Corrected error received: id=0048
> [29136.808051] pcieport 0000:00:09.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0048(Transmitter ID)
> [29136.809533] pcieport 0000:00:09.0:   device [8086:19a4] error status/mask=00001000/00002000
> [29136.811079] pcieport 0000:00:09.0:    [12] Replay Timer Timeout
> 
> 00:09.0 is the PCI bridge, and the device currently behind it is a
> Digital Devices GmbH Octopus DVB Adapter. The above error is what I see
> on the host when using the device there. As soon as I start using it via
> VFIO, I get the following:
> 
> Nov 17 05:06:13 server.theraso.int kernel: vfio-pci 0000:01:00.0: enabling device (0140 -> 0142)
> Nov 17 05:06:14 server.theraso.int kernel: vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars
> Nov 17 05:06:36 server.theraso.int kernel: vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars
> Nov 17 05:06:36 server.theraso.int kernel: vfio_bar_restore: 0000:01:00.0 reset recovery - restoring bars
> 
> Inside the VM I get this immediately after boot (log lines are newest first):
> 
> Nov 17 00:25:18 vdr.theraso.int kernel: Disabling IRQ #11
> Nov 17 00:25:18 vdr.theraso.int kernel: [<ffffffffc074d060>] qxl_irq_handler [qxl]
> Nov 17 00:25:18 vdr.theraso.int kernel: [<ffffffffc03a4570>] usb_hcd_irq [usbcore]
> Nov 17 00:25:18 vdr.theraso.int kernel: handlers:
> Nov 17 00:25:18 vdr.theraso.int kernel:  secondary_startup_64+0x9f/0x9f
> Nov 17 00:25:18 vdr.theraso.int kernel:  x86_64_start_kernel+0x13e/0x161
> Nov 17 00:25:18 vdr.theraso.int kernel:  x86_64_start_reservations+0x24/0x26
> Nov 17 00:25:18 vdr.theraso.int kernel:  ? early_idt_handler_array+0x120/0x120
> Nov 17 00:25:18 vdr.theraso.int kernel:  start_kernel+0x496/0x4b7
> Nov 17 00:25:18 vdr.theraso.int kernel:  rest_init+0xd5/0xe0
> Nov 17 00:25:18 vdr.theraso.int kernel:  cpu_startup_entry+0x73/0x80
> Nov 17 00:25:18 vdr.theraso.int kernel:  do_idle+0x175/0x1e0
> Nov 17 00:25:18 vdr.theraso.int kernel:  default_idle_call+0x23/0x30
> Nov 17 00:25:18 vdr.theraso.int kernel:  arch_cpu_idle+0xf/0x20
> Nov 17 00:25:18 vdr.theraso.int kernel:  default_idle+0x20/0x130
> ...
> Nov 17 00:25:18 vdr.theraso.int kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
> Nov 17 00:25:18 vdr.theraso.int kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C      4.13.5-1-ARCH #1
> Nov 17 00:25:18 vdr.theraso.int kernel: irq 11: nobody cared (try booting with the "irqpoll" option)
> 
> If I shut down the VM, the host leaves the device in a state in which
> it no longer works:
> 
> Nov 17 05:14:00 server.theraso.int kernel: vfio-pci 0000:01:00.0: Failed to return from FLR
> Nov 17 05:13:58 server.theraso.int kernel: vfio-pci 0000:01:00.0: timed out waiting for pending transaction; performing function level reset anyway
> 
> Next VM start:
> 
> Nov 17 00:28:22 server.theraso.int kernel: vfio-pci 0000:01:00.0: Refused to change power state, currently in D3
> 
> Moreover, I've already tried all of this with a RealTek RTL-8169 NIC.
> The issue remains the same. As mentioned in the beginning, the devices
> work fine on other boards.
> 
> Any help narrowing down the problem would be much appreciated. The DPC
> events also occur when VFIO is not used at all.
> 
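
As a reading aid for the AER lines quoted above, here is a minimal Python
sketch that decodes the "error status/mask=00001000/00002000" values and
the "id=0048" field. The bit names follow the AER Correctable Error Status
register layout as I understand it from the PCIe spec; the function names
are my own and purely illustrative.

```python
# Bit positions of the AER Correctable Error Status / Mask registers.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Return the names of all known bits set in a status or mask value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_requester_id(rid):
    """Split a 16-bit requester ID into a bus:device.function string."""
    bus = (rid >> 8) & 0xFF
    dev = (rid >> 3) & 0x1F
    fn = rid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


if __name__ == "__main__":
    print(decode_correctable(0x00001000))   # status from the log
    print(decode_correctable(0x00002000))   # mask from the log
    print(decode_requester_id(0x0048))      # id from the log
```

Read this way, the status value has only the Replay Timer Timeout bit set
(matching the "[12]" line in the log), and id 0048 is the bridge 00:09.0
itself.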

I've debugged this problem with a "Digital Devices Octopus DVB Adapter" and
tried the latest git kernel with the PCI changes for v4.15.

1. The device state after host boot.

root@server:~# lspci -vvv -s 01:00.0
01:00.0 Multimedia controller: Digital Devices GmbH Octopus DVB Adapter
	Subsystem: Digital Devices GmbH Cine S2 V6.5 DVB adapter
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at dfc00000 (64-bit, non-prefetchable) [disabled] [size=64K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable- Count=1/2 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [90] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range A, TimeoutDis+, LTR-, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
	Kernel driver in use: vfio-pci
	Kernel modules: ddbridge

2. Finding the PCI bridge:

root@server:~# lspci -vt
-[0000:00]-+-00.0  Intel Corporation Device 1980
           +-04.0  Intel Corporation Device 19a1
           +-05.0  Intel Corporation Device 19a2
           +-09.0-[01]----00.0  Digital Devices GmbH Octopus DVB Adapter
           +-10.0-[02]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
           +-11.0-[03-04]----00.0-[04]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
           +-12.0  Intel Corporation DNV SMBus Contoller - Host
           +-13.0  Intel Corporation DNV SATA Controller 0
           +-14.0  Intel Corporation DNV SATA Controller 1
           +-15.0  Intel Corporation Device 19d0
           +-16.0-[05-06]--+-00.0  Intel Corporation Ethernet Connection X553 1GbE
           |               +-00.1  Intel Corporation Ethernet Connection X553 1GbE
           |               +-10.0  Intel Corporation X553 Virtual Function
           |               +-10.1  Intel Corporation X553 Virtual Function
           |               +-10.2  Intel Corporation X553 Virtual Function
           |               +-10.3  Intel Corporation X553 Virtual Function
           |               +-10.4  Intel Corporation X553 Virtual Function
           |               +-10.5  Intel Corporation X553 Virtual Function
           |               +-10.6  Intel Corporation X553 Virtual Function
           |               +-10.7  Intel Corporation X553 Virtual Function
           |               +-11.0  Intel Corporation X553 Virtual Function
           |               +-11.1  Intel Corporation X553 Virtual Function
           |               +-11.2  Intel Corporation X553 Virtual Function
           |               +-11.3  Intel Corporation X553 Virtual Function
           |               +-11.4  Intel Corporation X553 Virtual Function
           |               +-11.5  Intel Corporation X553 Virtual Function
           |               +-11.6  Intel Corporation X553 Virtual Function
           |               \-11.7  Intel Corporation X553 Virtual Function
           +-17.0-[07-08]--+-00.0  Intel Corporation Ethernet Connection X553 1GbE
           |               +-00.1  Intel Corporation Ethernet Connection X553 1GbE
           |               +-10.0  Intel Corporation X553 Virtual Function
           |               +-10.1  Intel Corporation X553 Virtual Function
           |               +-10.2  Intel Corporation X553 Virtual Function
           |               +-10.4  Intel Corporation X553 Virtual Function
           |               +-10.6  Intel Corporation X553 Virtual Function
           |               +-11.0  Intel Corporation X553 Virtual Function
           |               +-11.2  Intel Corporation X553 Virtual Function
           |               +-11.4  Intel Corporation X553 Virtual Function
           |               \-11.6  Intel Corporation X553 Virtual Function
           +-18.0  Intel Corporation Device 19d3
           +-1f.0  Intel Corporation DNV LPC or eSPI
           +-1f.2  Intel Corporation Device 19de
           +-1f.4  Intel Corporation DNV SMBus controller
           \-1f.5  Intel Corporation DNV SPI Controller

3. Secondary bus reset:

root@server:~# setpci -s 0000:00:09.0 BRIDGE_CONTROL=40:40

4. Clearing the reset bit:

root@server:~# setpci -s 0000:00:09.0 BRIDGE_CONTROL=00:40

5. Checking if device is still functional:

root@server:~# lspci -vvv -s 01:00.0
01:00.0 Multimedia controller: Digital Devices GmbH Octopus DVB Adapter (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel driver in use: vfio-pci
	Kernel modules: ddbridge
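
The "rev ff", "prog-if ff" and "header type 7f" values are what you get
when every config space read returns all ones (lspci masks the
multi-function bit off the header type, so ff shows up as 7f). A minimal
Python sketch of that check, assuming the usual sysfs config file under
/sys/bus/pci/devices/; the helper names are mine:

```python
def read_config(bdf, length=64):
    """Read the first bytes of a function's config space from sysfs."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as cfg:
        return cfg.read(length)


def function_responds(config):
    """False if the vendor ID reads back as 0xffff (nothing answered)."""
    vendor = int.from_bytes(config[0:2], "little")
    return vendor != 0xFFFF


def header_type(config):
    """Return (layout, multi_function) from the header type byte at 0x0e."""
    raw = config[0x0E]
    return raw & 0x7F, bool(raw & 0x80)


if __name__ == "__main__":
    cfg = read_config("0000:01:00.0")
    print("responds:", function_responds(cfg))
    print("header type: %02x" % header_type(cfg)[0])
```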

It looks like the device has disappeared from the PCI bridge / bus. This
is very strange and should probably not happen. This is exactly the same
lspci output as I get after starting the VM with device passthrough. The
same issue can be reproduced with a RealTek RTL8111D NIC.
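
For anyone who wants to repeat steps 3 and 4 without typing the two setpci
commands by hand, here is a rough Python sketch of the same
read-modify-write on the Bridge Control register (offset 0x3e, bit 6) with
explicit delays. The delay values are assumptions modelled on what I
believe the kernel's own secondary bus reset helper uses; they are not
taken from my test above, where the delays were simply however long it
took me to type. It needs root and writes to the bridge's sysfs config
file.

```python
import struct
import time

BRIDGE_CONTROL = 0x3E   # Bridge Control register offset in a type 1 header
BUS_RESET = 0x0040      # bit 6: Secondary Bus Reset


def _rmw_bridge_control(config_path, value, mask):
    """Update only the masked bits, like setpci's VALUE:MASK syntax."""
    with open(config_path, "r+b", buffering=0) as cfg:
        cfg.seek(BRIDGE_CONTROL)
        (old,) = struct.unpack("<H", cfg.read(2))
        new = (old & ~mask & 0xFFFF) | (value & mask)
        cfg.seek(BRIDGE_CONTROL)
        cfg.write(struct.pack("<H", new))
    return new


def secondary_bus_reset(config_path, hold_s=0.002, settle_s=1.0):
    """Assert and release the secondary bus reset; return both values written."""
    asserted = _rmw_bridge_control(config_path, BUS_RESET, BUS_RESET)
    time.sleep(hold_s)      # assumed hold time while reset is asserted
    released = _rmw_bridge_control(config_path, 0, BUS_RESET)
    time.sleep(settle_s)    # assumed settle time before touching the device
    return asserted, released


if __name__ == "__main__":
    # Bridge address taken from the lspci -vt output above.
    print(secondary_bus_reset("/sys/bus/pci/devices/0000:00:09.0/config"))
```

Note that this only automates the reset itself. It does not save or
restore the endpoint's config space, so BARs and the command register of
01:00.0 would still need to be reprogrammed afterwards.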

However, both cards work fine with passthrough on an Opteron mainboard and
on an ASRock mainboard with a Skylake Core i5. Meanwhile, I've opened a
case for it at SuperMicro, but it is not clear whether it is an EFI/BIOS
issue.

This problem also looks similar to the one many people have with VFIO
passthrough on Ryzen/Threadripper, but my system is Intel Atom C3xxx based.
Does anybody have any ideas?

> --Maik

--Maik



