[vfio-users] Nested VFIO with QEMU

Alex Williamson alex.williamson at redhat.com
Tue Nov 5 23:58:34 UTC 2019


On Wed, 6 Nov 2019 00:29:52 +0100
Samuel Ortiz <samuel.ortiz at intel.com> wrote:

> On Tue, Nov 05, 2019 at 01:21:48PM -0700, Alex Williamson wrote:
> > On Fri, 18 Oct 2019 05:48:49 +0000
> > "Boeuf, Sebastien" <sebastien.boeuf at intel.com> wrote:
> >   
> > > Hi folks,
> > > 
> > > I have recently been working with VFIO, in particular trying to
> > > achieve device passthrough across multiple layers of virtualization.
> > > 
> > > I wanted to assess QEMU's performance with nested VFIO, using the
> > > emulated Intel IOMMU device. Unfortunately, I cannot make any of my
> > > physical devices work when I pass them through, attached to the emulated
> > > Intel IOMMU. Using regular VFIO works properly, but as soon as I enable
> > > the virtual IOMMU, the driver fails to probe (I tried on two different
> > > machines with different types of NIC).
> > > 
> > > So I was wondering if someone was aware of any issue with using both
> > > VFIO and virtual Intel IOMMU with QEMU? I'm sure I'm missing
> > > something obvious, but I couldn't find it so far.
> > 
> > It's not something I test regularly, but I'm under the impression that
> > nested device assignment does work.  When you say the driver fails to
> > probe, which driver is that, the endpoint driver in the L2 guest or
> > vfio-pci in the L1 guest?  Perhaps share your XML or command line?  
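> > 
> > For reference, a minimal sketch of the L1 side of nested assignment
> > (the PCI address and vendor/device IDs below are examples only): the
> > device QEMU exposes to the L1 guest needs to be bound to vfio-pci
> > inside L1 before it can be assigned onward to the L2 guest:
> > 
> >   # inside the L1 guest: detach the NIC from its native driver
> >   modprobe vfio-pci
> >   echo 0000:00:03.0 > /sys/bus/pci/devices/0000:00:03.0/driver/unbind
> >   # tell vfio-pci to claim devices with this vendor:device ID
> >   echo 8086 10d3 > /sys/bus/pci/drivers/vfio-pci/new_id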
> 
> This is fixed now. Apparently the iommu device needs to be passed
> _before_ the other devices on the command line. We managed to make it
> work as expected.
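> 
> In other words (a minimal sketch of the reordering; everything except
> the two relevant devices is elided):
> 
> # broken: vfio-pci is realized before the emulated IOMMU exists
> qemu-system-x86_64 ... -device vfio-pci,host=00:19.0 -device intel-iommu,intremap=on,caching-mode=on
> 
> # working: the emulated IOMMU comes first, before any other -device
> qemu-system-x86_64 ... -device intel-iommu,intremap=on,caching-mode=on -device vfio-pci,host=00:19.0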

Good news!

> Sebastien and Yi Liu figured this out but for some reason the
> thread moved to vfio-users-bounces at redhat.com.

Yes, I see some uncaught bounce notifications; it looks like Yi's
initial reply went to vfio-users-bounces.  Yi, you might want to
check out your mailer configuration.  For posterity/follow-up, I'll
paste the final message from the bounce notification below.  Thanks,

Alex

On Mon, 28 Oct 2019 08:13:23 +0000
"Liu, Yi L" <yi.l.liu at intel.com> wrote:

> Hi Sebastien,
> 
> That's great that it works for you. I remember there was an effort
> to fix this in the community, but I cannot recall whether it was
> documented. If not, I can work with the community to make it clear.
> 
> Regards,
> Yi Liu
> 
> From: Boeuf, Sebastien
> Sent: Friday, October 25, 2019 7:17 PM
> To: Liu, Yi L <yi.l.liu at intel.com>
> Cc: Ortiz, Samuel <samuel.ortiz at intel.com>; vfio-users-bounces at redhat.com; Bradford, Robert <robert.bradford at intel.com>
> Subject: Re: [vfio-users] Nested VFIO with QEMU
> 
> Hi Yi Liu,
> 
> Yes, that was it :)
> Thank you very much for your help!
> 
> Is it documented somewhere that parameter order matters?
> 
> Thanks,
> Sebastien
> 
> On Fri, 2019-10-25 at 09:52 +0800, Liu, Yi L wrote:
> Hi Sebastien,
> 
> I guess the cmdline is the cause. You should put the intel-iommu device before the other devices, as below.
> 
> -drive if=none,id=drive0,format=raw,file=/home/sebastien/clear-kvm.img \
> -device intel-iommu,intremap=on,caching-mode=on \
> -device virtio-blk-pci,drive=drive0,scsi=off \
> -device virtio-rng-pci \
> -device vfio-pci,host=00:19.0 \
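> 
> As a quick sanity check (a suggestion, assuming a Linux guest booted
> with intel_iommu=on), the guest kernel should report the DMAR table
> and remapping hardware once the vIOMMU is active:
> 
> dmesg | grep -e DMAR -e IOMMU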
> 
> Regards,
> Yi Liu
> 
> From: Boeuf, Sebastien
> Sent: Friday, October 25, 2019 7:14 AM
> To: Liu, Yi L <yi.l.liu at intel.com>
> Cc: Ortiz, Samuel <samuel.ortiz at intel.com>; vfio-users-bounces at redhat.com; Bradford, Robert <robert.bradford at intel.com>
> Subject: Re: [vfio-users] Nested VFIO with QEMU
> 
> Hi Yi Liu,
> 
> On Tue, 2019-10-22 at 11:01 +0800, Liu, Yi L wrote:
> 
> > Hi Sebastien,
> > 
> > > From: vfio-users-bounces at redhat.com [mailto:vfio-users-bounces at redhat.com] On Behalf Of Boeuf, Sebastien
> > > Sent: Friday, October 18, 2019 1:49 PM
> > > To: vfio-users at redhat.com
> > > Cc: Ortiz, Samuel <samuel.ortiz at intel.com>; Bradford, Robert <robert.bradford at intel.com>
> > > Subject: [vfio-users] Nested VFIO with QEMU
> > > 
> > > Hi folks,
> > > 
> > > I have recently been working with VFIO, in particular trying to
> > > achieve device passthrough across multiple layers of virtualization.
> > > 
> > > I wanted to assess QEMU's performance with nested VFIO, using the
> > > emulated Intel IOMMU device. Unfortunately, I cannot make any of my
> > > physical devices work when I pass them through, attached to the
> > > emulated Intel IOMMU. Using regular VFIO works properly, but as soon
> > 
> > Sorry, what does "regular VFIO" mean here?
> 
> Sorry, what I called "regular VFIO" is the case where VFIO is not run
> along with a vIOMMU.
> 
> > > as I enable the virtual IOMMU, the driver fails to probe (I tried on
> > > two different machines with different types of NIC).
> > 
> > OK, so "regular VFIO" means passing a device through to a VM which has
> > no vIOMMU?
> 
> Yes.
> 
> > > So I was wondering if someone was aware of any issue with using both
> > > VFIO and virtual Intel IOMMU with QEMU? I'm sure I'm missing
> > > something obvious, but I couldn't find it so far.
> > 
> > I've been using VFIO and vIOMMU for a long time; so far it is pretty
> > stable for me. I would be pleased to help here. Could you paste your
> > QEMU cmdline? It would also be helpful to paste the error log you got
> > when the failure happened.
> 
> So here is the QEMU command line I am using:
> 
> qemu-system-x86_64 \
> -machine q35,accel=kvm,kernel_irqchip=split \
> -bios /home/sebastien/workloads/OVMF.fd \
> -smp sockets=1,cpus=1,cores=1 \
> -cpu host \
> -m 1024 \
> -vga none \
> -nographic \
> -kernel ~/bzImage \
> -append "console=ttyS0 reboot=k root=/dev/vda3 kvm-intel.nested=1 vfio_iommu_type1.allow_unsafe_interrupts intel_iommu=on rw" \
> -drive if=none,id=drive0,format=raw,file=/home/sebastien/clear-kvm.img \
> -device virtio-blk-pci,drive=drive0,scsi=off \
> -device virtio-rng-pci \
> -device vfio-pci,host=00:19.0 \
> -device intel-iommu,intremap=on,caching-mode=on
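> 
> (Note, for comparison with the fix above: -device intel-iommu is the
> last device on this command line, after vfio-pci, which is the
> ordering that turned out to be the problem.)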
> 
> My goal is simply to pass the device, a fairly simple Intel NIC, into the guest.
> Unfortunately, after the VM boots, I can see the interface going up and down. It
> keeps resetting, and after a few seconds I get the following trace:
> 
> [   14.223213] NETDEV WATCHDOG: enp0s3 (e1000e): transmit queue 0 timed out
> [   14.224543] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x200/0x210
> [   14.224543] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc3+ #169
> [   14.224543] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [   14.224543] RIP: 0010:dev_watchdog+0x200/0x210
> [   14.224543] Code: 00 49 63 4e e8 eb 98 4c 89 ef c6 05 ba d5 95 00 01 e8 f4 e4 fc ff 89 d9 4c 89 ee 48 c7 c7 98 1e ea 81 48 89 c2 e8 05 35 9d ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 90 55 48 89 e5 41 57
> [   14.224543] RSP: 0018:ffffc90000003e88 EFLAGS: 00010282
> [   14.224543] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
> [   14.224543] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
> [   14.224543] RBP: ffffc90000003eb8 R08: 00000000000001ee R09: ffffffff8227ee38
> [   14.224543] R10: 000000000000004c R11: ffffc90000003ce8 R12: 0000000000000001
> [   14.224543] R13: ffff88803bcb0000 R14: ffff88803bcb03b8 R15: ffff88803bc92680
> [   14.224543] FS:  0000000000000000(0000) GS:ffff88803f400000(0000) knlGS:0000000000000000
> [   14.224543] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.224543] CR2: 00007fc745758af0 CR3: 000000003c9f4002 CR4: 00000000000606b0
> [   14.224543] Call Trace:
> [   14.224543]  <IRQ>
> [   14.224543]  ? pfifo_fast_enqueue+0x130/0x130
> [   14.224543]  call_timer_fn.isra.30+0x16/0x80
> [   14.224543]  run_timer_softirq+0x323/0x360
> [   14.224543]  ? clockevents_program_event+0x8e/0xf0
> [   14.224543]  __do_softirq+0xcf/0x21e
> [   14.224543]  irq_exit+0x9e/0xa0
> [   14.224543]  smp_apic_timer_interrupt+0x66/0xa0
> [   14.224543]  apic_timer_interrupt+0xf/0x20
> [   14.224543]  </IRQ>
> [   14.224543] RIP: 0010:default_idle+0x12/0x20
> [   14.224543] Code: 48 83 c0 22 48 89 44 24 28 eb c7 e8 48 fb 90 ff 90 90 90 90 90 90 90 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d e2 67 41 00 fb f4 <5d> c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 55 65 48 8b 04 25 40 5d
> [   14.224543] RSP: 0018:ffffffff82003e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [   14.224543] RAX: ffffffff817f7f10 RBX: 0000000000000000 RCX: 0000000000000001
> [   14.224543] RDX: 0000000000001a86 RSI: 0000000000000087 RDI: ffff88803f41c700
> [   14.224543] RBP: ffffffff82003e40 R08: 0000000000018470 R09: ffff88803f41f4c0
> [   14.224543] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff820b5590
> [   14.224543] R13: 0000000000000000 R14: 0000000000000000 R15: 000000003e80f000
> [   14.224543]  ? __cpuidle_text_start+0x8/0x8
> [   14.224543]  arch_cpu_idle+0x10/0x20
> [   14.224543]  default_idle_call+0x21/0x30
> [   14.224543]  do_idle+0x1d5/0x1f0
> [   14.224543]  cpu_startup_entry+0x18/0x20
> [   14.224543]  rest_init+0xa9/0xab
> [   14.224543]  arch_call_rest_init+0x9/0xc
> [   14.224543]  start_kernel+0x451/0x470
> [   14.224543]  x86_64_start_reservations+0x29/0x2b
> [   14.224543]  x86_64_start_kernel+0x71/0x74
> [   14.224543]  secondary_startup_64+0xa4/0xb0
> [   14.224543] ---[ end trace f8ed580b43c5ffcc ]---
> [   14.224543] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> 
> And here are the logs showing the adapter repeatedly resetting:
> 
> [   20.427211] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   25.996633] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [   32.518501] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   38.028584] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [   44.579713] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   50.060585] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [   56.659831] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   62.092632] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [   68.718399] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   74.124622] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [   80.798753] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [   86.156624] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> 
> Thanks,
> Sebastien
> 
> 
> > > Thanks,
> > > Sebastien
> > 
> > Best Wishes,
> > Yi Liu
> 




