[vfio-users] Ryzen Primary GPU passthrough success and woes

Graham Neville grahamneville at gmail.com
Wed Mar 29 21:44:06 UTC 2017


Thanks for the link. I've tried a number of things but I'm still no further
forward.

I've tried the following, one at a time:

   - kvm_amd.avic=1
   - Properly isolated CPUs on host so they are exclusive to guest -
   isolcpus=0-7
   - Changed CPU from host-passthrough to Opteron_G5
   - Changed CPU to athlon
   - Changed CPU to qemu64
   - kvm-amd.npt=0
   - iommu=pt
   - Disabled SMT in BIOS

I'm using Arch Linux with kernel 4.10.5.

uname -a
Linux amdr7 4.10.5-1-ARCH #1 SMP PREEMPT Wed Mar 22 14:42:03 CET 2017
x86_64 GNU/Linux

This is my kernel command line now:

BOOT_IMAGE=/vmlinuz-linux root=UUID=bf69add2-e36f-453a-b92e-a4343ca20d26 rw
quiet amd_iommu=on vfio-pci.ids=1002:67b1,1002:aac8 video=efifb:off
amdgpu.msi=0 kvm_amd.avic=1 isolcpus=0-7 kvm-amd.npt=0 iommu=pt
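
Since the command line above mixes the `kvm_amd.` and `kvm-amd.` spellings (the kernel treats dashes and underscores in parameter names as equivalent), it may be worth confirming what the module actually ended up with. A minimal sketch, assuming the running kernel exposes `avic` and `npt` under `/sys/module/kvm_amd/parameters`; the function name is made up for illustration:

```shell
# Print the effective value of selected kvm_amd module parameters.
# $1 (optional): parameters directory; defaults to the live sysfs path.
# The body runs in a subshell, so its variables do not leak to the caller.
show_kvm_amd_params() (
    dir="${1:-/sys/module/kvm_amd/parameters}"
    for name in avic npt; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # Parameter file missing: module not loaded or option not built in.
            printf '%s=<not exposed>\n' "$name"
        fi
    done
)
```

Calling `show_kvm_amd_params` with no argument reads the live values; the exact format of each value depends on the kernel version.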

This is my full libvirt XML file for the VM:


<domain type='kvm'>
  <name>windows10</name>
  <uuid>7b222825-fc7d-4a66-a72c-5876063752d5</uuid>
  <memory unit='KiB'>8291456</memory>
  <currentMemory unit='KiB'>8291456</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.1'>hvm</type>
    <loader type='pflash' readonly='yes'>/home/virtualguests/windows10/OVMF_CODE.fd</loader>
    <nvram>/home/virtualguests/windows10/OVMF_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
  <cpus>
    <arch name='x86'>
      <model name='kvm64'>
        <feature name='apic'/>
        <feature name='clflush'/>
        <feature name='cmov'/>
        <feature name='cx16'/>
        <feature name='cx8'/>
        <feature name='de'/>
        <feature name='fpu'/>
        <feature name='fxsr'/>
        <feature name='lm'/>
        <feature name='mca'/>
        <feature name='mce'/>
        <feature name='mmx'/>
        <feature name='msr'/>
        <feature name='mtrr'/>
        <feature name='nx'/>
        <feature name='pae'/>
        <feature name='pat'/>
        <feature name='pge'/>
        <feature name='pni'/>
        <feature name='pse'/>
        <feature name='pse36'/>
        <feature name='sep'/>
        <feature name='sse'/>
        <feature name='sse2'/>
        <feature name='syscall'/>
        <feature name='tsc'/>
      </model>
    </arch>
  </cpus>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' />
      <source file='/home/virtualguests/windows10/windows10-c-nas.qcow2'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/virtualguests/windows10/Win10_1607_EnglishInternational_x64.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/storage/windows10-d.qcow2'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
    <controller type='pci' index='0' model='pci-root' />
    <interface type='bridge'>
      <mac address='52:54:00:12:34:76'/>
      <source bridge='br0'/>
      <target dev='tap8'/>
      <model type='virtio'/>
      <alias name='virtio'/>
      <rom bar='off'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc52e'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x28de'/>
        <product id='0x1142'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0a12'/>
        <product id='0x0001'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x0fcf'/>
        <product id='0x1009'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1c0b'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x05e3'/>
        <product id='0x0608'/>
        <address bus='1' device='5'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0' />
      </source>
      <rom bar='on' file='/home/virtualguests/windows10/r9290.rom'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </hostdev>
    <memballoon model='none'/>
  </devices>
</domain>
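
The `<cputune>` section above pins vCPUs 0-3 to host CPUs 0-3. Whether those four host CPUs are four separate cores or two cores with two SMT threads each depends on how the kernel enumerates them, so it can help to look at the sibling lists before choosing a pinning. A rough sketch, assuming the usual sysfs topology layout; the function name is hypothetical:

```shell
# Print the SMT sibling list of every host CPU that exposes topology info.
# $1 (optional): CPU sysfs root; defaults to the live sysfs path.
show_thread_siblings() (
    root="${1:-/sys/devices/system/cpu}"
    for d in "$root"/cpu[0-9]*; do
        f="$d/topology/thread_siblings_list"
        # Skip entries without topology data (also covers an unmatched glob).
        [ -r "$f" ] || continue
        printf '%s: %s\n' "${d##*/}" "$(cat "$f")"
    done
)
```

Running `show_thread_siblings` on the host prints one line per CPU; CPUs that report the same sibling list share a physical core.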



When I have host-passthrough, Opteron_G5, athlon or qemu64 CPUs configured,
I see a lot of these stack traces. They appear frequently, not just when the
guest crashes, and nothing extra is logged when the guest actually crashes.

[ 2848.156709] ------------[ cut here ]------------
[ 2848.156719] WARNING: CPU: 0 PID: 1445 at arch/x86/kvm/svm.c:1484
avic_vcpu_load+0x15a/0x180 [kvm_amd]
[ 2848.156720] Modules linked in: vhost_net vhost macvtap macvlan tun nfsv3
rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache hid_logitech_hidpp
usb_serial_simple cdc_acm usbserial hid_logitech_dj cfg80211 bridge stp llc
amdgpu sd_mod edac_mce_amd radeon edac_core kvm_amd kvm crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc ppdev ttm
snd_hda_codec_realtek snd_hda_codec_generic drm_kms_helper aesni_intel drm
btusb snd_hda_intel nls_iso8859_1 aes_x86_64 btrtl crypto_simd nls_cp437
btbcm glue_helper syscopyarea btintel sysfillrect snd_hda_codec vfat r8169
joydev sysimgblt fat fb_sys_fops i2c_algo_bit bluetooth cryptd evdev
mousedev uas mii input_leds snd_hda_core rfkill pcspkr led_class snd_hwdep
mac_hid snd_pcm snd_timer ccp sp5100_tco snd i2c_piix4 soundcore rng_core
shpchp wmi parport_pc
[ 2848.156772]  parport fjes 8250_dw i2c_designware_platform tpm_infineon
i2c_designware_core button acpi_cpufreq tpm_tis tpm_tis_core tpm nfsd
auth_rpcgss oid_registry nfs_acl lockd grace sch_fq_codel sunrpc ip_tables
x_tables ext4 crc16 jbd2 fscrypto mbcache usb_storage hid_generic usbhid
hid ahci libahci xhci_pci libata xhci_hcd usbcore scsi_mod nvme usb_common
nvme_core serio vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[ 2848.156798] CPU: 0 PID: 1445 Comm: CPU 0/KVM Tainted: G        W
4.10.5-1-ARCH #1
[ 2848.156798] Hardware name: Gigabyte Technology Co., Ltd. Default
string/AB350M-Gaming 3-CF, BIOS F2 02/20/2017
[ 2848.156799] Call Trace:
[ 2848.156807]  dump_stack+0x63/0x83
[ 2848.156812]  __warn+0xcb/0xf0
[ 2848.156816]  warn_slowpath_null+0x1d/0x20
[ 2848.156819]  avic_vcpu_load+0x15a/0x180 [kvm_amd]
[ 2848.156822]  svm_vcpu_unblocking+0x18/0x20 [kvm_amd]
[ 2848.156834]  kvm_vcpu_block+0xd3/0x330 [kvm]
[ 2848.156844]  ? kvm_get_rflags+0x1a/0x30 [kvm]
[ 2848.156856]  kvm_arch_vcpu_ioctl_run+0x4ea/0x1680 [kvm]
[ 2848.156859]  ? _copy_to_user+0x54/0x60
[ 2848.156867]  kvm_vcpu_ioctl+0x339/0x630 [kvm]
[ 2848.156872]  do_vfs_ioctl+0xa3/0x5f0
[ 2848.156876]  ? __fget+0x77/0xb0
[ 2848.156880]  SyS_ioctl+0x79/0x90
[ 2848.156883]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 2848.156885] RIP: 0033:0x7f9980dbd0d7
[ 2848.156886] RSP: 002b:00007f9972efb8e8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 2848.156887] RAX: ffffffffffffffda RBX: 00007f9987e0d001 RCX:
00007f9980dbd0d7
[ 2848.156888] RDX: 0000000000000000 RSI: 000000000000ae80 RDI:
0000000000000013
[ 2848.156888] RBP: 0000000000000001 R08: 000055c65eff4830 R09:
00000000000000ff
[ 2848.156889] R10: 0000000000000001 R11: 0000000000000246 R12:
0000000000000001
[ 2848.156889] R13: 00007f9987e0c000 R14: 0000000000000000 R15:
00007f99745a5980
[ 2848.156905] ---[ end trace e49522bc58864bce ]---

I still haven't tried SeaBIOS yet, I'm still running 'pc-i440fx-2.1' with
OVMF, not Q35. I couldn't get Windows to boot with Q35.

I also noticed something really odd: after the guest crashes I see random
pictures on the TV, which I assume are coming from the Arch host. I see
things like a woman at a football stadium and waterfalls. Is this expected
when the card is assigned to vfio-pci?

@Steven Walter, can you paste a full copy of your libvirt XML file please?

Just for completeness, here are my PCI devices and IOMMU groups:

[gneville@amdr7 ~]$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1450]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1451]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1467]
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a804]
03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43bb] (rev 02)
03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b7] (rev 02)
03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b2] (rev 02)
04:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
04:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
04:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
07:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd Device [144d:a804]
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] [1002:67b1]
09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] [1002:aac8]
11:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
11:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:145c]
12:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
12:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
12:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
[gneville@amdr7 ~]$

virsh nodedev-dumpxml pci_0000_09_00_0
<device>
  <name>pci_0000_09_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:03.1/0000:09:00.0</path>
  <parent>pci_0000_00_03_1</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>9</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x67b1'>Hawaii PRO [Radeon R9 290/390]</product>
    <vendor id='0x1002'>Advanced Micro Devices, Inc. [AMD/ATI]</vendor>
    <iommuGroup number='2'>
      <address domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </iommuGroup>
  </capability>
</device>
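
The dump above only covers function 0 of the card. The HDMI audio function at 09:00.1 sits in the same IOMMU group, so it is worth checking that it is bound to vfio-pci as well. A small sketch that reads the `driver` symlink for each address, assuming the standard `/sys/bus/pci/devices` layout; `pci_driver_of` and the `PCI_SYSFS_ROOT` override are names invented here for illustration:

```shell
# Print the kernel driver bound to each given PCI address, or "none".
# PCI_SYSFS_ROOT (hypothetical override) lets the lookup run against a copy.
pci_driver_of() (
    root="${PCI_SYSFS_ROOT:-/sys/bus/pci/devices}"
    for addr in "$@"; do
        link="$root/$addr/driver"
        if [ -L "$link" ]; then
            # The symlink target ends in the driver name, e.g. .../vfio-pci
            printf '%s: %s\n' "$addr" "$(basename "$(readlink "$link")")"
        else
            printf '%s: none\n' "$addr"
        fi
    done
)
```

For this card the call would be `pci_driver_of 0000:09:00.0 0000:09:00.1`.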

find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:18.6
/sys/kernel/iommu_groups/7/devices/0000:00:18.4
/sys/kernel/iommu_groups/7/devices/0000:00:18.2
/sys/kernel/iommu_groups/7/devices/0000:00:18.0
/sys/kernel/iommu_groups/7/devices/0000:00:18.7
/sys/kernel/iommu_groups/7/devices/0000:00:18.5
/sys/kernel/iommu_groups/7/devices/0000:00:18.3
/sys/kernel/iommu_groups/7/devices/0000:00:18.1
/sys/kernel/iommu_groups/5/devices/0000:12:00.2
/sys/kernel/iommu_groups/5/devices/0000:00:08.1
/sys/kernel/iommu_groups/5/devices/0000:12:00.0
/sys/kernel/iommu_groups/5/devices/0000:12:00.3
/sys/kernel/iommu_groups/5/devices/0000:00:08.0
/sys/kernel/iommu_groups/3/devices/0000:00:04.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/6/devices/0000:00:14.0
/sys/kernel/iommu_groups/6/devices/0000:00:14.3
/sys/kernel/iommu_groups/4/devices/0000:11:00.2
/sys/kernel/iommu_groups/4/devices/0000:11:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:07.1
/sys/kernel/iommu_groups/4/devices/0000:11:00.3
/sys/kernel/iommu_groups/4/devices/0000:00:07.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/2/devices/0000:09:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:03.1
/sys/kernel/iommu_groups/2/devices/0000:09:00.0
/sys/kernel/iommu_groups/0/devices/0000:07:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:00.1
/sys/kernel/iommu_groups/0/devices/0000:00:01.3
/sys/kernel/iommu_groups/0/devices/0000:04:01.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.1
/sys/kernel/iommu_groups/0/devices/0000:04:04.0
/sys/kernel/iommu_groups/0/devices/0000:05:00.0
/sys/kernel/iommu_groups/0/devices/0000:04:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:00.2
/sys/kernel/iommu_groups/0/devices/0000:03:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.0
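
The flat `find` listing is easier to read when grouped. From the output above, group 2 holds the GPU (09:00.0), its audio function (09:00.1), and the 00:03.0 and 00:03.1 bridge devices. A minimal sketch that prints the same information one group at a time, assuming the standard `/sys/kernel/iommu_groups` layout; the function name is hypothetical:

```shell
# Print every IOMMU group followed by the PCI addresses it contains.
# $1 (optional): iommu_groups root; defaults to the live sysfs path.
list_iommu_groups() (
    root="${1:-/sys/kernel/iommu_groups}"
    for g in "$root"/*; do
        [ -d "$g/devices" ] || continue
        printf 'IOMMU group %s:\n' "${g##*/}"
        for d in "$g"/devices/*; do
            # Skip the literal pattern left behind by an empty group.
            [ -e "$d" ] || [ -L "$d" ] || continue
            printf '\t%s\n' "${d##*/}"
        done
    done
)
```

Calling `list_iommu_groups` with no argument reads the live tree. Groups come out in glob order, so group 10 would sort before group 2 on a host with that many groups.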

On Wed, Mar 29, 2017 at 12:24 PM, Steven Walter <stevenrwalter at gmail.com>
wrote:

> I got a similar (though multi-GPU) setup working, which I wrote up here:
> https://www.reddit.com/r/VFIO/comments/616xih/gpu_passthrough_with_msi_b350_tomahawk/
>
> One thing that may help you is to enable AVIC (kvm_amd.avic=1).  What
> I saw without AVIC was that things would work briefly (only a few
> seconds for me) before interrupts would stop getting delivered.
> Sounds like things are working better for you without AVIC than they
> did for me, but perhaps the extra improvement in IRQ latency would fix
> the hangs you get during intensive graphics operations.
>
>
> On Tue, Mar 28, 2017 at 6:02 PM, Graham Neville <grahamneville at gmail.com>
> wrote:
> > I've managed to get PCIe passthrough working on a Gigabyte Gaming 3
> > mATX motherboard and a Ryzen 1700, with no ACS patch and only one GPU
> > (an AMD R9 290). However, I'm facing a problem with the whole KVM
> > setup and I'm not sure what it's related to.
> > The Windows 10 guest with the GPU passed through crashes (guest only,
> > the host is fine) whenever I try anything graphics-intensive, for
> > example running Witcher 3. The normal desktop is fine.
> > Also, my Linux guests act oddly when I SSH to them: the SSH terminals
> > just stop working at random. And then there's the issue of very slow
> > network throughput to both VMs. I have no idea what's going on. It
> > used to work fine with my Intel setup. There are no logs in dmesg to
> > show a problem either.
> >
> > I'm going to try SeaBIOS instead of OVMF to see if I can stop the
> > crashing.
> >
> > Is anyone having similar issues, or can anyone advise?
> >
>
>
>
> --
> -Steven Walter <stevenrwalter at gmail.com>
>