[vfio-users] Poor performance with nvidia GTX 980

Georgios Kourachanis geo.kourachanis at gmail.com
Tue Nov 3 15:35:22 UTC 2015


Hello Okky,

To be honest, I'm not playing any games at the moment. But if I were to
play something else, I'd say Tomb Raider and Bioshock Infinite. Or maybe
Crysis 3.

I guess our hosts are very similar, as I have an ASRock Z77 Extreme4 and
an i7-3770.
I've just installed linux-vfio-lts 4.1.12, too. It actually fixed some
host crashes that happened whenever I shut down the guest.

I tried without pinning the CPUs and with your topology, and I got a
tiny boost in GPU performance but significantly lower CPU performance
(about -20%), all tested with passmark.
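
For anyone comparing pinning layouts, a quick way to see which host CPUs are hyperthread siblings is to read sysfs. This is only a sketch: the function name and the optional root argument are my own, and the path assumes a Linux host that exposes the usual topology files.

```shell
#!/bin/sh
# Print the unique sets of sibling threads, one physical core per line.
# The optional first argument (hypothetical, for testing) overrides the sysfs root.
thread_siblings() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] && cat "$f"
    done | sort -u
}

thread_siblings
```

If the pairs on my host turn out to be 0,4 / 1,5 / 2,6 / 3,7 (I have not confirmed this here), then my pinned CPUs 2,3,6,7 are two physical cores with two threads each, which is worth keeping in mind when choosing the guest topology.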

I think I'll try the Unigine Valley benchmark tonight. Thanks for the
suggestion.


George


On 02/11/2015 04:16 PM, Okky Hendriansyah wrote:
> I think the best benchmark would be the in-game ones. Since I cannot 
> try Tera (region restricted), what are the other games that you play? 
> Hopefully I can try to benchmark that on my VM and start from there. 
> You can also try the Unigine Valley benchmark.
>
> My host has an ASRock Z87 Extreme6 with an Intel Core i7-4770. I patched
> my kernel with the ACS override patch; it is currently at 4.1.2-lts
> (patched from the ABS PKGBUILD). I do not isolate any cores on the host
> and give all the cores to the VM (exposed as 8 vCPUs with a topology of
> 1 socket, 4 cores, 2 threads per core).
>
> Best regards,
> Okky Hendriansyah
>
> On Nov 2, 2015, at 19:54, Eddie Yen <missile0407 at gmail.com> wrote:
>
>> For now the latest driver is 358.50, and my guest is using it without
>> any problem.
>> But I'm using the method that AW describes, so maybe give it a try?
>>
>> 2015-11-02 20:47 GMT+08:00 Georgios Kourachanis <geo.kourachanis at gmail.com>:
>>
>>     It's the same thing, whether you add them with qemu arguments or
>>     with the wrapper.
>>
>>     The point is to use the hyper-v functions. That's what the
>>     hyper-v vendor-id patch has given us: the ability to hide
>>     the hyper-v functions from the nvidia driver so that we can use them!
>>
>>     Also, I've tried an empty vendor-id, and I got the
>>     same performance.
>>
>>     The nvidia drivers I'm currently using are 358.50
>>
>>     Moreover, could you suggest some good software for testing the VM's
>>     performance in general? I don't really like passmark.
>>
>>
>>
>>
>>     On 02/11/2015 02:11 PM, Eddie Yen wrote:
>>>     OK, but I still suggest removing the Hyper-V feature tags from
>>>     your XML,
>>>     because we don't know what new tricks NVIDIA has put inside the
>>>     driver to "surprise" us.
>>>
>>>     For me, my GTX 980 works well with the edits above. But I'm using
>>>     a 4820K, which doesn't need the ACS patch and has no Intel graphics.
>>>     So I'm not sure whether it is caused by the patch or something else.
>>>
>>>     2015-11-02 20:04 GMT+08:00 Georgios Kourachanis <geo.kourachanis at gmail.com>:
>>>
>>>         Hello Eddie,
>>>
>>>         Thanks for answering, though:
>>>
>>>         What you suggest, I've already done this way:
>>>
>>>         /usr/local/bin/qemu-system-x86_64.hv:
>>>         #!/bin/sh
>>>         # Append hv_vendor_id right after hv_time in the arguments passed to QEMU.
>>>         exec /usr/bin/qemu-system-x86_64 `echo "$@" | \
>>>         sed 's|hv_time|hv_time,hv_vendor_id=GoobyPLS|g'`
>>>
>>>
>>>         and by changing the emulator line to this:
>>>
>>>         <emulator>/usr/local/bin/qemu-system-x86_64.hv</emulator>
>>>
>>>         I'm just giving the ID "GoobyPLS" to the vendor. I'll try
>>>         without a vendor name to see if it changes anything.
>>>
>>>         Also, I'm using the qemu git version "r41983.g3a958f5" so it
>>>         already contains the patch that helps us use the lines above.
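
The wrapper above flattens all arguments through echo, so an argument that contains spaces would be split when QEMU is started. A more defensive variant, offered only as a sketch, rewrites each argument separately. The function names are made up for illustration, and VENDOR_ID simply defaults to the value used in this thread.

```shell
#!/bin/sh
# Illustrative default; override by exporting VENDOR_ID before starting the guest.
VENDOR_ID="${VENDOR_ID:-GoobyPLS}"

# Rewrite a single QEMU argument: append hv_vendor_id after hv_time.
add_vendor_id() {
    printf '%s\n' "$1" | sed "s|hv_time|hv_time,hv_vendor_id=${VENDOR_ID}|g"
}

# Rebuild "$@" one argument at a time so arguments with spaces stay intact,
# then replace this process with the real emulator binary given as $1.
rewrite_and_exec() {
    qemu="$1"; shift
    n=$#
    while [ "$n" -gt 0 ]; do
        arg="$1"; shift
        set -- "$@" "$(add_vendor_id "$arg")"
        n=$((n - 1))
    done
    exec "$qemu" "$@"
}

# In the real wrapper the last line would be:
# rewrite_and_exec /usr/bin/qemu-system-x86_64 "$@"
```

Arguments that do not contain hv_time pass through unchanged, so the wrapper stays transparent for everything except the -cpu value.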
>>>
>>>
>>>
>>>
>>>
>>>         On 02/11/2015 03:53 AM, Eddie Yen wrote:
>>>>         According to AW's blog:
>>>>         "For this step we again need to run virsh edit on the VM.
>>>>         Within the <features> section, remove everything between the
>>>>         <hyperv> tags, including the tags themselves."
>>>>         and
>>>>         "Additionally, within the <clock> tag, find the timer named
>>>>         hypervclock, remove the line containing this tag
>>>>         completely.  Save and exit the edit session."
>>>>
>>>>         I found that these still exist in your XML file, so try to
>>>>         do this:
>>>>
>>>>         1. Remove these tags.
>>>>         2. Re-compile QEMU and re-install it with this patch
>>>>         http://www.spinics.net/lists/kvm/msg121742.html
>>>>         3. Add these tags between </devices> and </domain>
>>>>
>>>>         <qemu:commandline>
>>>>            <qemu:arg value='-cpu'/>
>>>>            <qemu:arg value='host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=off,hv_vendor_id='/>
>>>>         </qemu:commandline>
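
One detail worth checking with the <qemu:commandline> approach: as far as I know, libvirt only keeps these elements when the QEMU XML namespace (xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0') is declared on the <domain> element. A small sketch for checking a saved definition follows. The helper names are made up, and windows_10 is the domain name from the XML posted in this thread.

```shell
#!/bin/sh
# Succeeds if the domain XML read from stdin declares the QEMU namespace.
has_qemu_namespace() {
    grep -q "xmlns:qemu=['\"]http://libvirt.org/schemas/domain/qemu/1.0['\"]"
}

# Succeeds if the XML read from stdin passes an hv_vendor_id flag in a qemu:arg.
has_vendor_id_arg() {
    grep -q "<qemu:arg value=['\"][^'\"]*hv_vendor_id="
}

# Example usage (needs libvirt and the windows_10 domain from this thread):
# virsh dumpxml windows_10 | has_qemu_namespace && echo "namespace declared"
# virsh dumpxml windows_10 | has_vendor_id_arg && echo "hv_vendor_id passed"
```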
>>>>
>>>>         I'm using a GTX 980, too. Before that, I got poor 3D
>>>>         performance in Windows 10; after this patch and these edits, I
>>>>         got the performance back.
>>>>
>>>>         2015-11-02 1:43 GMT+08:00 Georgios Kourachanis <geo.kourachanis at gmail.com>:
>>>>
>>>>             Hello all,
>>>>
>>>>             I had been using Xen with some AMD GPUs for almost 2
>>>>             years until about June 2015, when I found out that
>>>>             KVM and libvirt could do the same things I was
>>>>             interested in with nvidia GPUs, too. I needed the CUDA
>>>>             cores, so I switched to an ASUS GTX 980 Strix.
>>>>             Unfortunately, I don't get good performance
>>>>             from it. On a native Windows 7/10 installation it's a
>>>>             beast, though.
>>>>             I also have an AMD R7 250, which works great with KVM.
>>>>             But let's leave that aside.
>>>>
>>>>             Let me get to the point:
>>>>
>>>>             I have no problems with the installation of Windows,
>>>>             OVMF, the pass-through or anything else. The only
>>>>             problem is the GTX 980's performance.
>>>>             The performance got a significant boost when I used the
>>>>             latest qemu branch with the hyper-v trick, but I'm still
>>>>             not getting the "almost-native" performance that many
>>>>             people on this mailing list report (even with nvidia GPUs).
>>>>
>>>>
>>>>             Here are my system's specs:
>>>>
>>>>             Arch Linux with 4.1.6-1-vfio (with the ACS patch ALONE)
>>>>             Intel Core i7-3770 (I use the iGPU for Arch Linux)
>>>>             24GiB RAM
>>>>             ASUS GTX 980 Strix
>>>>             Sapphire R7 250
>>>>             ------------------------------------------------------------------------
>>>>             lspci (only pass-through'd stuff):
>>>>
>>>>             01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1)
>>>>                     Subsystem: ASUSTeK Computer Inc. Device 8518
>>>>                     Kernel driver in use: vfio-pci
>>>>                     Kernel modules: nouveau
>>>>             01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
>>>>                     Subsystem: ASUSTeK Computer Inc. Device 8518
>>>>                     Kernel driver in use: vfio-pci
>>>>                     Kernel modules: snd_hda_intel
>>>>             02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland PRO [Radeon R7 240/340]
>>>>                     Subsystem: PC Partner Limited / Sapphire Technology Device e266
>>>>                     Kernel modules: radeon
>>>>             02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
>>>>                     Subsystem: PC Partner Limited / Sapphire Technology Device aab0
>>>>                     Kernel driver in use: snd_hda_intel
>>>>                     Kernel modules: snd_hda_intel
>>>>             08:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
>>>>                     Subsystem: ASRock Incorporation Motherboard
>>>>                     Kernel driver in use: vfio-pci
>>>>                     Kernel modules: xhci_pci
>>>>             ------------------------------------------------------------------------
>>>>             booting lines:
>>>>
>>>>             linux /boot/vmlinuz-linux-vfio root=UUID=XXXX rw intel_iommu=on pcie_acs_override=downstream isolcpus=2-3,6-7 nohz_full=2-3,6-7
>>>>             initrd /boot/intel-ucode.img /boot/initramfs-linux-vfio.img
>>>>             ------------------------------------------------------------------------
>>>>             /etc/fstab:
>>>>
>>>>             hugetlbfs /hugepages hugetlbfs defaults 0 0
>>>>             ------------------------------------------------------------------------
>>>>             /etc/sysctl.d/40-hugepage.conf:
>>>>
>>>>             vm.nr_hugepages = 8000
>>>>             ------------------------------------------------------------------------
>>>>             /etc/modules-load.d/vfio.conf:
>>>>
>>>>             kvm
>>>>             kvm-intel
>>>>             vfio
>>>>             vfio-pci
>>>>             vfio_iommu_type1
>>>>             vfio_virqfd
>>>>             ------------------------------------------------------------------------
>>>>             /etc/modprobe.d/kvm.conf:
>>>>
>>>>             options kvm ignore_msrs=1
>>>>             ------------------------------------------------------------------------
>>>>             /etc/modprobe.d/kvm-intel.conf:
>>>>
>>>>             options kvm-intel nested=1
>>>>             ------------------------------------------------------------------------
>>>>             /etc/modprobe.d/vfio_iommu_type1.conf:
>>>>
>>>>             options vfio_iommu_type1 allow_unsafe_interrupts=0
>>>>             ------------------------------------------------------------------------
>>>>             /etc/modprobe.d/vfio-pci.conf:
>>>>
>>>>             options vfio-pci ids=10de:13c0,10de:0fbb,1002:6613,1002:aab0,1b21:1042
>>>>             ------------------------------------------------------------------------
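
To double-check that the devices listed for vfio-pci really end up bound to it, the driver symlink in sysfs can be read directly. This is a sketch under the assumption of a standard Linux sysfs layout. The function name and its optional root argument are hypothetical, and the addresses are the ones from the lspci output above with the 0000: domain prefix added.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, or "none" if unbound.
# The second argument (hypothetical, for testing) overrides the sysfs root.
pci_driver() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    if [ -L "$root/$addr/driver" ]; then
        basename "$(readlink "$root/$addr/driver")"
    else
        echo "none"
    fi
}

# Devices that are passed through to the guest in this setup.
for dev in 0000:01:00.0 0000:01:00.1 0000:08:00.0; do
    printf '%s %s\n' "$dev" "$(pci_driver "$dev")"
done
```

On a host configured as described, each of these would be expected to report vfio-pci. Note that the lspci output above shows 02:00.1 bound to snd_hda_intel even though its ID appears in the vfio-pci options.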
>>>>
>>>>             And the virsh xml:
>>>>
>>>>             <domain type='kvm'>
>>>>               <name>windows_10</name>
>>>>               <uuid>63045df8-c782-4cfd-abc7-a3598826ae83</uuid>
>>>>               <memory unit='KiB'>6553600</memory>
>>>>               <currentMemory unit='KiB'>6553600</currentMemory>
>>>>               <memoryBacking>
>>>>                 <hugepages/>
>>>>               </memoryBacking>
>>>>               <vcpu placement='static'>4</vcpu>
>>>>               <cputune>
>>>>                 <vcpupin vcpu='0' cpuset='2'/>
>>>>                 <vcpupin vcpu='1' cpuset='3'/>
>>>>                 <vcpupin vcpu='2' cpuset='6'/>
>>>>                 <vcpupin vcpu='3' cpuset='7'/>
>>>>               </cputune>
>>>>               <os>
>>>>                 <type arch='x86_64' machine='pc-i440fx-2.4'>hvm</type>
>>>>                 <loader readonly='yes' type='pflash'>/usr/local/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
>>>>                 <nvram>/var/lib/libvirt/qemu/nvram/windows_nvidia_VARS.fd</nvram>
>>>>               </os>
>>>>               <features>
>>>>                 <acpi/>
>>>>                 <apic/>
>>>>                 <pae/>
>>>>                 <hyperv>
>>>>                   <relaxed state='on'/>
>>>>                   <vapic state='on'/>
>>>>                   <spinlocks state='on' retries='8191'/>
>>>>                 </hyperv>
>>>>                 <kvm>
>>>>                   <hidden state='on'/>
>>>>                 </kvm>
>>>>                 <vmport state='off'/>
>>>>               </features>
>>>>               <cpu mode='host-passthrough'>
>>>>                 <topology sockets='1' cores='4' threads='1'/>
>>>>               </cpu>
>>>>               <clock offset='localtime'>
>>>>                 <timer name='rtc' tickpolicy='catchup'/>
>>>>                 <timer name='pit' tickpolicy='delay'/>
>>>>                 <timer name='hpet' present='no'/>
>>>>                 <timer name='hypervclock' present='yes'/>
>>>>               </clock>
>>>>               <on_poweroff>destroy</on_poweroff>
>>>>               <on_reboot>restart</on_reboot>
>>>>               <on_crash>restart</on_crash>
>>>>               <pm>
>>>>                 <suspend-to-mem enabled='no'/>
>>>>                 <suspend-to-disk enabled='no'/>
>>>>               </pm>
>>>>               <devices>
>>>>                 <emulator>/usr/local/bin/qemu-system-x86_64.hv</emulator>
>>>>                 <disk type='block' device='disk'>
>>>>                   <driver name='qemu' type='raw' cache='none'/>
>>>>                   <source dev='/dev/mapper/vg_ssd-lv_kvm_NVIDIA'/>
>>>>                   <target dev='sda' bus='scsi'/>
>>>>                   <boot order='1'/>
>>>>                   <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>>>>                 </disk>
>>>>                 <disk type='block' device='disk'>
>>>>                   <driver name='qemu' type='raw' cache='none'/>
>>>>                   <source dev='/dev/mapper/vg_raid5-lv_xen_ntfs_files'/>
>>>>                   <target dev='sdb' bus='scsi'/>
>>>>                   <address type='drive' controller='0' bus='0' target='0' unit='1'/>
>>>>                 </disk>
>>>>                 <controller type='usb' index='0'>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
>>>>                 </controller>
>>>>                 <controller type='pci' index='0' model='pci-root'/>
>>>>                 <controller type='scsi' index='0' model='virtio-scsi'>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
>>>>                 </controller>
>>>>                 <interface type='bridge'>
>>>>                   <mac address='52:54:00:e9:85:8f'/>
>>>>                   <source bridge='xenbr0'/>
>>>>                   <model type='e1000'/>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>>>>                 </interface>
>>>>                 <hostdev mode='subsystem' type='pci' managed='yes'>
>>>>                   <source>
>>>>                     <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>>>>                   </source>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0' multifunction='on'/>
>>>>                 </hostdev>
>>>>                 <hostdev mode='subsystem' type='pci' managed='yes'>
>>>>                   <source>
>>>>                     <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
>>>>                   </source>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x1'/>
>>>>                 </hostdev>
>>>>                 <hostdev mode='subsystem' type='pci' managed='yes'>
>>>>                   <source>
>>>>                     <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
>>>>                   </source>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
>>>>                 </hostdev>
>>>>                 <memballoon model='virtio'>
>>>>                   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
>>>>                 </memballoon>
>>>>               </devices>
>>>>             </domain>
>>>>             ------------------------------------------------------------------------
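
As a sanity check on the hugepage numbers above, the arithmetic can be worked through in a few lines. This sketch assumes 2048 KiB hugepages, which is the usual x86_64 default but should be confirmed against the Hugepagesize line in /proc/meminfo. The function names are made up for illustration.

```shell
#!/bin/sh
# Number of hugepages needed to back a guest, rounding up.
# $1: guest memory in KiB, $2: hugepage size in KiB.
pages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Memory reserved by a given vm.nr_hugepages value, in MiB.
# $1: number of pages, $2: hugepage size in KiB.
reserved_mib() {
    echo $(( $1 * $2 / 1024 ))
}

pages_needed 6553600 2048   # guest <memory> from the domain XML: prints 3200
reserved_mib 8000 2048      # vm.nr_hugepages from sysctl.d: prints 16000
```

Under that assumption the guest needs 3200 pages, while the sysctl setting reserves 16000 MiB of the host's 24 GiB.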
>>>>
>>>>             /usr/local/bin/qemu-system-x86_64.hv:
>>>>             #!/bin/sh
>>>>             # Append hv_vendor_id right after hv_time in the arguments passed to QEMU.
>>>>             exec /usr/bin/qemu-system-x86_64 `echo "$@" | \
>>>>             sed 's|hv_time|hv_time,hv_vendor_id=GoobyPLS|g'`
>>>>
>>>>
>>>>
>>>>             And some notes:
>>>>
>>>>             1) Using "<topology sockets='1' cores='4'
>>>>             threads='1'/>" instead of "<topology sockets='1'
>>>>             cores='2' threads='2'/>" provided about 2% boost in GPU
>>>>             performance. No change in RAM or CPU tests. I've tested
>>>>             with the passmark.
>>>>
>>>>             2) I tried the emulatorpin method that Alex described in
>>>>             a mail here on vfio-users, but I didn't notice any
>>>>             change in GPU performance. I didn't test it on the CPU
>>>>             side, though.
>>>>
>>>>             3) The main consequence of the lack of performance is
>>>>             that a specific game I've been playing isn't quite
>>>>             playable. That game has been mentioned before here on
>>>>             the list: it's Tera (European version (Gameforge),
>>>>             although the American version (En Masse) has exactly the
>>>>             same performance).
>>>>
>>>>             4) Every other game I managed to play is quite
>>>>             playable, though I haven't tested them to see if they
>>>>             run at native speeds.
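
For the emulatorpin experiment in note 2, one way to pick the CPU list is to take every host CPU that is not pinned to a vCPU. The helper below is a hypothetical sketch. It assumes 8 host CPUs (an i7-3770 with hyperthreading) and the vcpupin values from the XML above.

```shell
#!/bin/sh
# Print the host CPUs (0..total-1) that are NOT in the pinned list,
# as a comma-separated cpulist.
# $1: number of host CPUs, remaining args: CPUs pinned to vCPUs.
unpinned_cpus() {
    total="$1"; shift
    out=""
    cpu=0
    while [ "$cpu" -lt "$total" ]; do
        case " $* " in
            *" $cpu "*) ;;                      # pinned to a vCPU, skip it
            *) out="${out:+$out,}$cpu" ;;       # free for emulator threads
        esac
        cpu=$((cpu + 1))
    done
    echo "$out"
}

unpinned_cpus 8 2 3 6 7   # prints 0,1,4,5
# The result could then be passed to virsh, for example:
# virsh emulatorpin windows_10 "$(unpinned_cpus 8 2 3 6 7)" --config
```

Keeping the emulator threads off CPUs 2,3,6,7 matches the isolcpus setting in the boot line, though whether it helps GPU performance is exactly what remains to be measured.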
>>>>
>>>>
>>>>             I'd really appreciate some help on this matter; I really
>>>>             want to make my server run this VM with the nvidia GPU. I
>>>>             hate dual booting Windows >_>
>>>>
>>>>
>>>>             Thanks!
>>>>
>>>>             _______________________________________________
>>>>             vfio-users mailing list
>>>>             vfio-users at redhat.com
>>>>             https://www.redhat.com/mailman/listinfo/vfio-users
>>>>
>>>>
>>>
>>>
>>
>>
