[vfio-users] Brutal DPC Latency - how is yours? check it please and report back

Rokas Kupstys rokups at zoho.com
Mon Feb 29 10:56:44 UTC 2016


I tried nohz_full/rcu_nocbs in the past but it did not show any visible
improvement. I will try again, I guess. I did, however, have a similar
setup with cset, although done rather manually.

> cset set -c 0-1 system
> cset proc -m -f root -t system -k
This essentially moves all tasks to cores 0 and 1. Since libvirt uses
CPU pinning, the VM stays on the cores assigned in the XML. However, it
did not show much of an improvement either. One thing that bothers me is
kernel threads that can't be moved off the cores I would like to dedicate
to the VM. Any idea if there is some kernel parameter which would prevent
kernel thread creation on certain cores but still allow userspace to run
on those same cores? That would be a perfect substitute for the isolcpus
parameter: we could clear the cores of any tasks when needed and use them
when the VM is offline. You can't do that with isolcpus.
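
(Side note: if I read the cpuset docs right, cset's "shield" subcommand
does the above in one step and can also move movable kernel threads off
the shielded cores; per-CPU kernel threads will still refuse to move,
though. Untested on my box, just a sketch:)

  cset shield --cpu=2-7 --kthread=on   # shield cores 2-7, move movable kthreads off
  cset shield --reset                  # drop the shield while the VM is offline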

I'll take a look at that repo, seems interesting. Thanks for the link.

On 2016.02.29 12:16, Quentin Deldycke wrote:
> Nearly as efficient as isolcpus, but it can be applied dynamically, at runtime:
>
> Use nohz_full / rcu_nocbs to offload all RCU callbacks for your VM cores
> onto your OS-only cores.
> Use cgroups: when you start the VM, keep only x cores for the OS; when
> you shut it down, let the OS have all the cores.
>
> If the VM is running and you need a power boost on Linux, just use
> "echo $$ | sudo tee /cgroups/cgroup.procs", and everything run from that
> shell will have all cores :)
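>
> (The general idea in shell form; see the scripts linked below for the
> real thing. The /cgroups mount point is as above, and the "host" group
> name is just a placeholder:)
>
>   mkdir -p /cgroups/host
>   echo 0 > /cgroups/host/cpuset.cpus    # host tasks keep core 0 only
>   echo 0 > /cgroups/host/cpuset.mems    # a cpuset also needs a memory node
>   for p in $(cat /cgroups/cgroup.procs); do
>       echo "$p" > /cgroups/host/cgroup.procs 2>/dev/null  # pinned kthreads refuse to move
>   done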
>
> Linux only: all cores (but cores 1,2,3 are in nohz mode, offloaded onto
> core 0)
> Linux + Windows: 1 core to Linux, 3 cores to Windows
> Need a boost on Linux: the little command line above for that shell
>
> Example of cgroup usage:
> https://github.com/qdel/scripts/tree/master/vfio/scripts =>
> shieldbuild / shieldbreak
>
> Which are called through qemu hooks:
> https://github.com/qdel/scripts/tree/master/vfio/hooks
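>
> (For reference, a minimal dispatch sketch; libvirt invokes
> /etc/libvirt/hooks/qemu with the guest name and phase as its first two
> arguments, and the script paths here are placeholders:)
>
>   #!/bin/sh
>   # /etc/libvirt/hooks/qemu -- called as: qemu <guest> <phase> <sub-op> <extra>
>   GUEST="$1"; PHASE="$2"
>   case "$PHASE" in
>       prepare) /path/to/shieldbuild ;;   # before the guest starts
>       release) /path/to/shieldbreak ;;   # after the guest is fully stopped
>   esac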
>
> I do not configure my IO; I let qemu manage it.
>
>
> Note one fun behavior:
> while idle, I sit completely steady at ~1000us;
> if I run a game, it drops to a completely steady 500us.
>
> Example: http://b.qdel.fr/test.png
>
> Sorry for the quality: VNC to a 4K screen from 1080p, all that...
>
>
> -- 
> Deldycke Quentin
>
>
> On 29 February 2016 at 10:55, Rokas Kupstys <rokups at zoho.com> wrote:
>
>     Yes, currently I am actually booted with the vanilla Arch Linux
>     kernel, no NO_HZ or other tweaks.
>>     Why is 2 cores for the host unacceptable? Do you plan to run heavy
>>     workloads while gaming?
>     The problem with isolcpus is that it exempts cores from the Linux
>     CPU scheduler. This means they stand idle even when the VM is
>     offline. While I don't do anything on the host while gaming, I do
>     plenty when not gaming, and just throwing away 6 cores of an already
>     disadvantaged AMD CPU is a real waste.
>
>>     This config is actually not good.
>     Well... it does look bad on paper, but it is the only one that
>     yields bearable DPC latency. I tried what you mentioned, in various
>     combinations: cores 0,2,4,6 to the VM, 1,3 to the emulator, 5,7 for
>     IO; cores 1,3,5,7 to the VM, 0,2 to the emulator, 4,6 for IO; cores
>     0,1,2,3 to the VM, 4,5 to the emulator, 6,7 for IO; cores 4,5,6,7 to
>     the VM, 0,1 to the emulator, 2,3 for IO. All of them yield terrible
>     latency.
>
>     It would be interesting to hear from someone who has an AMD build,
>     how (or whether) they solved this.
>
>
>     On 2016.02.29 11:10, Bronek Kozicki wrote:
>>     Two things you can improve, IMO
>>
>>     * disable NO_HZ
>>
>>     * use isolcpus to dedicate your pinned CPUs to the guest only; this
>>     will also ensure they are not used for guest IO.
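>>
>>     (e.g. boot with isolcpus=2-7 and pin the guest vcpus to cores 2-7
>>     only; the host scheduler will then never touch those cores)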
>>
>>     B.
>>
>>     On 29/02/2016 08:45, Rokas Kupstys wrote:
>>>
>>>     Yesterday I figured out my latency problem. All the things listed
>>>     everywhere on the internet failed. The last thing I tried was
>>>     pinning each vcpu to two physical cores, and that brought latency
>>>     down. I have an FX-8350 CPU, which shares one FPU between each pair
>>>     of cores, so maybe that's why. With just this pinning, latency now
>>>     stays just above 1000μs most of the time. Under load, however,
>>>     latency increases. I threw out iothreads and emulator pinning and
>>>     it did not change much. Superior latency could be achieved using
>>>     isolcpus=2-7, but leaving just two cores to the host is
>>>     unacceptable. With that setting latency was around 500μs without
>>>     load. The good part is that Battlefield 3 no longer lags, although
>>>     I observed longer texture loading times compared to bare metal. The
>>>     not-so-good part is that there is still minor sound
>>>     skipping/crackling, since latency spikes under load. That is very
>>>     disappointing. I also tried two VM cores pinned to 4 host cores:
>>>     BF3 lagged enough to be unplayable. 3 VM cores pinned to 6 host
>>>     cores was already playable, but sound was still crackling. I
>>>     noticed little difference between that and 4 VM cores pinned to 8
>>>     host cores. It would be nice if the sound could be cleaned up. If
>>>     anyone has any ideas, I'm all ears. The libvirt XML I use now:
>>>
>>>>       <vcpu placement='static'>4</vcpu>
>>>>       <cputune>
>>>>         <vcpupin vcpu='0' cpuset='0-1'/>
>>>>         <vcpupin vcpu='1' cpuset='2-3'/>
>>>>         <vcpupin vcpu='2' cpuset='4-5'/>
>>>>         <vcpupin vcpu='3' cpuset='6-7'/>
>>>>       </cputune>
>>>>       <features>
>>>>         <acpi/>
>>>>         <apic/>
>>>>         <pae/>
>>>>         <hap/>
>>>>         <viridian/>
>>>>         <hyperv>
>>>>           <relaxed state='on'/>
>>>>           <vapic state='on'/>
>>>>           <spinlocks state='on' retries='8191'/>
>>>>         </hyperv>
>>>>         <kvm>
>>>>           <hidden state='on'/>
>>>>         </kvm>
>>>>         <pvspinlock state='on'/>
>>>>       </features>
>>>>       <cpu mode='host-passthrough'>
>>>>         <topology sockets='1' cores='4' threads='1'/>
>>>>       </cpu>
>>>>       <clock offset='utc'>
>>>>         <timer name='rtc' tickpolicy='catchup'/>
>>>>         <timer name='pit' tickpolicy='delay'/>
>>>>         <timer name='hpet' present='no'/>
>>>>         <timer name='hypervclock' present='yes'/>
>>>>       </clock>
>>>
>>>     Kernel configs:
>>>
>>>>     CONFIG_NO_HZ_FULL=y
>>>>     CONFIG_RCU_NOCB_CPU_ALL=y
>>>>     CONFIG_HZ_1000=y
>>>>     CONFIG_HZ=1000
>>>     I am not convinced a 1000 Hz tick rate is needed. The default
>>>     (300) seems to perform just as well, judging by the latency
>>>     charts. I have not had a chance to test it with BF3 yet, however.
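>>>
>>>     (To check what the running kernel actually does: the nohz_full
>>>     list is a standard sysfs file, and Arch exposes its kernel config
>>>     via /proc/config.gz, if memory serves:)
>>>
>>>       cat /sys/devices/system/cpu/nohz_full   # CPUs actually running tickless
>>>       zgrep 'CONFIG_NO_HZ\|CONFIG_HZ' /proc/config.gz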
>>>
>>>     On 2016.01.12 11:12, thibaut noah wrote:
>>
>>     [cut]
>>
