[vfio-users] Brutal DPC Latency - how is yours? check it please and report back

Quentin Deldycke quentindeldycke at gmail.com
Mon Feb 29 13:10:46 UTC 2016


I did the whole process through scripts simply because cset doesn't work on Debian :)

rcu_nocbs should move most of the kernel's RCU callback threads off the selected cpus.

A good way to check that your vcpu cores are not being used is, on Intel, to use
turbostat. It shows the activity of each thread and the average / top frequency.
Create your shield, run a heavy program. If your selected cpus show 0% usage
/ 0 MHz average frequency, I think you are good to go :)

For my part, during a kernel compilation with my shield up, the threads of cores
1, 2 and 3 sit at 0 ~ 7 MHz average. I don't know whether the peak of 7 MHz is residual
or real...
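
For reference, a rough sketch of that check (shieldbuild is the script from my
repo linked further down; the core split and the compile job are just examples):

sudo ./shieldbuild              # host tasks get confined to core 0
make -j4 &                      # any heavy workload, run from the host side
sudo turbostat --interval 5     # cores 1-3 should stay near 0% busy / 0 MHz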



--
Deldycke Quentin


On 29 February 2016 at 11:56, Rokas Kupstys <rokups at zoho.com> wrote:

> I tried nohz_full/rcu_nocbs in the past but it did not show any visible
> improvement. I will try again, I guess. I did however have a similar setup with
> cset, although done rather manually.
>
> cset set -c 0-1 system
> cset proc -m -f root -t system -k
>
> This essentially moves all tasks to cores 0 and 1. Since libvirt uses cpu
> pinning, the VM stays on the cores assigned in the xml. However it did not show much of
> an improvement either. One thing that bothers me is kernel threads that
> can't be moved off the cores I would like to dedicate to the VM. Any idea if there
> is some kernel parameter which would prevent kernel thread creation on
> certain cores but allow userspace to run on those same cores? That would be a
> perfect substitute for the isolcpus param. We could clear the cores of any tasks
> when needed and use them when the VM is offline. Can't do that with isolcpus.
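>
> For reference, the same manual cset setup can also be expressed with cset's
> shield mode, roughly like this (the 2-7 range is just an example):
>
> cset shield -c 2-7 -k on    # reserve cores 2-7, push movable kernel threads off them
> # ... run the VM with its vcpus pinned into 2-7 ...
> cset shield --reset         # hand the cores back to the host scheduler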
>
> I'll analyze that repo, seems interesting, thanks for the link.
>
>
> On 2016.02.29 12:16, Quentin Deldycke wrote:
>
> Nearly as efficient as isolcpus, but it can be used dynamically, at runtime:
>
> Use nohz_full / rcu_nocbs to offload all the RCU work of your VM cores onto
> your OS-only cores.
> Use cgroups: when you start the VM, keep only x cores for the OS; when you
> shut it down, let the OS have all the cores again.
>
> If the VM is started and you need a power boost on Linux, just use
> "echo $$ | sudo tee /cgroups/cgroup.procs", and every program run from that
> shell will have all the cores :)
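>
> To make that concrete, a rough sketch of the whole setup (the core split and
> the /cgroups mount point mirror the example above, adjust to your topology):
>
> # kernel command line: keep ticks and RCU callbacks off cores 1-3
> nohz_full=1-3 rcu_nocbs=1-3
>
> # shield build (VM start): confine host tasks to core 0 via a cpuset cgroup
> mount -t cgroup -o cpuset none /cgroups    # once, if not already mounted
> mkdir -p /cgroups/host
> echo 0 > /cgroups/host/cpuset.cpus
> echo 0 > /cgroups/host/cpuset.mems
> for t in $(cat /cgroups/tasks); do echo $t > /cgroups/host/tasks; done 2>/dev/null
>
> # shield break (VM stop) or per-shell boost: back to the root set, which owns all cores
> echo $$ > /cgroups/cgroup.procs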
>
> Linux only: all cores (but cores 1, 2, 3 are in nohz mode, offloaded to core
> 0)
> Linux + Windows: 1 core to Linux, 3 cores to Windows
> Need a boost on Linux: the little command line above, for that shell
>
>
> Example of cgroup usage:
> https://github.com/qdel/scripts/tree/master/vfio/scripts => shieldbuild /
> shieldbreak
>
> Which are called through qemu hooks:
> https://github.com/qdel/scripts/tree/master/vfio/hooks
>
> I do not configure my IO; I let qemu manage it.
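>
> In case it helps, a rough sketch of what such a qemu hook dispatch can look
> like (the guest name and paths are placeholders, the real scripts are in the
> repo above):
>
> #!/bin/bash
> # /etc/libvirt/hooks/qemu -- libvirt calls it as: <guest> <operation> <sub-operation>
> GUEST="$1"; OP="$2"
> if [ "$GUEST" = "windows" ]; then         # placeholder guest name
>     case "$OP" in
>         prepare) /path/to/shieldbuild ;;  # shrink the host before the VM starts
>         release) /path/to/shieldbreak ;;  # give the cores back after it stops
>     esac
> fi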
>
>
> One fun behavior:
> while idle, I am completely steady at ~1000us;
> if I run a game, it goes down to a completely steady 500us.
>
> Example: http://b.qdel.fr/test.png
>
> Sorry for the quality, VNC to a 4k screen from 1080p, all of that...
>
>
> --
> Deldycke Quentin
>
>
> On 29 February 2016 at 10:55, Rokas Kupstys <rokups at zoho.com> wrote:
>
>> Yes, currently I am actually booted with the vanilla Arch Linux kernel, no
>> NO_HZ and other stuff.
>>
>> Why is 2 cores for the host unacceptable? Do you plan to run heavy
>> workloads on it while gaming?
>>
>> The problem with isolcpus is that it exempts cores from the Linux cpu scheduler.
>> This means that even when the VM is offline they stand idle. While I don't do
>> anything on the host while gaming, I do plenty when not gaming, and just throwing
>> away 6 cores of an already disadvantaged AMD cpu is a real waste.
>>
>> This config is not good actually.
>>
>> Well... it indeed looks bad on paper, however it is the only configuration that
>> yields bearable DPC latency. I tried what you mentioned, in various
>> combinations: cores 0,2,4,6 to the VM, 1,3 to the emulator, 5,7 for IO /
>> cores 1,3,5,7 to the VM, 0,2 to the emulator, 4,6 for IO / cores 0,1,2,3 to the VM,
>> 4,5 to the emulator, 6,7 for IO / cores 4,5,6,7 to the VM, 0,1 to the emulator, 2,3 for IO.
>> All of them yield terrible latency.
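>>
>> Such combinations can also be tried on a running guest with virsh, which is
>> quicker than editing the xml every time (the domain name is a placeholder):
>>
>> virsh vcpupin win10 0 0,2       # vcpu 0 -> host cores 0 and 2
>> virsh vcpupin win10 1 4,6       # vcpu 1 -> host cores 4 and 6
>> virsh emulatorpin win10 1,3     # emulator threads -> host cores 1 and 3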
>>
>> It would be interesting to hear from someone who has an AMD build, how (and
>> if) they solved this.
>>
>>
>> On 2016.02.29 11:10, Bronek Kozicki wrote:
>>
>> Two things you can improve, IMO
>>
>> * disable NO_HZ
>>
>> * use isolcpus to dedicate your pinned CPUs to the guest only - this
>> will also ensure they are not used for guest IO.
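>>
>> For example, on the kernel command line (the core range is only an
>> illustration, match it to your pinning):
>>
>> isolcpus=2-7 nohz=off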
>>
>> B.
>>
>> On 29/02/2016 08:45, Rokas Kupstys wrote:
>>
>> Yesterday I figured out my latency problem. All the things listed
>> everywhere on the internet failed. The last thing I tried was pinning one
>> vcpu to two physical cores, and it brought latency down. I have an
>> FX-8350 CPU, which has a shared FPU for every two cores, so maybe that's
>> why. With just this pinning, latency is now most of the time just
>> above 1000μs. However under load the latency increases. I threw out
>> iothreads and emulator pinning and it did not change much.
>> Superior latency could be achieved using isolcpus=2-7, however
>> leaving just two cores to the host is unacceptable. With that setting
>> latency was around 500μs without load. The good part is that
>> Battlefield 3 no longer lags, although I observed increased loading
>> times on textures compared to bare metal. The not so good part is that
>> there still is minor sound skipping/crackling, since latency is
>> spiking up under load. That is very disappointing. I also tried
>> performance with two VM cores pinned to 4 host cores - bf3 lagged
>> enough to be unplayable. 3 VM cores pinned to 6 host cores was
>> already playable but sound was still crackling. I noticed little
>> difference between that and 4 VM cores pinned to 8 host cores. It would
>> be nice if the sound could be cleaned up. If anyone has any ideas I'm all
>> ears. The libvirt xml I use now:
>>
>>   <vcpu placement='static'>4</vcpu>
>>   <cputune>
>>     <vcpupin vcpu='0' cpuset='0-1'/>
>>     <vcpupin vcpu='1' cpuset='2-3'/>
>>     <vcpupin vcpu='2' cpuset='4-5'/>
>>     <vcpupin vcpu='3' cpuset='6-7'/>
>>   </cputune>
>>   <features>
>>     <acpi/>
>>     <apic/>
>>     <pae/>
>>     <hap/>
>>     <viridian/>
>>     <hyperv>
>>       <relaxed state='on'/>
>>       <vapic state='on'/>
>>       <spinlocks state='on' retries='8191'/>
>>     </hyperv>
>>     <kvm>
>>       <hidden state='on'/>
>>     </kvm>
>>     <pvspinlock state='on'/>
>>   </features>
>>   <cpu mode='host-passthrough'>
>>     <topology sockets='1' cores='4' threads='1'/>
>>   </cpu>
>>   <clock offset='utc'>
>>     <timer name='rtc' tickpolicy='catchup'/>
>>     <timer name='pit' tickpolicy='delay'/>
>>     <timer name='hpet' present='no'/>
>>     <timer name='hypervclock' present='yes'/>
>>   </clock>
>>
>> Kernel configs:
>>
>> CONFIG_NO_HZ_FULL=y
>> CONFIG_RCU_NOCB_CPU_ALL=y
>> CONFIG_HZ_1000=y
>> CONFIG_HZ=1000
>>
>>
>> I am not convinced a 1000 Hz tick rate is needed. The default one (300)
>> seems to perform equally well, judging by the latency charts.
>> I did not get a chance to test it with bf3 yet, however.
>>
>> On 2016.01.12 11:12, thibaut noah wrote:
>>
>> [cut]