<div dir="ltr"><div><div><div><div>I made the process threw scripts just because cset doesn't work on debian :)<br><br></div>rcu_nocbs should moving much of kernel threads out of the selected cpus.<br><br></div>A good example to know that your vcpus are not used is, for intel, to use turbostat. It show the activity of threads, and the average / top frequency.<br></div>Create your shield, run a hard program. If your selected cpus have 0% usage / 0Mhz average frequency, i think you are good to go :)<br><br></div>For my part, during a compilation of kernel with my shield, threads of core 1,2,3 are at 0 ~ 7 Mhz average.I don't know if the pic of 7Mhz is residual or real...<br><div><br><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr">--<div>Deldycke Quentin<br></div><div><div><br></div></div></div></div></div>
<br><div class="gmail_quote">On 29 February 2016 at 11:56, Rokas Kupstys <span dir="ltr"><<a href="mailto:rokups@zoho.com" target="_blank">rokups@zoho.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
> I tried nohz_full/rcu_nocbs in the past but it did not show any visible
> improvement. I will try again, I guess. I did however have a similar
> setup with cset, although done rather manually.
>
>> cset set -c 0-1 system
>> cset proc -m -f root -t system -k
>
> This essentially moves all tasks to cores 0 and 1. Since libvirt uses
> CPU pinning, the VM stays on the cores assigned in the XML. However it
> did not show much of an improvement either. One thing that bothers me
> is the kernel threads that can't be moved off the cores I would like to
> dedicate to the VM. Any idea if there is a kernel parameter which would
> prevent kernel thread creation on certain cores but still allow
> userspace to run on those same cores? That would be a perfect
> substitute for the isolcpus parameter: we could clear the cores of any
> tasks when needed and still use them when the VM is offline. You can't
> do that with isolcpus.
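>
> For what it's worth, a quick way to see what is still left on those
> cores (the core range 2-7 is just an example) is something like:
>
>   ps -eLo pid,psr,comm | awk '$2 >= 2 && $2 <= 7'
>
> The per-CPU kernel threads such as migration/N, ksoftirqd/N and the
> bound kworkers are the ones that cannot be moved off their core.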
>
> I'll analyze that repo, it seems interesting, thanks for the link.
>
> On 2016.02.29 12:16, Quentin Deldycke wrote:
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>Near as efficient as isolcpus, but can be
                        used dynamically, during run:<br>
                        <br>
                      </div>
                      Use nohz_full / rcu_nocbs, to offload all rcu of
                      your vm core to your OS-only cores<br>
                    </div>
                    Use cgroups, when you start vm, you keep only x core
                    to the OS, when you shut it down, let the OS have
                    all cores.<br>
                    <br>
                  </div>
                  If vm is started and you need to have a power boost on
                  linux, just use "echo $$ | sudo tee
                  /cgroups/cgroup.procs", and you will have all cores
                  for program run from this shell :)<br>
                  <br>
                </div>
>> Linux only: all cores (but cores 1,2,3 are in nohz mode, offloaded to
>> core 0).
>> Linux + Windows: 1 core for Linux, 3 cores for Windows.
>> Need a boost on Linux: the little command line above for this shell.
>>
>> Example of cgroup usage:
>> https://github.com/qdel/scripts/tree/master/vfio/scripts
>> => shieldbuild / shieldbreak
>>
>> These are called through qemu hooks:
>> https://github.com/qdel/scripts/tree/master/vfio/hooks
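>>
>> Roughly, shieldbuild boils down to something like this (a minimal
>> sketch, assuming the cgroup v1 cpuset controller is mounted at
>> /sys/fs/cgroup/cpuset; the cgroup name and core split are
>> illustrative, the real scripts are in the repo above):
>>
>>   # keep the host on core 0, reserve cores 1-3 for the VM
>>   cd /sys/fs/cgroup/cpuset
>>   mkdir -p host
>>   echo 0 > host/cpuset.cpus
>>   echo 0 > host/cpuset.mems
>>   # move every task into the host set; kernel threads that are
>>   # pinned to their CPU will refuse the move, which is expected
>>   for t in $(cat tasks); do echo "$t" > host/tasks; done 2>/dev/null
>>
>> shieldbreak would then do the reverse: give the host cpuset all cores
>> again (or move the tasks back to the root cpuset) and clean up.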
>>
>> I do not configure my IO, I let qemu manage it.
>>
>> One fun behaviour: while idle I sit completely stable at ~1000us, and
>> if I run a game it drops to a completely stable 500us.
>>
>> Example: http://b.qdel.fr/test.png
>>
>> Sorry for the quality: VNC to a 4k screen from 1080p and all that...
      <div class="gmail_extra"><br clear="all">
        <div>
          <div>
            <div dir="ltr">--
              <div>Deldycke Quentin<br>
              </div>
              <div>
                <div><br>
                </div>
              </div>
            </div>
          </div>
        </div>
        <br>
        <div class="gmail_quote">On 29 February 2016 at 10:55, Rokas
          Kupstys <span dir="ltr"><<a href="mailto:rokups@zoho.com" target="_blank">rokups@zoho.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Yes currently i am
              actually booted with vanilla archlinux kernel, no NO_HZ
              and other stuff.<span><br>
                <blockquote type="cite">Why does 2 core for the host is
                  unacceptable? You plan to use it making hard workloads
                  while gaming?</blockquote>
              </span> Problem with isolcpus is that it exempts cores
              from linux cpu scheduler. This means even if VM is offline
              they will stand idle. While i dont do anything on host
              while gaming i do plenty when not gaming and just throwing
              away 6 cores of already disadvantaged AMD cpu is a real
              waste.<span><br>
                <br>
                <blockquote type="cite">This config is not good
                  actually.</blockquote>
              </span> Well.. It indeed looks bad on paper, however it is
              the only one that yields bearable DPC latency. I tried
              what you mentioned, various combinations. Pinning 0,2,4,6
              cores to vm, 1,3 to emulator, 5,7 for io / 1,3,5,7 cores
              to vm, 0,2 to emulator, 4,6 for io / 0,1,2,3 cores to vm,
              4,5 to emulator, 6,7 for io / 4,5,6,7 cores to vm, 0,1 to
              emulator, 2,3 for io. All of them yield terrible latency.<br>
              <br>
              Would be interesting to hear someone who has AMD build,
              how (if) he solved this.
>>>
>>> On 2016.02.29 11:10, Bronek Kozicki wrote:
                  <blockquote type="cite">Two things you can improve,
                    IMO<br>
                    <div><br>
                      * disable NO_HZ<br>
                      <br>
                      * use isolcpus to dedicate your pinned CPUs to
                      guest only - this<br>
                      will also ensure they are not used for guest  IO.<br>
                      <br>
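>>>>
>>>> For example, on the kernel command line (the core list being
>>>> whatever you pin to the guest), something like:
>>>>
>>>>   isolcpus=2-7
>>>>
>>>> e.g. appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub,
>>>> followed by regenerating the grub config.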
>>>>
>>>> B.
>>>>
>>>> On 29/02/2016 08:45, Rokas Kupstys wrote:
>>>>> Yesterday I figured out my latency problem. Everything listed all
>>>>> over the internet failed. The last thing I tried was pinning one
>>>>> vCPU to two physical cores, and that brought latency down. I have
>>>>> an FX-8350 CPU, which shares one FPU between each pair of cores, so
>>>>> maybe that's why. With just this pinning, latency is now most of
>>>>> the time just above 1000μs, although it increases under load. I
>>>>> threw out iothreads and emulator pinning and it did not change
>>>>> much. Superior latency could be achieved with isolcpus=2-7, but
>>>>> leaving just two cores to the host is unacceptable. With that
>>>>> setting latency was around 500μs without load. The good part is
>>>>> that Battlefield 3 no longer lags, although I observed increased
>>>>> texture loading times compared to bare metal. The not-so-good part
>>>>> is that there is still minor sound skipping/cracking, since latency
>>>>> spikes under load. That is very disappointing. I also tried two VM
>>>>> cores pinned to 4 host cores - bf3 lagged enough to be unplayable.
>>>>> 3 VM cores pinned to 6 host cores was already playable but the
>>>>> sound was still cracking, and I noticed little difference between
>>>>> that and 4 VM cores pinned to 8 host cores. It would be nice if the
>>>>> sound could be cleaned up; if anyone has any ideas I'm all ears.
>>>>> The libvirt XML I use now:
                      <blockquote type="cite">  <vcpu<br>
                        placement='static'>4</vcpu><br>
                        <br>
                          <cputune><br>
                        <br>
                            <vcpupin vcpu='0' cpuset='0-1'/><br>
                        <br>
                            <vcpupin vcpu='1' cpuset='2-3'/><br>
                        <br>
                            <vcpupin vcpu='2' cpuset='4-5'/><br>
                        <br>
                            <vcpupin vcpu='3' cpuset='6-7'/><br>
                        <br>
                          </cputune><br>
                        <br>
                          <features><br>
                        <br>
                            <acpi/><br>
                        <br>
                            <apic/><br>
                        <br>
                            <pae/><br>
                        <br>
                            <hap/><br>
                        <br>
                            <viridian/><br>
                        <br>
                            <hyperv><br>
                        <br>
                              <relaxed state='on'/><br>
                        <br>
                              <vapic state='on'/><br>
                        <br>
                              <spinlocks state='on'
                        retries='8191'/><br>
                        <br>
                            </hyperv><br>
                        <br>
                            <kvm><br>
                        <br>
                              <hidden state='on'/><br>
                        <br>
                            </kvm><br>
                        <br>
                            <pvspinlock state='on'/><br>
                        <br>
                          </features><br>
                        <br>
                          <cpu mode='host-passthrough'><br>
                        <br>
                            <topology sockets='1' cores='4'
                        threads='1'/><br>
                        <br>
                          </cpu><br>
                        <br>
                          <clock offset='utc'><br>
                        <br>
                            <timer name='rtc'
                        tickpolicy='catchup'/><br>
                        <br>
                            <timer name='pit' tickpolicy='delay'/><br>
                        <br>
                            <timer name='hpet' present='no'/><br>
                        <br>
                            <timer name='hypervclock'
                        present='yes'/><br>
                        <br>
                          </clock><br>
                        <br>
                        <br>
                        <br>
                      </blockquote>
>>>>>
>>>>> Kernel configs:
>>>>>
>>>>>   CONFIG_NO_HZ_FULL=y
>>>>>   CONFIG_RCU_NOCB_CPU_ALL=y
>>>>>   CONFIG_HZ_1000=y
>>>>>   CONFIG_HZ=1000
>>>>>
>>>>> I am not convinced the 1000 Hz tick rate is needed. The default one
>>>>> (300) seems to perform equally well judging by the latency charts.
>>>>> I did not get a chance to test it with bf3 yet, however.
>>>>>
>>>>> On 2016.01.12 11:12, thibaut noah wrote:
>>>>
>>>> [cut]

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users