<div dir="ltr"><div><div><div><div>I did the process through scripts just because cset doesn't work on Debian :)<br><br></div>rcu_nocbs should move most of the kernel threads off the selected cpus.<br><br></div>A good way to check that your vcpus are not being used is, on Intel, to use turbostat. It shows the per-thread activity and the average / top frequency.<br></div>Create your shield, then run a heavy program. If your selected cpus show 0% usage / 0 MHz average frequency, I think you are good to go :)<br><br></div>For my part, during a kernel compilation with my shield active, the threads on cores 1,2,3 stay at 0–7 MHz average. I don't know if the peak of 7 MHz is residual or real...<br><div><br><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr">--<div>Deldycke Quentin<br></div><div><div><br></div></div></div></div></div>
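For reference, the nohz_full / rcu_nocbs part is just a boot-parameter change; a minimal sketch, assuming the 1-core-host / 3-core-shield split used in my example (the Debian paths and core numbers are assumptions, adjust to your topology):

```shell
# /etc/default/grub on Debian, then run update-grub and reboot.
# Core 0 stays "housekeeping"; cores 1-3 run tickless with their RCU
# callbacks offloaded, matching the shield described above.
GRUB_CMDLINE_LINUX_DEFAULT="quiet nohz_full=1-3 rcu_nocbs=1-3"
```

To verify, run `turbostat --interval 5` as root while the shield is up and a heavy job is running; the shielded cores should show ~0% busy and a near-0 average frequency.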
<br><div class="gmail_quote">On 29 February 2016 at 11:56, Rokas Kupstys <span dir="ltr"><<a href="mailto:rokups@zoho.com" target="_blank">rokups@zoho.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
I tried nohz_full/rcu_nocbs in the past but it did not show any
visible improvement. I will try again, I guess. I did, however, have a
similar setup to cset, although done rather manually.<br>
<br>
<blockquote type="cite">cset set -c 0-1 system<span class=""><br>
cset proc -m -f root -t system -k<br>
</span></blockquote>
This essentially moves all tasks to cores 0 and 1. Since libvirt
uses CPU pinning, the VM stays on the cores assigned in the XML.
However, it did not show much of an improvement either. One thing
that bothers me is the kernel threads that can't be moved off the
cores I would like to dedicate to the VM. Any idea if there is some
kernel parameter which would prevent kernel thread creation on
certain cores but still allow userspace to run on those same cores?
That would be a perfect substitute for the isolcpus param: we could
clear those cores of any tasks when needed and use them when the VM
is offline. Can't do that with isolcpus.<br>
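For what it's worth, the cset commands above can be reproduced with the cpuset cgroup directly (this is what cset does underneath). A rough sketch, assuming cgroup v1 mounted at /sys/fs/cgroup/cpuset and an example 0-1 host split; as far as I know there is no kernel parameter that prevents per-CPU kernel thread creation while still allowing userspace, so pinned kernel threads will simply refuse to move:

```shell
#!/bin/sh
# Sketch of a cset-style "shield" using the cpuset cgroup (v1).
# The paths and the 0-1 host core split are example assumptions.
CG=/sys/fs/cgroup/cpuset
HOST_CPUS="0-1"          # cores left to the host while the VM runs

# Expand a kernel-style cpu list ("0-1,4") into individual ids.
expand_cpus() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

build_shield() {
    # Confine all movable tasks to the host cores.
    mkdir -p "$CG/system"
    echo "$HOST_CPUS" > "$CG/system/cpuset.cpus"
    echo 0            > "$CG/system/cpuset.mems"
    # Per-CPU kernel threads fail to move with EINVAL; ignore them.
    for t in $(cat "$CG/tasks"); do
        echo "$t" > "$CG/system/tasks" 2>/dev/null || true
    done
}

break_shield() {
    # Give the cores back to the OS when the VM shuts down.
    for t in $(cat "$CG/system/tasks"); do
        echo "$t" > "$CG/tasks" 2>/dev/null || true
    done
    rmdir "$CG/system"
}

case "${1:-}" in
    build) build_shield ;;
    break) break_shield ;;
esac
```

Run as root: `shield.sh build` before starting the VM, `shield.sh break` after it stops, e.g. from qemu hooks.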
<br>
I'll analyze that repo; seems interesting, thanks for the link.<div><div class="h5"><br>
<br>
<div>On 2016.02.29 12:16, Quentin Deldycke
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>Nearly as efficient as isolcpus, but can be
used dynamically, at runtime:<br>
<br>
</div>
Use nohz_full / rcu_nocbs to offload all RCU callbacks
from your VM cores to your OS-only cores<br>
</div>
Use cgroups: when you start the VM, you keep only x
cores for the OS; when you shut it down, let the OS
have all cores.<br>
<br>
</div>
If the VM is started and you need a power boost on
Linux, just use "echo $$ | sudo tee
/cgroups/cgroup.procs", and you will have all cores
for programs run from this shell :)<br>
<br>
</div>
Linux only: all cores (but cores 1,2,3 are in nohz mode,
offloaded by core 0)<br>
</div>
Linux + Windows: 1 core to Linux, 3 cores to Windows<br>
</div>
Need a boost on Linux: run the little command line above in your shell<br>
<br>
<br>
</div>
Example of cgroup usage:<br>
<a href="https://github.com/qdel/scripts/tree/master/vfio/scripts" target="_blank">https://github.com/qdel/scripts/tree/master/vfio/scripts</a>
=> shieldbuild / shieldbreak<br>
<br>
</div>
Which are called through qemu hooks:<br>
<a href="https://github.com/qdel/scripts/tree/master/vfio/hooks" target="_blank">https://github.com/qdel/scripts/tree/master/vfio/hooks</a><br>
<div><br>
</div>
<div>I do not configure my IO; I let qemu manage it.<br>
<br>
<br>
</div>
<div>One fun behavior to note:<br>
</div>
<div>While idle, I sit completely stable at ~1000us,<br>
</div>
<div>If I run a game, it goes down to a completely stable 500us<br>
<br>
</div>
<div>Example: <a href="http://b.qdel.fr/test.png" target="_blank">http://b.qdel.fr/test.png</a><br>
</div>
<div><br>
</div>
<div>Sorry for the quality: VNC to a 4K screen from 1080p, all of this...<br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br clear="all">
<div>
<div>
<div dir="ltr">--
<div>Deldycke Quentin<br>
</div>
<div>
<div><br>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">On 29 February 2016 at 10:55, Rokas
Kupstys <span dir="ltr"><<a href="mailto:rokups@zoho.com" target="_blank">rokups@zoho.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Yes, currently I am
actually booted with the vanilla Arch Linux kernel, no NO_HZ
and other stuff.<span><br>
<blockquote type="cite">Why is 2 cores for the host
unacceptable? Do you plan to run heavy workloads on them
while gaming?</blockquote>
</span> The problem with isolcpus is that it exempts cores
from the Linux CPU scheduler. This means that even if the VM is
offline they will stand idle. While I don't do anything on the host
while gaming, I do plenty when not gaming, and just throwing
away 6 cores of an already disadvantaged AMD CPU is a real
waste.<span><br>
<br>
<blockquote type="cite">This config is not good
actually.</blockquote>
</span> Well... it indeed looks bad on paper; however, it is
the only one that yields bearable DPC latency. I tried
what you mentioned, in various combinations: cores 0,2,4,6
to the VM, 1,3 to the emulator, 5,7 for IO; cores 1,3,5,7
to the VM, 0,2 to the emulator, 4,6 for IO; cores 0,1,2,3 to the VM,
4,5 to the emulator, 6,7 for IO; cores 4,5,6,7 to the VM, 0,1 to the
emulator, 2,3 for IO. All of them yield terrible latency.<br>
<br>
It would be interesting to hear from someone who has an AMD
build, and how (or if) they solved this.
<div>
<div><br>
<br>
<div>On 2016.02.29 11:10, Bronek Kozicki wrote:<br>
</div>
<blockquote type="cite">Two things you can improve,
IMO<br>
<div><br>
* disable NO_HZ<br>
<br>
* use isolcpus to dedicate your pinned CPUs to
guest only - this<br>
will also ensure they are not used for guest IO.<br>
<br>
B.<br>
<br>
On 29/02/2016 08:45, Rokas Kupstys wrote:<br>
<br>
</div>
<br>
<blockquote type="cite"><br>
<br>
Yesterday I figured out my latency problem. All the things
listed everywhere on the internet failed. The last thing I
tried was pinning one vcpu to two physical cores, and it
brought latency down. I have an FX-8350 CPU, which has a
shared FPU for each pair of cores, so maybe that's why.
With just this pinning, latency is now most of the time
just above 1000μs. However, under load latency increases.
I threw out iothreads and emulator pinning and it did not
change things much. Superior latency could be achieved
using isolcpus=2-7; however, leaving just two cores to the
host is unacceptable. With that setting, latency was around
500μs without load. The good part is that Battlefield 3 no
longer lags, although I observed increased texture loading
times compared to bare metal. The not-so-good part is that
there is still minor sound skipping/crackling, since
latency spikes up under load. That is very disappointing.
I also tried performance with two VM cores pinned to 4 host
cores - bf3 lagged enough to be unplayable. 3 VM cores
pinned to 6 host cores was already playable, but sound was
still crackling. I noticed little difference between that
and 4 VM cores pinned to 8 host cores. It would be nice if
the sound could be cleaned up. If anyone has any ideas, I'm
all ears. The libvirt xml I use now:<br>
<br>
<br>
<br>
<blockquote type="cite"><vcpu placement='static'>4</vcpu><br>
<cputune><br>
<vcpupin vcpu='0' cpuset='0-1'/><br>
<vcpupin vcpu='1' cpuset='2-3'/><br>
<vcpupin vcpu='2' cpuset='4-5'/><br>
<vcpupin vcpu='3' cpuset='6-7'/><br>
</cputune><br>
<features><br>
<acpi/><br>
<apic/><br>
<pae/><br>
<hap/><br>
<viridian/><br>
<hyperv><br>
<relaxed state='on'/><br>
<vapic state='on'/><br>
<spinlocks state='on' retries='8191'/><br>
</hyperv><br>
<kvm><br>
<hidden state='on'/><br>
</kvm><br>
<pvspinlock state='on'/><br>
</features><br>
<cpu mode='host-passthrough'><br>
<topology sockets='1' cores='4' threads='1'/><br>
</cpu><br>
<clock offset='utc'><br>
<timer name='rtc' tickpolicy='catchup'/><br>
<timer name='pit' tickpolicy='delay'/><br>
<timer name='hpet' present='no'/><br>
<timer name='hypervclock' present='yes'/><br>
</clock><br>
</blockquote>
<br>
Kernel configs:<br>
<br>
<blockquote type="cite">CONFIG_NO_HZ_FULL=y<br>
CONFIG_RCU_NOCB_CPU_ALL=y<br>
CONFIG_HZ_1000=y<br>
CONFIG_HZ=1000</blockquote>
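A quick sanity check for these options on a running kernel (this assumes the kernel was also built with CONFIG_IKCONFIG_PROC, so that /proc/config.gz exists):

```shell
# Filter the tick/RCU-related options out of a kernel config stream.
filter_tick_opts() {
    grep -E '^CONFIG_(NO_HZ_FULL|RCU_NOCB|HZ_1000|HZ)='
}

# On a kernel built with CONFIG_IKCONFIG_PROC:
#   zcat /proc/config.gz | filter_tick_opts
```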
<br>
I am not convinced the 1000 Hz tick rate is needed. The
default (300) seems to perform equally well, judging from
the latency charts. I did not get a chance to test it with
bf3 yet, however.<br>
<br>
<div>On 2016.01.12 11:12, thibaut noah<br>
wrote:<br>
<br>
</div>
<br>
</blockquote>
<br>
[cut]<br>
</blockquote>
<br>
<br>
</div>
</div>
</div>
<br>
_______________________________________________<br>
vfio-users mailing list<br>
<a href="mailto:vfio-users@redhat.com" target="_blank">vfio-users@redhat.com</a><br>
<a href="https://www.redhat.com/mailman/listinfo/vfio-users" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/vfio-users</a><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br></div>