<div dir="ltr"><div><div><div><div>I made the process threw scripts just because cset doesn't work on debian :)<br><br></div>rcu_nocbs should moving much of kernel threads out of the selected cpus.<br><br></div>A good example to know that your vcpus are not used is, for intel, to use turbostat. It show the activity of threads, and the average / top frequency.<br></div>Create your shield, run a hard program. If your selected cpus have 0% usage / 0Mhz average frequency, i think you are good to go :)<br><br></div>For my part, during a compilation of kernel with my shield, threads of core 1,2,3 are at 0 ~ 7 Mhz average.I don't know if the pic of 7Mhz is residual or real...<br><div><br><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><div dir="ltr">--<div>Deldycke Quentin<br></div><div><div><br></div></div></div></div></div>
<br><div class="gmail_quote">On 29 February 2016 at 11:56, Rokas Kupstys <span dir="ltr"><<a href="mailto:rokups@zoho.com" target="_blank">rokups@zoho.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
> I tried nohz_full/rcu_nocbs in the past but it did not show any visible
> improvement. I will try again, I guess. I did however have a similar
> setup with cset, although done rather manually.
>
>> cset set -c 0-1 system
>> cset proc -m -f root -t system -k
>
> This essentially moves all tasks to cores 0 and 1. Since libvirt uses
> CPU pinning, the VM stays on the cores assigned in the XML. However it
> did not show much of an improvement either. One thing that bothers me
> is the kernel threads that can't be moved off the cores I would like to
> dedicate to the VM. Any idea if there is a kernel parameter which would
> prevent kernel thread creation on certain cores but still allow
> userspace to run on those same cores? That would be a perfect
> substitute for the isolcpus parameter: we could clear the cores of any
> tasks when needed and still use them when the VM is offline. You can't
> do that with isolcpus.
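>
> For what it's worth, a quick way to see what is still left on those
> cores (the core range 2-7 is just an example) is something like:
>
>   ps -eLo pid,psr,comm | awk '$2 >= 2 && $2 <= 7'
>
> The per-CPU kernel threads such as migration/N, ksoftirqd/N and the
> bound kworkers are the ones that cannot be moved off their core.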
>
> I'll analyze that repo, it seems interesting, thanks for the link.
>
> On 2016.02.29 12:16, Quentin Deldycke wrote:
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>Near as efficient as isolcpus, but can be
                        used dynamically, during run:<br>
                        <br>
                      </div>
                      Use nohz_full / rcu_nocbs, to offload all rcu of
                      your vm core to your OS-only cores<br>
                    </div>
                    Use cgroups, when you start vm, you keep only x core
                    to the OS, when you shut it down, let the OS have
                    all cores.<br>
                    <br>
                  </div>
                  If vm is started and you need to have a power boost on
                  linux, just use "echo $$ | sudo tee
                  /cgroups/cgroup.procs", and you will have all cores
                  for program run from this shell :)<br>
                  <br>
                </div>
>> Linux only: all cores (but cores 1,2,3 are in nohz mode, offloaded to
>> core 0).
>> Linux + Windows: 1 core for Linux, 3 cores for Windows.
>> Need a boost on Linux: the little command line above for this shell.
>>
>> Example of cgroup usage:
>> https://github.com/qdel/scripts/tree/master/vfio/scripts
>> => shieldbuild / shieldbreak
>>
>> These are called through qemu hooks:
>> https://github.com/qdel/scripts/tree/master/vfio/hooks
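>>
>> Roughly, shieldbuild boils down to something like this (a minimal
>> sketch, assuming the cgroup v1 cpuset controller is mounted at
>> /sys/fs/cgroup/cpuset; the cgroup name and core split are
>> illustrative, the real scripts are in the repo above):
>>
>>   # keep the host on core 0, reserve cores 1-3 for the VM
>>   cd /sys/fs/cgroup/cpuset
>>   mkdir -p host
>>   echo 0 > host/cpuset.cpus
>>   echo 0 > host/cpuset.mems
>>   # move every task into the host set; kernel threads that are
>>   # pinned to their CPU will refuse the move, which is expected
>>   for t in $(cat tasks); do echo "$t" > host/tasks; done 2>/dev/null
>>
>> shieldbreak would then do the reverse: give the host cpuset all cores
>> again (or move the tasks back to the root cpuset) and clean up.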
>>
>> I do not configure my IO, I let qemu manage it.
>>
>> One fun behaviour: while idle I sit completely stable at ~1000us, and
>> if I run a game it drops to a completely stable 500us.
>>
>> Example: http://b.qdel.fr/test.png
>>
>> Sorry for the quality: VNC to a 4k screen from 1080p and all that...
      <div class="gmail_extra"><br clear="all">
        <div>
          <div>
            <div dir="ltr">--
              <div>Deldycke Quentin<br>
              </div>
              <div>
                <div><br>
                </div>
              </div>
            </div>
          </div>
        </div>
        <br>
        <div class="gmail_quote">On 29 February 2016 at 10:55, Rokas
          Kupstys <span dir="ltr"><<a href="mailto:rokups@zoho.com" target="_blank">rokups@zoho.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Yes currently i am
              actually booted with vanilla archlinux kernel, no NO_HZ
              and other stuff.<span><br>
                <blockquote type="cite">Why does 2 core for the host is
                  unacceptable? You plan to use it making hard workloads
                  while gaming?</blockquote>
              </span> Problem with isolcpus is that it exempts cores
              from linux cpu scheduler. This means even if VM is offline
              they will stand idle. While i dont do anything on host
              while gaming i do plenty when not gaming and just throwing
              away 6 cores of already disadvantaged AMD cpu is a real
              waste.<span><br>
                <br>
                <blockquote type="cite">This config is not good
                  actually.</blockquote>
              </span> Well.. It indeed looks bad on paper, however it is
              the only one that yields bearable DPC latency. I tried
              what you mentioned, various combinations. Pinning 0,2,4,6
              cores to vm, 1,3 to emulator, 5,7 for io / 1,3,5,7 cores
              to vm, 0,2 to emulator, 4,6 for io / 0,1,2,3 cores to vm,
              4,5 to emulator, 6,7 for io / 4,5,6,7 cores to vm, 0,1 to
              emulator, 2,3 for io. All of them yield terrible latency.<br>
              <br>
              Would be interesting to hear someone who has AMD build,
              how (if) he solved this.
>>>
>>> On 2016.02.29 11:10, Bronek Kozicki wrote:
                  <blockquote type="cite">Two things you can improve,
                    IMO<br>
                    <div><br>
                      * disable NO_HZ<br>
                      <br>
                      * use isolcpus to dedicate your pinned CPUs to
                      guest only - this<br>
                      will also ensure they are not used for guest  IO.<br>
                      <br>
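>>>>
>>>> For example, on the kernel command line (the core list being
>>>> whatever you pin to the guest), something like:
>>>>
>>>>   isolcpus=2-7
>>>>
>>>> e.g. appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub,
>>>> followed by regenerating the grub config.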
>>>>
>>>> B.
>>>>
>>>> On 29/02/2016 08:45, Rokas Kupstys wrote:
>>>>> Yesterday I figured out my latency problem. Everything listed all
>>>>> over the internet failed. The last thing I tried was pinning one
>>>>> vCPU to two physical cores, and that brought latency down. I have
>>>>> an FX-8350 CPU, which shares one FPU between each pair of cores, so
>>>>> maybe that's why. With just this pinning, latency is now most of
>>>>> the time just above 1000μs, although it increases under load. I
>>>>> threw out iothreads and emulator pinning and it did not change
>>>>> much. Superior latency could be achieved with isolcpus=2-7, but
>>>>> leaving just two cores to the host is unacceptable. With that
>>>>> setting latency was around 500μs without load. The good part is
>>>>> that Battlefield 3 no longer lags, although I observed increased
>>>>> texture loading times compared to bare metal. The not-so-good part
>>>>> is that there is still minor sound skipping/cracking, since latency
>>>>> spikes under load. That is very disappointing. I also tried two VM
>>>>> cores pinned to 4 host cores - bf3 lagged enough to be unplayable.
>>>>> 3 VM cores pinned to 6 host cores was already playable but the
>>>>> sound was still cracking, and I noticed little difference between
>>>>> that and 4 VM cores pinned to 8 host cores. It would be nice if the
>>>>> sound could be cleaned up; if anyone has any ideas I'm all ears.
>>>>> The libvirt XML I use now:
                      <blockquote type="cite">  <vcpu<br>
                        placement='static'>4</vcpu><br>
                        <br>
                          <cputune><br>
                        <br>
                            <vcpupin vcpu='0' cpuset='0-1'/><br>
                        <br>
                            <vcpupin vcpu='1' cpuset='2-3'/><br>
                        <br>
                            <vcpupin vcpu='2' cpuset='4-5'/><br>
                        <br>
                            <vcpupin vcpu='3' cpuset='6-7'/><br>
                        <br>
                          </cputune><br>
                        <br>
                          <features><br>
                        <br>
                            <acpi/><br>
                        <br>
                            <apic/><br>
                        <br>
                            <pae/><br>
                        <br>
                            <hap/><br>
                        <br>
                            <viridian/><br>
                        <br>
                            <hyperv><br>
                        <br>
                              <relaxed state='on'/><br>
                        <br>
                              <vapic state='on'/><br>
                        <br>
                              <spinlocks state='on'
                        retries='8191'/><br>
                        <br>
                            </hyperv><br>
                        <br>
                            <kvm><br>
                        <br>
                              <hidden state='on'/><br>
                        <br>
                            </kvm><br>
                        <br>
                            <pvspinlock state='on'/><br>
                        <br>
                          </features><br>
                        <br>
                          <cpu mode='host-passthrough'><br>
                        <br>
                            <topology sockets='1' cores='4'
                        threads='1'/><br>
                        <br>
                          </cpu><br>
                        <br>
                          <clock offset='utc'><br>
                        <br>
                            <timer name='rtc'
                        tickpolicy='catchup'/><br>
                        <br>
                            <timer name='pit' tickpolicy='delay'/><br>
                        <br>
                            <timer name='hpet' present='no'/><br>
                        <br>
                            <timer name='hypervclock'
                        present='yes'/><br>
                        <br>
                          </clock><br>
                        <br>
                        <br>
                        <br>
                      </blockquote>
>>>>>
>>>>> Kernel configs:
>>>>>
>>>>>   CONFIG_NO_HZ_FULL=y
>>>>>   CONFIG_RCU_NOCB_CPU_ALL=y
>>>>>   CONFIG_HZ_1000=y
>>>>>   CONFIG_HZ=1000
>>>>>
>>>>> I am not convinced the 1000 Hz tick rate is needed. The default one
>>>>> (300) seems to perform equally well judging by the latency charts.
>>>>> I did not get a chance to test it with bf3 yet, however.
>>>>>
>>>>> On 2016.01.12 11:12, thibaut noah wrote:
>>>>
>>>> [cut]

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users