vm vcpu uses isolated cores

Daniel P. Berrangé berrange at redhat.com
Thu Sep 2 09:00:31 UTC 2021


On Thu, Sep 02, 2021 at 10:24:08AM +0200, Martin Kletzander wrote:
> On Thu, Sep 02, 2021 at 10:44:06AM +0800, Jiatong Shen wrote:
> > Hello,
> > 
> >   I am trying to understand why qemu vm CPU threads use isolated cpus.
> > 
> >  I have a host which isolates some cpus using isolcpus,
> > like isolcpus=1,2,3,4,5,7,8,9,10,11. Unfortunately, vcpupin does not mask
> > out these cpus (vcpupin is still something like ffffffff).
> 
> That is because there are use cases which need this.  They isolate cpus
> to be used by VMs only (sometimes even moving kernel workloads off these
> cpus), and automatically removing isolcpus from the set would break this
> valid behaviour.
> 
> > When I log in to the system, it seems the qemu cpu threads only run on
> > these isolcpus. I do not quite understand this behavior, because I think
> > by using isolcpus, the kernel scheduler will exclude these cpus and thus
> > the vcpu threads shouldn't use these cores unless taskset is explicitly
> > called. So my question is: how do the cpu threads get scheduled on
> > isolated cpus?
> > 
> 
> libvirt sets the affinity for VMs because libvirt itself might not be
> running on all cores, and qemu, being a child process, would otherwise
> inherit that affinity.  We even have this in the documentation: if you
> want to limit the cpus, it needs to be defined in the XML.
> 
> This raises the question of whether we should somehow coordinate with
> the kernel based on isolcpus, but I am not sure under what circumstances
> we should do that, or what the proper way to do it would be.  I would
> suggest you file an issue to discuss this further unless someone comes
> up with a clear decision here.
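
For anyone wanting to verify this on their own host, the effective
placement can be inspected with standard tools; a quick sketch, using
a hypothetical domain name and a placeholder PID:

    # list the affinity libvirt has applied for each vCPU of the guest
    virsh vcpupin demo

    # inspect the affinity of the QEMU process directly
    # ("12345" is a placeholder PID)
    taskset -cp 12345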

Well if someone is using isolcpus, it is because they want to have some
pretty strict control over what runs on what CPUs. The notion of what
a good default placement would be is then quite ill-defined. Do you
want to avoid the isolcpus CPU mask, because it is being used for non-VM
tasks, or do you want to use the isolcpus CPUs, because they are intended
for VM tasks?  Furthermore, if we paid any attention to the isolcpus mask,
then the VM XML configuration would no longer result in a reproducible
deployment - semantics would vary based on the invisible isolcpus
setting.

Given all that, if someone is using isolcpus, then I'd really expect
that they set explicit affinity for the VMs too. Note that this does
not have to be done at the libvirt level at all. On systemd hosts, all
VMs will be placed in /machine.slice, so even if the QEMU processes
have an all-1s affinity mask, the CPU affinity on /machine.slice
will take priority.
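
For the libvirt-level route, explicit placement goes in the domain XML;
a minimal sketch, borrowing the isolated CPU list from the original
report (the cpuset values and vCPU count are illustrative):

    <!-- pin all vCPUs to the isolated CPUs by default -->
    <vcpu placement='static' cpuset='1-5,7-11'>4</vcpu>
    <cputune>
      <!-- optionally pin individual vCPUs 1:1 -->
      <vcpupin vcpu='0' cpuset='1'/>
      <vcpupin vcpu='1' cpuset='2'/>
    </cputune>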

IOW, if setting isolcpus, you should also always set /machine.slice
CPU affinity.
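
On a cgroup v2 host with a reasonably recent systemd, that can be a
small drop-in; a sketch, reusing the CPU ranges from the example above:

    # /etc/systemd/system/machine.slice.d/cpus.conf
    [Slice]
    # restrict everything under machine.slice (i.e. all VMs) to these CPUs
    AllowedCPUs=1-5,7-11

followed by 'systemctl daemon-reload'. The same can be applied at
runtime with 'systemctl set-property machine.slice AllowedCPUs=1-5,7-11'.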

Which leads into the final point - the need for isolcpus is widely
misunderstood. The only scenario in which isolcpus is generally required
is hard real-time workloads, where you absolutely must stop
all kernel threads running on those CPUs. In any non-real-time
scenario, it is sufficient to "isolate" / "reserve" CPUs using
CPU affinity in cgroups alone. With systemd this can be done
globally using CPUAffinity in /etc/systemd/system.conf, to
restrict most OS services to some housekeeping CPUs, and then
using /machine.slice to grant access to the other CPUs for VMs.
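
Put together, a sketch of that cgroups-only layout on a hypothetical
12-CPU host (the CPU numbers are illustrative, and no isolcpus= is
needed on the kernel command line):

    # /etc/systemd/system.conf - keep most OS services on housekeeping CPUs
    [Manager]
    CPUAffinity=0 6

    # /etc/systemd/system/machine.slice.d/cpus.conf - give the rest to VMs
    [Slice]
    AllowedCPUs=1-5,7-11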

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



