[libvirt] Suboptimal default cpu Cgroup

Daniel P. Berrange berrange at redhat.com
Fri Aug 15 13:44:50 UTC 2014


On Fri, Aug 15, 2014 at 09:23:35AM -0400, Andrew Theurer wrote:
> 
> > On Thu, Aug 14, 2014 at 01:55:11PM -0400, Andrew Theurer wrote:
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Radim Krčmář" <rkrcmar at redhat.com>
> > > > To: libvir-list at redhat.com
> > > > Cc: "Daniel P. Berrange" <berrange at redhat.com>, "Andrew Theurer"
> > > > <atheurer at redhat.com>
> > > > Sent: Thursday, August 14, 2014 9:25:05 AM
> > > > Subject: Suboptimal default cpu Cgroup
> > > > 
> > > > Hello,
> > > > 
> > > > by default, libvirt with KVM creates a Cgroup hierarchy in 'cpu,cpuacct'
> > > > [1], with 'shares' set to 1024 on every level.  This raises two points:
> > > > 
> > > > 1) Every VM is given an equal amount of CPU time. [2]
> > > >    ($CG/machine.slice/*/shares = 1024)
> > > > 
> > > >    Which means that smaller / less loaded guests are given an advantage.
> > > > 
> > > > 2) All VMs combined are given 1024 shares. [3]
> > > >    ($CG/machine.slice/shares)
> > > > 
> > > >    This is made even worse on RHEL7 by sched_autogroup_enabled = 0, so
> > > >    every other process in the system is given the same amount of CPU as
> > > >    all VMs combined.
> > > > 
> > > > It does not seem to be possible to tune shares and get a good general
> > > > behavior, so the best solution I can see is to disable the cpu cgroup
> > > > and let users do it when needed.  (Keeping all tasks in $CG/tasks.)
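For concreteness, the defaults described above can be inspected directly in
the cgroupfs.  A minimal sketch, assuming the cgroup v1 'cpu,cpuacct'
controller is mounted at /sys/fs/cgroup/cpu,cpuacct and that per-VM groups
sit under machine.slice with a machine-qemu* naming scheme (the exact layout
varies between libvirt/systemd versions):

    CG=/sys/fs/cgroup/cpu,cpuacct
    # weight of all VMs combined, relative to other top-level groups
    cat $CG/machine.slice/cpu.shares
    # per-VM weights: 1024 each by default, regardless of vCPU count or load
    for d in $CG/machine.slice/machine-qemu*; do
        echo "$d: $(cat $d/cpu.shares)"
    done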
> > > 
> > > Could we have each VM's shares be nr_vcpu * 1024, and the shares for
> > > $CG/machine.slice be the sum of all VMs' shares?
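A rough sketch of what such a weighting could look like if applied from the
outside with existing tooling; the per-VM part uses virsh schedinfo, while
adjusting machine.slice itself via a systemd property is an assumption, and
none of this is current libvirt behaviour:

    total=0
    for dom in $(virsh list --name); do
        vcpus=$(virsh vcpucount "$dom" --active --live)
        # weight each running VM by its vCPU count
        virsh schedinfo "$dom" --live --set cpu_shares=$((vcpus * 1024))
        total=$((total + vcpus * 1024))
    done
    # weight the whole slice by the sum of its VMs' shares
    systemctl set-property --runtime machine.slice CPUShares=$total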
> > 
> > Realistically libvirt can't change what it does by default for VMs with
> > regard to this cgroups setting, because it would cause an immediate
> > functional change for anyone who has deployed current libvirt versions
> > and then upgrades.
> 
> Is this another way of saying, "we have already set a bad precedent,
> so we need to keep it"?  I am concerned that anyone experiencing this
> problem may be unsure of what is causing it and unaware of how to fix it.
>  
> > Management apps like oVirt or OpenStack should explicitly set the policy
> > they desire in this respect.
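For example, a management app or admin can already pin an explicit weight
per VM through the scheduler parameters, which libvirt persists as
<cputune><shares> in the domain XML; the guest name and value below are
purely illustrative:

    # persist 4096 shares for a hypothetical 4-vCPU guest named "demo-guest"
    virsh schedinfo demo-guest --config --set cpu_shares=4096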
> 
> Shouldn't a user or upper-level mgmt app have some expectation of sane
> defaults?  A user or mgmt app has already specified a preference in the
> number of vcpus - shouldn't that be enough?  Why does this fix need to be
> pushed to multiple upper layers when it can be remedied in just one
> (libvirt)?  Honestly, I don't understand how this even got out the way it
> did.

If we hadn't already had this behaviour in libvirt for 3+ years, then sure,
it would be desirable to change it. At this point, though, applications have
been exposed to the current semantics for a long time and may have set up
usage policies which rely on them. If we change the defaults, we run a
non-negligible risk of causing behavioural regressions for our existing
userbase.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



