[libvirt] Suboptimal default cpu Cgroup

Fri Aug 15 14:08:34 UTC 2014

2014-08-15 10:50+0200, Martin Kletzander:
> On Thu, Aug 14, 2014 at 04:25:05PM +0200, Radim Krčmář wrote:
> >Hello,
> >
> >by default, libvirt with KVM creates a Cgroup hierarchy in 'cpu,cpuacct'
> >[1], with 'shares' set to 1024 on every level.  This raises two points:
> >
> >1) Every VM is given an equal amount of CPU time. [2]
> >  ($CG/machine.slice/*/shares = 1024)
> >
> >  Which means that smaller / less loaded guests are given an advantage.
> >
> 
> This is a default that we do not touch unless the user (or mgmt
> app) asks us to.

(I'd argue that the default is to do nothing at all ;)

>                 What you say is true only when there is no spare time
> (the machines need more time than available).  Such overcommit is the
> problem of the user, I'd say.

I don't like that it breaks the assumption that a VCPU behaves like a task.

(Complicated systems are hard to operate without consistency and our
 behavior is really punishing for users that don't read everything.)
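
To put rough numbers on point 1, here is a minimal sketch.  It assumes a
saturated host treated as a single pool of CPU time with every VCPU
runnable, which ignores how CFS spreads group shares across CPUs.  The VM
names and the helper function are made up for illustration.

```python
from fractions import Fraction

VM_SHARES = 1024  # default cpu.shares of each VM cgroup under machine.slice


def per_vcpu_fraction(vcpus_per_vm):
    """Return the fraction of contended CPU time one VCPU of each VM gets.

    Simplified model (an assumption, not the scheduler's exact behavior):
    every VM cgroup has the same shares, so VMs split the time equally,
    and each VM then splits its part equally among its VCPUs.
    """
    total_shares = VM_SHARES * len(vcpus_per_vm)
    return {
        name: Fraction(VM_SHARES, total_shares) / vcpus
        for name, vcpus in vcpus_per_vm.items()
    }


# Hypothetical guests: one with a single VCPU, one with eight.
result = per_vcpu_fraction({"small": 1, "big": 8})
for name, fraction in result.items():
    print(name, fraction)
```

In this model the single VCPU of the small guest is entitled to 1/2 of
the contended time, while each VCPU of the big guest gets 1/16, so VCPUs
are not treated like equal tasks.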

> >2) All VMs combined are given 1024 shares. [3]
> >  ($CG/machine.slice/shares)
> >
> 
> This is a problem even on systems without slices (systemd), because
> there is /machine/cpu.shares == 1024 anyway.

(Thanks, I hadn't noticed this with my professionally deformed choice of
 userspace.)

>                                               Is there a way to
> disable hierarchy in this case (to say cpu.shares=-1 for example)?

Apart from the obvious "don't create what you don't want", probably not:
cpu.shares is clamped to the range [2, 2^18].
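
A minimal sketch of that clamping, to show why no value can act as an
"off" switch.  The constant names mirror the kernel's MIN_SHARES and
MAX_SHARES as far as I recall; the function itself is only an
illustration, not kernel code.

```python
MIN_SHARES = 2        # lower bound, 2^1
MAX_SHARES = 1 << 18  # upper bound, 2^18 = 262144


def clamp_shares(requested):
    """Return the value that would take effect for a requested cpu.shares.

    Illustration only: any accepted value ends up inside
    [MIN_SHARES, MAX_SHARES], so the group always keeps some weight.
    """
    return max(MIN_SHARES, min(requested, MAX_SHARES))


print(clamp_shares(0), clamp_shares(1024), clamp_shares(1 << 20))
```

Even the smallest request still leaves the group with a weight of 2, so
the level of the hierarchy stays in effect.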

> Because if not, then it has only limited use (we cannot prepare the
> hierarchy and just write a number in some file when we want to start
> using it).  That's a pity, but there are probably fewer use cases than
> the hundreds of lines of code that would need to be changed in order
> to support this in the kernel.

The hierarchy also costs performance, so the developers probably never
expected that we'd create useless cgroups.
(The cost should be proportional to the depth of the hierarchy, so having
 {emulator,vcpu*} by default is counterproductive as well.)

Creating the hierarchy on demand is not much harder than writing a
value, especially if we do it through libvirt anyway.

A version of your proposal would extend cgroups with something like
categorization: we could add an "effective control group" variable that
allows scheduler code to start at a point higher in the hierarchy.
Libvirt could continue doing what it does now and performance would
improve without creating too many special cases.
I can already see the flame war on LKML.

> >  This is made even worse on RHEL7, by sched_autogroup_enabled = 0, so
> >  every other process in the system is given the same amount of CPU as
> >  all VMs combined.
> >
> 
> But sched_autogroup_enabled = 1 wouldn't make it much better, since it
> would group the machines together anyway, right?

Yes, it would be just a bit better for VMs, because other processes
would be grouped as well.
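
Rough arithmetic for point 2, as a sketch.  It assumes cgroup v1
semantics, where tasks left in the root group compete directly with child
groups, and that a busy nice-0 task has a weight of 1024.  The function
name and the entity counts are hypothetical.

```python
from fractions import Fraction


def vm_total_fraction(other_entities, slice_shares=1024, entity_weight=1024):
    """Return the fraction of contended CPU time all VMs get together.

    other_entities: number of busy scheduling entities next to the
    machine slice at the top level (plain tasks without autogroup,
    autogroups with it).  Simplified single-level model.
    """
    total = slice_shares + other_entities * entity_weight
    return Fraction(slice_shares, total)


# Nine busy processes outside the VMs, autogroup disabled.
print(vm_total_fraction(9))
# The same nine processes collected into three autogroups.
print(vm_total_fraction(3))
```

In this model the VMs together get 1/10 of the contended time without
autogroup and 1/4 when the nine processes fall into three autogroups,
which is better for VMs but still independent of how many VMs there are.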

> >It does not seem to be possible to tune shares and get a good general
> >behavior, so the best solution I can see is to disable the cpu cgroup
> >and let users do it when needed.  (Keeping all tasks in $CG/tasks.)
> >
> 
> I agree with you that it's not the best default scenario we can do,
> and maybe not using cgroups until needed would bring us a good
> benefit.  That is for cgroups like cpu and blkio only, I think.

I haven't delved into the other cgroups much, but it is a good question
whether we want them :)
  Does $feature do something useful on top of complicating things?