[libvirt] [REPOST] regarding cgroup v2 support in libvirt

Tejun Heo htejun at fb.com
Fri Oct 21 18:24:27 UTC 2016


Hello, Daniel.

On Fri, Oct 21, 2016 at 11:19:02AM +0100, Daniel P. Berrange wrote:
> The big question I have around cgroup v2 is state of support for all
> controllers that libvirt uses (cpu, cpuacct, cpuset, memory, devices,
> freezer, blkio).  IIUC, not all of these have been ported to cgroup
> v2, and the cpu port in particular was rejected by Linux maintainers.
> Libvirt has a general policy that we won't support features that only
> exist in out of tree patches (applies to kernel and any other software
> we build against or use).

I see, and that's understandable.  However, I think supporting resource
control through systemd can be a good way of navigating the situation.
Backward and forward compatibility issues are handled by systemd,
allowing libvirt users to make use of whatever is available on the
system without burdening libvirt with the complications.

> IIRC from earlier discussions, the model for dealing with processes in
> cgroup v2 was quite different. In libvirt we rely on the ability to
> assign different threads within a process to different cgroups, because
> we need to control CPU scheduler parameters on different threads in
> QEMU. E.g. we have vCPU threads, I/O threads and general emulator threads,
> each of which gets different policies.
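For context on the per-thread placement described above: the v1 interface has a `tasks` file that accepts individual thread IDs, whereas the v2 interface discussed in this thread only exposes `cgroup.procs`, and writing an ID there migrates the whole process. A minimal sketch, assuming the caller passes the path of an already-created cgroup directory; the helper name `place_thread` and any directory names are hypothetical.

```python
import os


def place_thread(cgroup_dir: str, tid: int, v2: bool = False) -> str:
    """Move a thread into a cgroup directory and return the file written.

    On cgroup v1, writing a TID to "tasks" moves only that thread.
    On cgroup v2 (as of this discussion), only "cgroup.procs" exists, and
    writing any TID there migrates the entire thread group (process).
    """
    target = os.path.join(cgroup_dir, "cgroup.procs" if v2 else "tasks")
    # One ID per write; the kernel accepts a trailing newline.
    with open(target, "w") as f:
        f.write(f"{tid}\n")
    return target
```

With v1, a vCPU thread could be written into one group's `tasks` file while the emulator threads of the same QEMU process go into another; the v2 branch cannot express that split.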

How thread granularity will be handled in cgroup v2 is still
contentious, but I believe that we'll eventually have something.  I
have always been curious about the QEMU thread control, though.  What
prevents it from using the usual nice level adjustments?  Does it
actually require hierarchical resource distribution?
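To make the nice-level question concrete: on Linux the nice value is a per-thread attribute, so calling `setpriority(PRIO_PROCESS, tid, ...)` with a thread ID adjusts only that thread. A minimal, Linux-only sketch (the helper name `renice_current_thread` is illustrative). Unlike cpu cgroup settings, this adjusts a flat per-thread weight rather than distributing CPU across a hierarchy of groups.

```python
import os
import threading


def renice_current_thread(increment: int) -> int:
    """Change the nice value of the calling thread only; return the new value."""
    tid = threading.get_native_id()  # kernel thread ID (Python 3.8+)
    current = os.getpriority(os.PRIO_PROCESS, tid)
    new = max(-20, min(19, current + increment))
    # Without CAP_SYS_NICE (or a suitable RLIMIT_NICE), callers can only
    # raise the nice value, i.e. lower their own priority.
    os.setpriority(os.PRIO_PROCESS, tid, new)
    return os.getpriority(os.PRIO_PROCESS, tid)
```

Calling this from a worker thread (for example an I/O thread) leaves the nice value of the main thread untouched.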

> When I spoke with Lennart about cgroup v2, way back in Jan, he indicated
> that while systemd can technically work with a system where some
> controllers are mounted as v1 and others are mounted as v2, this
> would not be an officially supported solution. Thus systemd in Fedora
> was not likely to switch to v2 until all required controllers could use
> v2. I'm not sure if this still corresponds to Lennart's current views, so
> CC'ing him to confirm/deny.

The hybrid mode implemented in systemd uses cgroup v2 for process
management (the "name=systemd" hierarchy) but keeps using v1
hierarchies for all resource control.  For "Delegate=" users, I don't
think it'd matter all that much.  Such users see either all the v1
hierarchies for resource controllers, as before, or the v2
hierarchy.
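One rough way to tell the three layouts apart is to look at the cgroup entries in `/proc/self/mounts`: a fully unified host has only `cgroup2` mounts, a legacy host has only `cgroup` (v1) mounts, and a hybrid host has both. A minimal heuristic sketch; the function name is hypothetical, the exact mount point of the hybrid `cgroup2` hierarchy varies between systemd versions (so it is deliberately not checked), and a production check might prefer statfs(2) on `/sys/fs/cgroup`.

```python
def classify_cgroup_layout(mounts_text: str) -> str:
    """Return "unified", "hybrid", "legacy" or "none" from /proc/self/mounts text.

    Each mount line has the form: device mountpoint fstype options dump pass.
    """
    v1_mounts, v2_mounts = [], []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        if fstype == "cgroup2":
            v2_mounts.append(mountpoint)
        elif fstype == "cgroup":
            v1_mounts.append(mountpoint)
    if v2_mounts and v1_mounts:
        return "hybrid"  # e.g. v2 for process tracking, v1 for controllers
    if v2_mounts:
        return "unified"
    if v1_mounts:
        return "legacy"
    return "none"
```

Usage: `with open("/proc/self/mounts") as f: print(classify_cgroup_layout(f.read()))`.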

> I think from Libvirt POV it would greatly simplify life if we could
> likewise restrict ourselves to dealing with hosts which are exclusively
> v1 or exclusively v2, and not a mixture, i.e. we can completely isolate
> our codebases for v1 vs v2 management, making it easier to reason about
> and test their correctness, reducing the QA testing burden.

I think that's going to be the case.  People *may* try to mix v1 and
v2 hierarchies for resource control manually, but supporting such a
mixture in any major software project would add a lot of complexity
that is difficult to justify.
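The "exclusively v1 or exclusively v2" policy can be enforced with one check at startup, so the two code paths never have to cooperate. A minimal sketch under stated assumptions: v1 controllers are read from `/proc/cgroups` (a non-zero hierarchy ID means the controller is bound to a v1 hierarchy), and the caller supplies the controllers listed in the v2 root's `cgroup.controllers`; all names here are hypothetical. Under systemd's hybrid mode this selects v1, since the v2 hierarchy there carries no resource controllers.

```python
class UnsupportedCgroupHostError(RuntimeError):
    """Raised when resource controllers are split across v1 and v2, or absent."""


def v1_controllers_from_proc_cgroups(text: str) -> set[str]:
    """Enabled controllers bound to a v1 hierarchy (hierarchy ID != 0)."""
    found = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the "#subsys_name hierarchy ..." header
        name, hierarchy, _num_cgroups, enabled = line.split()[:4]
        if hierarchy != "0" and enabled == "1":
            found.add(name)
    return found


def select_backend(v1_controllers: set[str], v2_controllers: set[str]) -> str:
    """Pick exactly one backend; refuse hosts that mix the two."""
    if v1_controllers and v2_controllers:
        raise UnsupportedCgroupHostError(
            f"mixed host: v1={sorted(v1_controllers)} v2={sorted(v2_controllers)}")
    if v2_controllers:
        return "v2"
    if v1_controllers:
        return "v1"
    raise UnsupportedCgroupHostError("no resource controllers available")
```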

> I recall that systemd policy for v2 was intended to be that no app
> should write to cgroup sysfs except for systemd, unless there was
> a sub-tree created with Delegate=yes set on the scope. So this clearly
> means when using v2 we'll have to use the systemd DBus APIs for managing
> cgroups v2 on such hosts.
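For experimenting with a delegated subtree on a systemd host, `systemd-run --scope -p Delegate=yes` asks systemd (over D-Bus) to create a transient scope whose cgroup subtree the launched program may then manage itself. A minimal sketch that only builds the command line; the helper name and the unit name `libvirt-demo` are hypothetical, and a daemon like libvirt would more plausibly issue the equivalent D-Bus calls directly.

```python
import shlex
from typing import List, Optional


def delegated_scope_command(unit: str, argv: List[str],
                            slice_name: Optional[str] = None) -> List[str]:
    """Build a systemd-run command that starts argv in a delegated scope."""
    cmd = ["systemd-run", "--scope", f"--unit={unit}", "--property=Delegate=yes"]
    if slice_name:
        cmd.append(f"--slice={slice_name}")
    # "--" ends option parsing so argv is taken as the command to run.
    return cmd + ["--"] + argv


if __name__ == "__main__":
    print(shlex.join(delegated_scope_command("libvirt-demo", ["sleep", "60"])))
```

Actually running the command (e.g. with `subprocess.run(cmd, check=True)`) requires a systemd host and suitable privileges.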

Hmmm... maybe I'm mistaken, but it's kinda broken without "Delegate="
on v1 too, and we've been bitten by that already.  A piece of internal
software assumed that it could branch down from the cgroups that the
target process was in at startup and ended up building
sub-hierarchies at different positions in different hierarchies.
Later, somebody launched a systemd service which requested some
resource accounting, and systemd ended up relocating processes out of
those sub-hierarchies.
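The situation above shows up in `/proc/<pid>/cgroup`, which lists one `hierarchy-ID:controllers:path` line per hierarchy (a v2 hierarchy appears as `0::path`). A small diagnostic sketch, with hypothetical function names, that parses the file and returns the distinct paths, making it easy to see when a process sits at different positions in different hierarchies.

```python
def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each hierarchy's controller field ("" for v2) to the cgroup path."""
    placement = {}
    for line in text.splitlines():
        if not line:
            continue
        # Split on the first two colons only; the path may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        placement[controllers] = path
    return placement


def distinct_paths(text: str) -> set[str]:
    """All distinct cgroup paths the process occupies across hierarchies."""
    return set(parse_proc_cgroup(text).values())
```

Usage: `with open("/proc/self/cgroup") as f: print(distinct_paths(f.read()))`.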

On systemd systems, I don't think it makes sense to try to do
sub-hierarchy management directly without telling systemd about it.

The flip side holds as well: with "Delegate=" set, cgroup v2 doesn't
pose any more problems than v1 does.

> > It is true that, as libvirt can be used without systemd, libvirt will
> > probably want its own direct implementation down the line, but I think
> > there are benefits to going through systemd for resource settings in
> > general given that hierarchy setup is already done through systemd
> > when available.
> 
> While it is certainly nice that the vast majority of OS distros have
> switched over to using systemd for init, there are still enough users
> out there that I think we'll need to continue to have libvirt support
> for using sysfs for v2 on non-systemd hosts.
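On a host without systemd, direct v2 management through cgroupfs comes down to a few file writes: enable controllers for children in the parent's `cgroup.subtree_control`, create a child directory, write limits, and move a process in through `cgroup.procs`. A minimal sketch, assuming a v2 hierarchy at a caller-supplied parent path whose own parent already enables these controllers; it uses only the memory and io controllers, since the cpu controller was not in mainline v2 at the time of this thread. The function name, group name and limit values are hypothetical, and error handling is left out. Note the v2 rule that a non-root cgroup with controllers enabled in `cgroup.subtree_control` cannot itself contain processes.

```python
import os


def create_v2_group(parent: str, name: str, limits: dict[str, str], pid: int,
                    controllers: tuple[str, ...] = ("memory", "io")) -> str:
    """Create parent/name, apply limits, move pid into it; return the path."""
    # Let children of `parent` use the controllers, e.g. "+memory +io".
    with open(os.path.join(parent, "cgroup.subtree_control"), "w") as f:
        f.write(" ".join(f"+{c}" for c in controllers))
    # On cgroupfs, mkdir creates the group and the kernel adds its files.
    child = os.path.join(parent, name)
    os.makedirs(child, exist_ok=True)
    # Interface files such as memory.max take a value (or "max").
    for filename, value in limits.items():
        with open(os.path.join(child, filename), "w") as f:
            f.write(value)
    # Writing a PID to cgroup.procs migrates the whole process.
    with open(os.path.join(child, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return child
```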

Definitely.

> Anyway, in summary, we'd like to see v2 support of course, since that
> is clearly the future. The big question is what we do about the situation
> wrt not all controllers being supported in v2; the lack of complete
> conversion is what has stopped me from doing any work in this area
> up to now.

What I'm suggesting now is, if available, to use systemd to set up
resource control up to the delegation point.  This would also make
control ownership arbitration between systemd and libvirt easier to
solve.  Beyond that point, libvirt can keep doing whatever it has been
doing.  If there are v1 hierarchies, it can keep doing the
sub-hierarchy management.  If it's v2, it can skip that for now until
cgroup v2 and libvirt support for it are ready.
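The division of labour proposed here can be summarised as a small decision function. This is only an illustration of the policy in this mail, not libvirt code: the function name and step strings are hypothetical, and `layout` is assumed to be one of "legacy", "hybrid" or "unified".

```python
def plan_cgroup_setup(have_systemd: bool, layout: str) -> list[str]:
    """Return the steps libvirt would take under the proposed split of duties."""
    steps = []
    if have_systemd:
        # systemd owns everything up to the delegation point.
        steps.append("ask systemd for a unit with Delegate=yes and resource settings")
    else:
        # Without systemd, keep setting up the top-level group directly.
        steps.append("set up the top-level group directly in cgroupfs")
    if layout in ("legacy", "hybrid"):
        # Resource controllers live on v1 hierarchies: keep the existing path.
        steps.append("manage v1 sub-hierarchies below the delegation point")
    elif layout == "unified":
        # Defer v2 sub-hierarchy management until kernel and libvirt are ready.
        steps.append("skip sub-hierarchy management for now")
    else:
        raise ValueError(f"unknown cgroup layout: {layout!r}")
    return steps
```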

IMHO, this would provide a substantial part of the resource containment
that people want without libvirt having to deal with the headaches of
the transitional period.

Thanks.

-- 
tejun
