[libvirt] [REPOST] regarding cgroup v2 support in libvirt

Daniel P. Berrange berrange at redhat.com
Fri Oct 21 10:19:02 UTC 2016


On Thu, Oct 20, 2016 at 02:59:45PM -0400, Tejun Heo wrote:
> (reposting w/ libvir-list cc'd, sorry about the delay in reposting,
>  was traveling and then on vacation)
> 
> Hello, Daniel.  How have you been?
> 
> We (facebook) are deploying cgroup v2 and internally use libvirt to
> manage virtual machines, so I'm trying to add cgroup v2 support to
> libvirt.
> 
> Because cgroup v2's resource configurations differ from v1 in varying
> degrees depending on the specific resource type, it unfortunately
> introduces new configurations (some completely new configs, others
> just a different range / format).  This means that adding cgroup v2
> support to libvirt requires adding new config options to it and maybe
> implementing some form of translation mechanism between overlapping
> configs.
> 
> The upcoming systemd release includes all that's necessary to support
> v1/v2 compatibility so that users setting resource configs through
> systemd don't have to worry about whether v1 or v2 is in use.  I'm
> wondering whether it would make sense to make libvirt use dbus calls
> to systemd to set resource configs when systemd is in use, so that it
> can piggyback on systemd's v1/v2 compatibility.

The big question I have around cgroup v2 is the state of support for all
controllers that libvirt uses (cpu, cpuacct, cpuset, memory, devices,
freezer, blkio).  IIUC, not all of these have been ported to the cgroup
v2 setup, and the cpu port in particular was rejected by the Linux
maintainers. Libvirt has a general policy that we won't support features
that only exist in out-of-tree patches (this applies to the kernel and
to any other software we build against or use).
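As a rough illustration of the controller-coverage question, the sketch below compares the v1 controllers libvirt uses against the contents of a v2 `cgroup.controllers` file. The v1-to-v2 name mapping (cpuacct folded into cpu, blkio renamed io) and the treatment of devices and freezer as having no entry in that file are assumptions for illustration, and `missing_v2_controllers` is a hypothetical helper, not libvirt code.

```python
# Controllers libvirt uses under cgroup v1 (from the list above).
LIBVIRT_V1_CONTROLLERS = ["cpu", "cpuacct", "cpuset", "memory",
                          "devices", "freezer", "blkio"]

# Assumed v1 -> v2 name mapping; None means no entry is expected in
# cgroup.controllers for that functionality.
V1_TO_V2 = {
    "cpu": "cpu",
    "cpuacct": "cpu",   # assumption: accounting folded into cpu
    "cpuset": "cpuset",
    "memory": "memory",
    "blkio": "io",      # assumption: blkio renamed to io
    "devices": None,
    "freezer": None,
}

def missing_v2_controllers(controllers_file_text):
    """Return the v1 controllers whose assumed v2 counterpart is absent
    from the space-separated text of a cgroup.controllers file.
    Controllers mapped to None are always reported, since there is no
    v2 controller entry to look for."""
    available = set(controllers_file_text.split())
    missing = []
    for name in LIBVIRT_V1_CONTROLLERS:
        v2_name = V1_TO_V2[name]
        if v2_name is None or v2_name not in available:
            missing.append(name)
    return missing

if __name__ == "__main__":
    # Example input: a kernel exposing only io, memory and pids on v2.
    print(missing_v2_controllers("io memory pids"))
    # -> ['cpu', 'cpuacct', 'cpuset', 'devices', 'freezer']
```

Under this policy, a non-empty result would mean the host cannot yet be managed purely through v2 without relying on out-of-tree work.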

IIRC from earlier discussions, the model for dealing with processes in
cgroup v2 was quite different. In libvirt we rely on the ability to
assign different threads within a process to different cgroups, because
we need to control CPU scheduler parameters on different threads in
QEMU, e.g. we have vCPU threads, I/O threads and general emulator
threads, each of which gets a different policy.
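To make the per-thread requirement concrete, here is a minimal sketch of thread placement under cgroup v1, where writing a thread ID to a group's `tasks` file moves just that thread. The directory layout (`vcpu0`, `iothread1`, `emulator`), the thread IDs and the helper names are hypothetical, and the planning step only computes the writes, so it runs without root or a mounted hierarchy.

```python
import os

def plan_thread_moves(base, threads):
    """Return (path, tid) pairs: writing str(tid) to path would move that
    single thread into the named sub-group under cgroup v1.

    base    -- a per-VM cgroup directory (hypothetical layout)
    threads -- mapping of sub-group name -> list of thread IDs
    """
    moves = []
    for group, tids in sorted(threads.items()):
        # In v1, "tasks" accepts individual thread IDs, unlike
        # "cgroup.procs", which moves every thread of the process.
        tasks_file = os.path.join(base, group, "tasks")
        for tid in tids:
            moves.append((tasks_file, tid))
    return moves

def apply_moves(moves):
    """Perform the writes (needs a real v1 hierarchy and privileges)."""
    for path, tid in moves:
        with open(path, "w") as f:
            f.write(str(tid))

if __name__ == "__main__":
    qemu_threads = {"vcpu0": [4101], "vcpu1": [4102],
                    "iothread1": [4103], "emulator": [4100]}
    base = "/sys/fs/cgroup/cpu/machine/vm1"  # hypothetical path
    for path, tid in plan_thread_moves(base, qemu_threads):
        print(path, tid)
```

Each sub-group can then carry its own scheduler settings, which is exactly the per-thread control that a process-granular model would not allow.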

When I spoke with Lennart about cgroup v2, way back in Jan, he indicated
that while systemd can technically work on a system where some
controllers are mounted as v1 while others are mounted as v2, this
would not be an officially supported configuration. Thus systemd in
Fedora was not likely to switch to v2 until all required controllers
could use v2. I'm not sure if this still corresponds to Lennart's
current views, so I'm CC'ing him to confirm/deny.

I think from the libvirt POV it would greatly simplify life if we could
likewise restrict ourselves to dealing with hosts which are exclusively
v1 or exclusively v2, and not a mixture. That is, we could completely
isolate our codebases for v1 vs v2 management, making it easier to
reason about and test their correctness, and reducing the QA testing
burden.
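One way to enforce an exclusively-v1 or exclusively-v2 rule is to classify the host from its mount table and refuse mixed setups. This sketch parses text in `/proc/self/mounts` format, where the third field is the filesystem type (`cgroup` for v1 hierarchies, `cgroup2` for the unified one); the function names and the decision to reject hybrids outright are assumptions that mirror the proposal above, not existing libvirt code.

```python
def classify_cgroup_mode(mounts_text):
    """Return "v1", "v2", "hybrid" or "none" based on the fstype column
    of /proc/self/mounts-style text."""
    fstypes = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            fstypes.add(fields[2])
    has_v1 = "cgroup" in fstypes
    has_v2 = "cgroup2" in fstypes
    if has_v1 and has_v2:
        return "hybrid"
    if has_v1:
        return "v1"
    if has_v2:
        return "v2"
    return "none"

def select_backend(mode):
    """Pick one isolated code path; mixed hosts are rejected."""
    if mode == "hybrid":
        raise RuntimeError("mixed cgroup v1/v2 hosts are not supported")
    if mode == "none":
        raise RuntimeError("no cgroup filesystem mounted")
    return mode

if __name__ == "__main__":
    sample = ("cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,cpu,cpuacct 0 0\n"
              "cgroup /sys/fs/cgroup/memory cgroup rw,memory 0 0\n")
    print(select_backend(classify_cgroup_mode(sample)))  # -> v1
```

This simple check treats any coexisting `cgroup2` mount as hybrid, even one with no controllers enabled; a real policy might want to look more closely before refusing the host.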

I recall that the systemd policy for v2 was intended to be that no app
should write to the cgroup sysfs except for systemd, unless there was
a sub-tree created with Delegate=yes set on the scope. This clearly
means that when using v2 we'll have to use the systemd DBus APIs for
managing cgroups v2 on such hosts.
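On a systemd host, such a delegated sub-tree would be requested through the `org.freedesktop.systemd1.Manager.StartTransientUnit` D-Bus method, whose arguments are a unit name, a job mode, an array of `(name, value)` properties and an array of auxiliary units. The sketch below only assembles those arguments as plain Python data; the unit name, the use of `machine.slice` as the default, and the helper name are assumptions, and sending the call with a D-Bus binding is deliberately left out so the snippet runs anywhere.

```python
def transient_scope_args(unit_name, pids, slice_name="machine.slice",
                         description=None):
    """Build the (name, mode, properties, aux) tuple for
    StartTransientUnit, D-Bus signature "ssa(sv)a(sa(sv))".

    Each property is (name, (dbus_signature, value)) so the type of
    every variant is explicit for whichever binding sends it."""
    if not unit_name.endswith(".scope"):
        raise ValueError("a transient scope unit name must end in .scope")
    props = [
        ("Slice", ("s", slice_name)),     # assumed default slice
        ("Delegate", ("b", True)),        # caller may manage the sub-tree
        ("PIDs", ("au", list(pids))),     # processes placed in the scope
    ]
    if description:
        props.append(("Description", ("s", description)))
    # "fail" is one of the job modes systemd accepts.
    return (unit_name, "fail", props, [])

if __name__ == "__main__":
    # Hypothetical unit name and PID for a single QEMU process.
    print(transient_scope_args("machine-qemu-vm1.scope", [4100],
                               description="QEMU guest vm1"))
```

Once the scope exists with `Delegate=yes`, the finer-grained layout beneath it would be the caller's responsibility rather than systemd's.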

> It is true that, as libvirt can be used without systemd, libvirt will
> probably want its own direct implementation down the line, but I think
> there are benefits to going through systemd for resource settings in
> general given that hierarchy setup is already done through systemd
> when available.

While it is certainly nice that the vast majority of OS distros have
switched over to using systemd for init, there are still enough users
out there that I think libvirt will also need to support using sysfs
directly for v2 on non-systemd hosts.
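Supporting both paths implies a runtime choice between the systemd D-Bus route and direct sysfs writes. A minimal sketch, assuming the same test that `sd_booted(3)` documents (systemd is the running init if `/run/systemd/system/` exists as a directory); the backend labels and function names are hypothetical.

```python
import os
import tempfile

def systemd_booted(root="/"):
    """Mirror the check documented for sd_booted(3): systemd is the
    running init if /run/systemd/system/ exists as a directory."""
    return os.path.isdir(os.path.join(root, "run", "systemd", "system"))

def choose_cgroup_manager(root="/"):
    """Return a hypothetical backend label: "systemd-dbus" when systemd
    is the init system, otherwise "sysfs" for direct cgroup writes."""
    return "systemd-dbus" if systemd_booted(root) else "sysfs"

if __name__ == "__main__":
    # Demonstrate both outcomes against a throwaway directory tree
    # instead of the real root filesystem.
    with tempfile.TemporaryDirectory() as fake_root:
        print(choose_cgroup_manager(fake_root))   # -> sysfs
        os.makedirs(os.path.join(fake_root, "run", "systemd", "system"))
        print(choose_cgroup_manager(fake_root))   # -> systemd-dbus
```

Taking `root` as a parameter keeps the check testable without touching the host's real `/run`.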


Anyway, in summary, we'd like to see v2 support of course, since that
is clearly the future. The big question is what we do about the fact
that not all controllers are supported in v2; the lack of complete
conversion is what has stopped me from doing any work in this area
up to now.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|