[libvirt PATCH v3] cgroup/LXC: Do not condition availability of v2 by controllers

Pavel Hrdina phrdina at redhat.com
Mon Oct 24 13:28:23 UTC 2022


On Mon, Oct 24, 2022 at 12:51:03PM +0000, Eric van Blokland wrote:
> ------- Original Message -------
> On Monday, October 24th, 2022 at 1:54 PM, Pavel Hrdina <phrdina at redhat.com> wrote:
> > 
> > On Sun, Oct 23, 2022 at 02:08:28PM +0200, Eric van Blokland wrote:
> > 
> > > systemd in hybrid mode uses v1 hierarchies for controllers and v2 for
> > > process tracking.
> > > 
> > > The LXC code uses virCgroupAddMachineProcess() to move processes into
> > > appropriate cgroup by manipulating cgroupfs directly. (Note, despite
> > > libvirt also supports talking to systemd directly via
> > > org.freedesktop.machine1 API.)
> > > 
> > > If this path is taken, libvirt/lxc must convince systemd that processes
> > > really belong to new cgroup, i.e. also the tracking v2 hierarchy must
> > > undergo migration too.
> > > 
> > > The current check would evaluate v2 backend as unavailable with hybrid
> > > mode (because there are no available controllers). Simplify the
> > > condition and consider the mounted cgroup2 as sufficient to touch v2
> > > hierarchy.
> > > 
> > > This consequently creates an issue with binding the V2 mount. In hybrid
> > > mode the V2 filesystem may be mounted upon the V1 filesystem. By reversing
> > > the order in which backends are mounted in virCgroupBindMount this problem
> > > is circumvented.
> > > 
> > > Fixes: #182
> > > Signed-off-by: Eric van Blokland mail at ericvanblokland.nl
> > > ---
> > > src/util/vircgroup.c | 8 +++++---
> > > src/util/vircgroupv2.c | 12 ------------
> > > 2 files changed, 5 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> > > index a6a409af3d..48fbcf625a 100644
> > > --- a/src/util/vircgroup.c
> > > +++ b/src/util/vircgroup.c
> > > @@ -2924,9 +2924,11 @@ virCgroupBindMount(virCgroup *group, const char *oldroot,
> > > size_t i;
> > > virCgroup *parent = virCgroupGetNested(group);
> > > 
> > > - for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
> > > - if (parent->backends[i] &&
> > > - parent->backends[i]->bindMount(parent, oldroot, mountopts) < 0) {
> > > + /* In hybrid environments, V2 may be mounted over V1.
> > > + * Mount the backends in reverse order. */
> > 
> > 
> > I don't understand what you mean by mounted over?
> > 
> 
> In the hybrid environments I've seen V2 is mounted upon "/unified" in the V1 cgroup filesystem.

What I've seen is that hybrid systemd environments have:

    /sys/fs/cgroup/unified
    /sys/fs/cgroup/memory
    /sys/fs/cgroup/blkio
    ...

but it doesn't mean that it is mounted in V1 cgroup filesystem.

In this case the /sys/fs/cgroup is tmpfs filesystem and there is nothing
mounted over V1. In case you've managed to run into environment where
unified is actually mounted inside cgroup filesystem then there is
something else seriously broken.

> > > + for (i = 1; i <= VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
> > > + if (parent->backends[VIR_CGROUP_BACKEND_TYPE_LAST - i] &&
> > > + parent->backends[VIR_CGROUP_BACKEND_TYPE_LAST - i]->bindMount(parent, oldroot, mountopts) < 0) {
> > > return -1;
> > > }
> > > }
> > > diff --git a/src/util/vircgroupv2.c b/src/util/vircgroupv2.c
> > > index 4c110940cf..0e0c61d466 100644
> > > --- a/src/util/vircgroupv2.c
> > > +++ b/src/util/vircgroupv2.c
> > > @@ -75,22 +75,10 @@ virCgroupV2Available(void)
> > > if (STRNEQ(entry.mnt_type, "cgroup2"))
> > > continue;
> > > 
> > > - /* Systemd uses cgroup v2 for process tracking but no controller is
> > > - * available. We should consider this configuration as cgroup v2 is
> > > - * not available. */
> > > - contFile = g_strdup_printf("%s/cgroup.controllers", entry.mnt_dir);
> > > -
> > > - if (virFileReadAll(contFile, 1024 * 1024, &contStr) < 0)
> > > - goto cleanup;
> > > -
> > > - if (STREQ(contStr, ""))
> > > - continue;
> > > -
> > 
> > 
> > I don't like this at all and IMO this is incorrect fix of the issue you
> > are trying to address. In hybrid mode with systemd the cgroup v2
> > controller is not a real controller. It's something systemd uses for
> > process tracking and some other features. It is owned by systemd and we
> > should not touch it directly at all. We need to use proper systemd APIs
> > to make any changes to that directory or if needed ask systemd to create
> > cgroup with Delegate=yes which in this case is probably also not the
> > correct approach.
> > 
> 
> I must admit I'm a little in over my head here, but if I understand correctly,
> there isn't anything done in the v2 backend in hybrid mode that wouldn't be done 
> in the v2 backend in unified mode. Does systemd behave differently or is the v2
> implementation in error in both hybrid and unified modes?
> 
> Also in theory there could be a controller bound to the v2 hierarchy which would
> activate the v2 backend anyway.  

So org.freedesktop.systemd1.Manager has method called
AttachProcessesToUnit which would be most likely perfect for us for this
specific case but it is not documented and from the commit message it
was intended for internal use only.

I'll ask systemd developers what the state is and if we could use it
otherwise we might need to drop the check like this patch does.

Pavel

> > I know it was already pushed but I'll most likely revert this patch and
> > we should find better and proper solution.
> > 
> > Pavel
> > 
> 
> I'd love to get suggestions for a better solution.
> 
> Eric
> 
> > > ret = true;
> > > break;
> > > }
> > > 
> > > - cleanup:
> > > VIR_FORCE_FCLOSE(mounts);
> > > return ret;
> > > }
> > > --
> > > 2.35.3
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20221024/81399ac0/attachment-0001.sig>


More information about the libvir-list mailing list