[libvirt] [RFC][PATCH] lxc: fix for ns cgroups subsystem

Fri May 8 13:34:12 UTC 2009

Quoting Ryota Ozaki (ozaki.ryota at gmail.com):
> Hi Serge,
> 
> On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn <serue at us.ibm.com> wrote:
> > IIUC, the real problem is that src/cgroup.c assumes that the
> > cgroup name should be $CGROUP_MOUNTPOINT/groupname.  But of
> > course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS)
> > to create a new namespace in which to mount the new devpts
> > locks the driver under $CGROUP_MOUNTPOINT/<pid_of_driver>/
> > or somesuch.
> >
> > If this fixes the problem I have no objections, but it seems
> > more fragile than perhaps trying to teach src/cgroup.c to
> > consider its current cgroup as a starting point.
> 
> hmm, I don't know why the assumption is bad and how the approach
> you are suggesting helps the ns problem.

To be clear, the assumption is that the driver starts in the
root cgroup, i.e. its pid is listed in $CGROUP_MOUNTPOINT/tasks,
and that it can create $CGROUP_MOUNTPOINT/groupname and move
itself into $CGROUP_MOUNTPOINT/groupname/tasks.

So, the assumption is bad because when the driver does an
unshare(CLONE_NEWNS), the ns cgroup moves it into
$CGROUP_MOUNTPOINT/X, and after that it can only move itself
further down, e.g. into $CGROUP_MOUNTPOINT/X/groupname.
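
To find out where the driver actually sits, one could read its entry
in /proc/self/cgroup instead of assuming the root.  A minimal sketch,
assuming the usual "hierarchy-id:subsystems:path" line format; the
function name cgroup_path_from_line is made up for illustration and
is not existing src/cgroup.c code:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extract the cgroup path from one line of
 * /proc/<pid>/cgroup, e.g. "1:ns,cpu:/1234\n" yields "/1234".
 * Returns 0 on success, -1 if the line is malformed or out is too small.
 */
int cgroup_path_from_line(const char *line, char *out, size_t outlen)
{
    /* Skip the hierarchy id field. */
    const char *p = strchr(line, ':');
    if (!p)
        return -1;

    /* Skip the subsystem list field. */
    p = strchr(p + 1, ':');
    if (!p)
        return -1;
    p++;

    /* The rest of the line, minus the trailing newline, is the path. */
    size_t len = strcspn(p, "\n");
    if (len == 0 || len >= outlen)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

A caller would fgets() each line of /proc/self/cgroup and pass it in;
picking the line for the right hierarchy is left out here.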

Even with your patch, it's possible for the lxc driver to have
been started under say $CGROUP_MOUNTPOINT/libvir or
$CGROUP_MOUNTPOINT/<username> through libcgroup/PAM for instance,
in which case your patch would be insufficient.
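
A rough sketch of the alternative: build the group path relative to
the driver's current cgroup rather than relative to the mountpoint.
The helper name cgroup_build_path and its parameters are hypothetical,
not existing src/cgroup.c code, and "/cgroup" below is only an example
mountpoint:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join the cgroup mountpoint, the caller's current
 * cgroup (as reported by /proc/self/cgroup, e.g. "/" or "/libvir") and
 * the new group name into one filesystem path.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int cgroup_build_path(const char *mountpoint, const char *current,
                      const char *group, char *out, size_t outlen)
{
    /* Treat the root cgroup "/" as an empty prefix to avoid "//". */
    if (strcmp(current, "/") == 0)
        current = "";

    int n = snprintf(out, outlen, "%s%s/%s", mountpoint, current, group);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

With mountpoint "/cgroup" and group "groupname", a driver sitting in
"/libvir" would get "/cgroup/libvir/groupname", while one still in the
root cgroup would get "/cgroup/groupname" as today.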

thanks,
-serge

PS
The point of the ns cgroup is to prevent even privileged tasks in a
resource group from escaping that resource group.  FWIW this can
currently also be done using selinux/smack, and eventually should
be accomplished using user namespaces.  At that point we should
seriously consider removing the movement restriction.