[libvirt] [RFC][PATCH] lxc: fix for ns cgroups subsystem

Ryota Ozaki ozaki.ryota at gmail.com
Mon May 11 09:01:07 UTC 2009


Hi,

On Mon, May 11, 2009 at 10:26 AM, Ryota Ozaki <ozaki.ryota at gmail.com> wrote:
> Hi,
>
> On Sun, May 10, 2009 at 8:49 AM, Ryota Ozaki <ozaki.ryota at gmail.com> wrote:
>> Hi Serge and Daniel,
>>
>> On Sat, May 9, 2009 at 4:03 AM, Serge E. Hallyn <serue at us.ibm.com> wrote:
>>> Quoting Daniel P. Berrange (berrange at redhat.com):
>>>> On Fri, May 08, 2009 at 08:34:12AM -0500, Serge E. Hallyn wrote:
>>>> > Quoting Ryota Ozaki (ozaki.ryota at gmail.com):
>>>> > > Hi Serge,
>>>> > >
>>>> > > On Fri, May 8, 2009 at 11:48 AM, Serge E. Hallyn <serue at us.ibm.com> wrote:
>>>> > > > IIUC, the real problem is that src/cgroup.c assumes that the
>>>> > > > cgroup name should be $CGROUP_MOUNTPOINT/groupname.  But of
>>>> > > > course if the ns cgroup is enabled, then the unshare(CLONE_NEWNS)
>>>> > > > to create a new namespace in which to mount the new devpts
>>>> > > > locks the driver under $CGROUP_MOUNTPOINT/<pid_of_driver>/
>>>> > > > or somesuch.
>>>> > > >
>>>> > > > If this fixes the problem I have no objections, but it seems
>>>> > > > more fragile than perhaps trying to teach src/cgroup.c to
>>>> > > > consider it's current cgroup as a starting point.
>>>> > >
>>>> > > hmm, I don't know why the assumption is bad and how the approach
>>>> > > you are suggesting helps the ns problem.
>>>> >
>>>> > To be clear, the asssumption is that the driver starts in the
>>>> > root cgroup, i.e. it's pid is listed in $CGROUP_MOUNTPOINT/tasks.
>>>> > And that it can create $CGROUP_MOUNTPOINT/groupname and move
>>>> > itself into $CGROUP_MOUNTPOINT/groupname/tasks.
>>>> >
>>>> > So, the assumption is bad because when the driver does a
>>>> > unshare(CLONE_NEWNS), it gets moved into $CGROUP_MOUNTPOINT/X,
>>>> > and after that can only move itself into
>>>> > $CGROUP_MOUNTPOINT/X/groupname.
>>>> >
>>>> > Even with your patch, it's possible for the lxc driver to have
>>>> > been started under say $CGROUP_MOUNTPOINT/libvir or
>>>> > $CGROUP_MOUNTPOINT/<username> through libcgroup/PAM for instance,
>>>> > in which case your patch would be insufficient.
>>>>
>>>> Indeed, we can't assume we're in the root group at any time. Our
>>>> general plan is that libvirt itself uses 3 levels of cgroup
>>>> starting at the cgroup that libvirtd was placed in by the admin
>>>> of the host, then a per-driver group, then per-guest groups
>>>>
>>>>   $LIBVIRTD_ROOT_CGROUP
>>>>      |
>>>>      +- lxc
>>>>      |    |
>>>>      |    +- LXC-GUEST-1
>>>>      |    +- LXC-GUEST-2
>>>>      |    +- LXC-GUEST-3
>>>>      |    +- ...
>>>>      |
>>>>      +- qemu
>>>>           |
>>>>           +- QEMU-GUEST-1
>>>>           +- QEMU-GUEST-2
>>>>           +- QEMU-GUEST-3
>>>>           +- ...
>>>>
>>>> $LIBVIRTD_ROOT_CGROUP, may be the actaul root mount point for
>>>> the cgroup controller in question, or it may be a sub-directory
>>>> that the admin chose to put it in. Also, if running libvirtd
>>>> as a normal user, the admin may have created per-user account
>>>> cgroups and so libvirtd would end up in there. So we have to
>>>> be prepared for anything wrt initial libvirtd cgroup root.
>>>>
>>>> NB The code for putting QEMU in a cgroup isn't merged yet.
>>>
>>> That sounds good.  I just don't think the code currently does
>>> it.  To do the right thing, IIUC, virCgroupPathOfGroup() should
>>> parse the /proc/pid/cgroup contents of the 'controller' process,
>>> and insert that between the virCgroupGetMount(controller)
>>> result and the group name.
>>>
>>> Or something...
>>>
>>> (right?)
>>>
>>> thanks,
>>> -serge
>>>
>>
>> OK I probably understood the problem and wrote a patch for that
>> according to the Serge's suggestion.
>>
>> I expect this patch fixes the problem, however unfortunately
>> I've not tried it yet under the situation you are worrying and
>> just done regression tests. Because I don't know easy way to
>> produce the situation. If you know that, please let me know...
>
> I found a way to go. lxc-unshare helps me and unveils my second
> path is broken...
>
>  ozaki-r
>
>>
>> Thanks,
>>  ozaki-r
>

I've updated the patch. The change includes support for multiple mount
points of cgroups that I didn't cope with in the previous patch.

Through the work, I found a bit messy problem. Current lxc controller writes
pid in a 'tasks' file multiple times if one mount point has multiple subsystems.
It is bad because the first write changes the cgroups path of a controller, and
then the second write points a missing file like
$CGROUPS_MOUNTPOINT/<path_to_domain>/<path_to_domain>/tasks where
the correct file is $CGROUPS_MOUNTPOINT/<path_to_domain>/tasks.
I did workaround this problem with a tricky way by truncating the
duplicated path.
We probably need a more feasible solution.

Note that this patch intends a proof of concept to be clear the correct way
to go and thus it remains a couple of awkward codes. Of course the codes
should be removed if we find the correct way.

Thanks,
  ozaki-r

>From 473183a77fbdb002d55acfed0d8f9bd485f548cd Mon Sep 17 00:00:00 2001
From: Ryota Ozaki <ozaki.ryota at gmail.com>
Date: Mon, 11 May 2009 15:18:47 +0900
Subject: [PATCH] [v2] cgroups: address libvirtd's bad assumption for
cgroups hierarchy

---
 src/cgroup.c |   87 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/src/cgroup.c b/src/cgroup.c
index 50517e2..58ba588 100644
--- a/src/cgroup.c
+++ b/src/cgroup.c
@@ -117,12 +117,76 @@ int virCgroupHaveSupport(void)
     return 0;
 }

+static int virCgroupGetCurrentPath(int pid, const char *controller,
char **curpath)
+{
+    char buf[(PATH_MAX+256)*9]; /* there are nine subsystems */
+    char *path = NULL;
+    int rc = 0;
+    int len;
+    int fd;
+    char *ctrlpos = NULL;
+    char *pathpos = NULL;
+    char *brkpos = NULL;
+
+    if (virAsprintf(&path, "/proc/%d/cgroup", pid) == -1) {
+        rc = -ENOMEM;
+        goto error;
+    }
+
+    fd = open(path, O_RDONLY);
+    if (fd < 0) {
+        DEBUG("Unable to open %s: %m", path);
+        rc = -errno;
+        goto error_open;
+    }
+
+    len = saferead(fd, buf, sizeof(buf));
+    if (len <= 0) {
+        rc = -errno;
+        goto error_saferead;
+    }
+
+    ctrlpos = strstr(buf, controller);
+    if (ctrlpos == NULL) {
+        rc = -1;
+        goto error_strstr;
+    }
+
+    pathpos = strstr(ctrlpos, ":/");
+    if (pathpos == NULL) {
+        rc = -1;
+        goto error_strstr;
+    }
+    pathpos += 2; /* skip ":/" */
+
+    brkpos = strstr(ctrlpos, "\n");
+    if (brkpos == NULL) {
+        rc = -1;
+        goto error_strstr;
+    }
+    *brkpos = '\0'; /* overwrite '\n' */
+
+    if (virAsprintf(curpath, "%s", pathpos) == -1) {
+        rc = -ENOMEM;
+    }
+
+error_strstr:
+error_saferead:
+    close(fd);
+error_open:
+    VIR_FREE(path);
+error:
+    return rc;
+}
+
 static int virCgroupPathOfGroup(const char *group,
                                 const char *controller,
                                 char **path)
 {
     virCgroupPtr root = NULL;
+    char *curpath = NULL;
     int rc = 0;
+    char *pos;

     root = virCgroupGetMount(controller);
     if (root == NULL) {
@@ -130,8 +194,20 @@ static int virCgroupPathOfGroup(const char *group,
         goto out;
     }

-    if (virAsprintf(path, "%s/%s", root->path, group) == -1)
+    rc = virCgroupGetCurrentPath(getpid(), controller, &curpath);
+    if (rc < 0)
+        goto out;
+
+    /* XXX: remove duplicated path */
+    pos = strstr(curpath, group);
+    if (pos) {
+        *pos = '\0';
+    }
+
+    if (virAsprintf(path, "%s/%s/%s", root->path, curpath, group) == -1)
         rc = -ENOMEM;
+
+    VIR_FREE(curpath);
 out:
     virCgroupFree(&root);

@@ -145,6 +221,7 @@ static int virCgroupPathOf(const char *grppath,
     virCgroupPtr root;
     int rc = 0;
     char *controller = NULL;
+    char *curpath = NULL;

     if (strchr(key, '.') == NULL)
         return -EINVAL;
@@ -158,8 +235,14 @@ static int virCgroupPathOf(const char *grppath,
         goto out;
     }

-    if (virAsprintf(path, "%s/%s/%s", root->path, grppath, key) == -1)
+    rc = virCgroupGetCurrentPath(getpid(), controller, &curpath);
+    if (rc < 0)
+        goto out;
+
+    if (virAsprintf(path, "%s/%s/%s/%s", root->path, curpath,
grppath, key) == -1)
         rc = -ENOMEM;
+
+    VIR_FREE(curpath);
 out:
     virCgroupFree(&root);
     VIR_FREE(controller);
-- 
1.6.0.6




More information about the libvir-list mailing list