[Crash-utility] [PATCHv2] Add the proccgroup extension

Dave Anderson anderson at redhat.com
Wed Apr 13 20:27:50 UTC 2016



----- Original Message -----
> Initial version of a crash module which can be used to show which cgroups
> a process is a member of.
> 
> Signed-off-by: Nikolay Borisov <n.borisov.lkml at gmail.com>
> ---
> 
> So here is the second version of the proccgroup module. Changes since v1:
> 
>  * Now show the full path to the cgroup (limited to 4k long paths).
>  * Added support for passing either pid or hex address of task struct, so that
>    cgroup info can be acquired for an arbitrary task
>  * Added support for pre-3.15 kernels
>  * Removed leftovers from the echo module


Hello Nikolay,

While cgroups have existed since 2.6.24, it appears that cgroup.name
was introduced in 3.10, and cgroup.kn in 3.15.  So I have only a 
limited set of sample 3.10+ dumpfiles that I could test it on. 

I have many 3.10-based RHEL7 kernels, and the same error occurs on 
all of them:
  
  crash> sys | grep RELEASE
       RELEASE: 3.10.0-327.el7.x86_64
  crash> showcg
  showcg: invalid kernel virtual address: ff88046666e03060  type: "cgroup_subsys->name"
  crash> 

The bad address looks to come from this line:

    readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
            "cgroup_subsys_state->ss", FAULT_ON_ERROR);

because the 3.10 kernel does not have a cgroup_subsys_state.ss field, which was
added in 4.2:

  crash> cgroup_subsys_state
  struct cgroup_subsys_state {
      struct cgroup *cgroup;
      atomic_t refcnt;
      unsigned long flags;
      struct css_id *id;
      struct work_struct dput_work;
  }
  SIZE: 64
  crash>

Unfortunately you don't have the benefit of being able to use OFFSET(), which
would fail immediately.  MEMBER_OFFSET() returns -1 on invalid requests, so you
really have to verify the return value, or add it to your MEMBER_OFFSET() verifications
during your init function. 
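
As a rough sketch of that kind of check: fake_member_offset() below is
only a hypothetical stand-in for MEMBER_OFFSET(), backed by a small table
that mimics the 3.10 structure shown above, and the offsets are made up
for illustration.  The point is just that a -1 result gets caught before
it is added to a base address, where it would otherwise shift the read
back by one byte:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a MEMBER_OFFSET()-style lookup: returns the
 * byte offset of a member, or -1 when the member is unknown.  The table
 * mimics a cgroup_subsys_state without an "ss" member; the offsets are
 * illustrative only. */
struct member_info {
    const char *type;
    const char *member;
    long offset;
};

static const struct member_info known_members[] = {
    { "cgroup_subsys_state", "cgroup", 0 },
    { "cgroup_subsys_state", "refcnt", 8 },
    { "cgroup_subsys_state", "flags", 16 },
};

long fake_member_offset(const char *type, const char *member)
{
    size_t i;

    assert(type != NULL && member != NULL);
    for (i = 0; i < sizeof(known_members) / sizeof(known_members[0]); i++) {
        if (strcmp(known_members[i].type, type) == 0 &&
            strcmp(known_members[i].member, member) == 0)
            return known_members[i].offset;
    }
    return -1;
}

/* Compute base + offset only when the member exists.  Returns 0 on
 * success, or -1 when the lookup failed, leaving *addr untouched. */
int member_address(unsigned long base, const char *type,
                   const char *member, unsigned long *addr)
{
    long off = fake_member_offset(type, member);

    if (off < 0)
        return -1;
    *addr = base + (unsigned long)off;
    return 0;
}
```

In the real module the caller would presumably report the missing member
and return, rather than passing the address on to readmem().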

And there were these oddities on later kernel versions:
  
All 3 of my sample 3.13-based Fedora kernels result in this output: 

  crash> sys | grep RELEASE
       RELEASE: 3.13.0-0.rc1.git2.1.fc20.x86_64
  crash> showcg
  subsys: cpuset               cgroup: /
  subsys: cpu                  cgroup: /
  subsys: cpuacct              cgroup: /
  subsys: memory               cgroup: /
  subsys: devices              cgroup: /
  subsys: freezer              cgroup: /
  subsys: net_cls              cgroup: /
  subsys: blkio                cgroup: /
  subsys: perf_event           cgroup: /
  subsys: hugetlb              cgroup: /
  showcg: invalid kernel virtual address: 0  type: "cgroup_subsys_state->cgroup"
  crash> 

I didn't look into why they all end that way.  Maybe there's a NULL pointer in the
last entry in the subsys array?
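
If that guess is right, one possible guard is to skip zero entries after
the array has been copied into local memory.  This is only a sketch under
that assumption; collect_valid_subsys() is a hypothetical helper, not
something in the posted module:

```c
#include <assert.h>
#include <stddef.h>

/* Walk a locally copied css_set->subsys[]-style array and record the
 * indices of the non-NULL slots in out_idx, which must have room for
 * count entries.  Zero slots are skipped instead of being dereferenced.
 * Returns the number of usable entries. */
size_t collect_valid_subsys(const unsigned long *subsys_arr, size_t count,
                            size_t *out_idx)
{
    size_t i, n = 0;

    assert(subsys_arr != NULL && out_idx != NULL);
    for (i = 0; i < count; i++) {
        if (subsys_arr[i] == 0)
            continue;           /* no cgroup_subsys_state in this slot */
        out_idx[n++] = i;
    }
    return n;
}
```

The display loop would then only call readmem() for the recorded indices.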
  
And lastly, I only have one 3.14-based kernel, which shows this:

  crash> sys | grep RELEASE
       RELEASE: 3.14.0-rc1+
  crash> showcg
  showcg: zero-size memory allocation! (called from 7f3280273719)
  crash> 

which would come from an en_subsys_cnt value of 0 being passed to GETBUF() here:

    en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
    cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));

which depends upon CGROUP_SUBSYS_COUNT being something non-zero:

        /*
         * Set of subsystem states, one for each subsystem. This array is
         * immutable after creation apart from the init_css_set during
         * subsystem registration (at boot time).
         */
        struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];

And in that kernel apparently CONFIG_CGROUPS was not configured and
therefore CGROUP_SUBSYS_COUNT is 0:

  #else   /* CONFIG_CGROUPS */
  
  #define CGROUP_SUBSYS_COUNT 0
  
  static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
  static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
  
  #endif  /* CONFIG_CGROUPS */

making css_set.subsys an empty array:
  
  crash> css_set
  struct css_set {
      atomic_t refcount;
      struct hlist_node hlist;
      struct list_head tasks;
      struct list_head cgrp_links;
      struct cgroup_subsys_state *subsys[];
      struct callback_head callback_head;
  }
  SIZE: 72
  crash> css_set -o
  struct css_set {
     [0] atomic_t refcount;
     [8] struct hlist_node hlist;
    [24] struct list_head tasks;
    [40] struct list_head cgrp_links;
    [56] struct cgroup_subsys_state *subsys[];
    [56] struct callback_head callback_head;
  }
  SIZE: 72
  crash>
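
A minimal sketch of a guard for that case, assuming the member size is
reported in bytes and that a negative value signals a failed lookup (an
assumption on my part); subsys_slot_count() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Derive the number of pointer-sized slots from a member size given in
 * bytes.  A size of 0 (empty array, as when cgroups are not configured)
 * or a negative error value yields 0 slots, so the caller can bail out
 * before asking for a zero-size buffer. */
size_t subsys_slot_count(long member_size_bytes, size_t ptr_size)
{
    assert(ptr_size > 0);
    if (member_size_bytes <= 0)
        return 0;
    return (size_t)member_size_bytes / ptr_size;
}
```

A caller would test the result for 0 and print something like "cgroups
not configured" instead of going on to allocate and read the array.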
  
The other 3.18 and 4.x based kernels ran the command OK.

Another thing I might suggest, if your idea is to assist in the
actual debugging of cgroup problems, would be to print the
addresses of key data structures as part of the command's output.
That kind of thing is done by most crash commands, so that a user
can quickly dump, for example, the target cgroup structure, or
perhaps some of the other structures that would be helpful to
fully display.
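
For example, each output line could carry the cgroup_subsys_state and
cgroup addresses next to the name and path.  The column layout and the
format_cgroup_line() helper below are purely hypothetical, just to show
the idea:

```c
#include <assert.h>
#include <stdio.h>

/* Format one output line that includes the addresses of the
 * cgroup_subsys_state and cgroup structures, so they can be pasted into
 * a follow-up "struct" command.  The layout is illustrative only.
 * Returns the snprintf() result. */
int format_cgroup_line(char *buf, size_t buflen, const char *subsys_name,
                       unsigned long css_addr, unsigned long cgroup_addr,
                       const char *path)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "subsys: %-12s css: %lx  cgroup: %lx  path: %s",
                    subsys_name, css_addr, cgroup_addr, path);
}
```

With the addresses visible, a user could follow up with something like
"struct cgroup <address>" on whichever entry looks interesting.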

On the other hand, maybe all you're interested in seeing is the
cgroup name and path?  I don't know -- that's up to you.

Also, you don't have to post your module as a patch to the
extensions subdirectory.  I'm not going to add the file to the 
crash sources contained in the tar.gz or src.rpm releases, but
rather I will post your module source file, and directions on 
how to build it, on the extensions web page accessible from
http://people.redhat.com/anderson/extensions.html.  So you can 
just attach the module's C file to your email to this mailing list.

Thanks,
  Dave




> 
>  extensions/proccgroup.c | 278
>  ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 278 insertions(+)
>  create mode 100644 extensions/proccgroup.c
> 
> diff --git a/extensions/proccgroup.c b/extensions/proccgroup.c
> new file mode 100644
> index 0000000..aee735b
> --- /dev/null
> +++ b/extensions/proccgroup.c
> @@ -0,0 +1,278 @@
> +/*
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Nikolay Borisov <n.borisov.lkml at gmail.com>
> + */
> +
> +#include <stdbool.h>
> +#include "defs.h"
> +
> +#define MAX_CGROUP_PATH 4096
> +
> +static void showcgrp(void);
> +char *help_proc_cgroups[];
> +
> +static struct command_table_entry command_table[] = {
> +        { "showcg", showcgrp, help_proc_cgroups, 0},
> +        { NULL },
> +};
> +
> +
> +void __attribute__((constructor))
> +proccgroup_init(void)
> +{
> +
> +    if (!MEMBER_EXISTS("task_struct", "cgroups") ||
> +        (!MEMBER_EXISTS("cgroup", "kn") && !MEMBER_EXISTS("cgroup",
> "name")))
> +    {
> +        fprintf(fp, "Unrecognised or disabled cgroup support\n");
> +        return;
> +    }
> +
> +    register_extension(command_table);
> +}
> +
> +void __attribute__((destructor))
> +proccgroup_finish(void) { }
> +
> +/* Prepends contents of cgroup_name to buf, using start as a pointer
> + * index into buf
> + */
> +static void prepend_string(char *buf, char **start, char *cgroup_name) {
> +
> +    int len = strlen(cgroup_name);
> +    *start -= len;
> +
> +    if (*start < buf) {
> +        error(FATAL, "Cgroup too long to parse\n");
> +    }
> +
> +    memcpy(*start, cgroup_name, len);
> +
> +    if (--*start < buf) {
> +        error(FATAL, "Cgroup too long to parse\n");
> +    }
> +
> +    **start = '/';
> +}
> +
> +/* For post-3.15 kernels */
> +static void get_cgroup_name_kn(ulong cgroup, char *buf, int buflen)
> +{
> +    ulong kernfs_node;
> +    ulong cgroup_name_ptr;
> +    ulong kernfs_parent;
> +    bool slash_prepended = false;
> +    char cgroup_name[BUFSIZE];
> +    char *start = buf + buflen - 1;
> +    *start = '\0'; //null terminate the end
> +
> +    /* Get cgroup->kn */
> +    readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR, &kernfs_node,
> sizeof(void *),
> +            "cgroup->kn", FAULT_ON_ERROR);
> +
> +    do {
> +        /* Get kn->name */
> +        readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"), KVADDR,
> &cgroup_name_ptr, sizeof(void *),
> +                "kernfs_node->name", FAULT_ON_ERROR);
> +        /* Get kn->parent */
> +        readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"),
> KVADDR, &kernfs_parent, sizeof(void *),
> +                "kernfs_node->parent", FAULT_ON_ERROR);
> +
> +        if (kernfs_parent != 0) {
> +            read_string(cgroup_name_ptr, cgroup_name, BUFSIZE-1);
> +            prepend_string(buf, &start, cgroup_name);
> +            slash_prepended = true;
> +        } else if (!slash_prepended) {
> +            if (--start < buf) {
> +                error(FATAL, "Cgroup too long to parse\n");
> +            }
> +            *start = '/';
> +        }
> +
> +        kernfs_node = kernfs_parent;
> +
> +    } while(kernfs_parent);
> +
> +    memmove(buf, start, buf + buflen - start);
> +}
> +
> +/* For pre-3.15 kernels */
> +static void get_cgroup_name_old(ulong cgroup, char *buf, size_t buflen)
> +{
> +    ulong cgroup_name_ptr;
> +    ulong cgroup_parent_ptr;
> +    char cgroup_name[BUFSIZE];
> +    char *start = buf + buflen - 1;
> +    *start = '\0'; //null terminate the end
> +    bool slash_prepended = false;
> +
> +    do {
> +        /* Get cgroup->name */
> +        readmem(cgroup + MEMBER_OFFSET("cgroup", "name"), KVADDR,
> &cgroup_name_ptr, sizeof(void *),
> +                "cgroup->name", FAULT_ON_ERROR);
> +        /* Get cgroup->parent */
> +        readmem(cgroup + MEMBER_OFFSET("cgroup", "parent"), KVADDR,
> &cgroup_parent_ptr, sizeof(void *),
> +                "cgroup->parent", FAULT_ON_ERROR);
> +
> +        read_string(cgroup_name_ptr + MEMBER_OFFSET("cgroup_name", "name"),
> cgroup_name, BUFSIZE-1);
> +
> +        if (cgroup_parent_ptr) {
> +            prepend_string(buf, &start, cgroup_name);
> +            slash_prepended = true;
> +        } else if (!slash_prepended) {
> +            if (--start < buf)
> +                break;
> +            *start = '/';
> +        }
> +
> +        cgroup = cgroup_parent_ptr;
> +
> +    } while(cgroup_parent_ptr);
> +
> +    memmove(buf, start, buf + buflen - start);
> +}
> +
> +static void get_subsys_name(ulong subsys, char *buf, size_t buflen)
> +{
> +    ulong subsys_name_ptr;
> +    ulong cgroup_subsys_ptr;
> +
> +    /* Get cgroup_subsys_state->ss */
> +    readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR,
> &cgroup_subsys_ptr, sizeof(void *),
> +            "cgroup_subsys_state->ss", FAULT_ON_ERROR);
> +
> +    readmem(cgroup_subsys_ptr + MEMBER_OFFSET("cgroup_subsys", "name"),
> KVADDR, &subsys_name_ptr, sizeof(void *),
> +            "cgroup_subsys->name", FAULT_ON_ERROR);
> +    read_string(subsys_name_ptr, buf, buflen-1);
> +}
> +
> +static void get_cgroup_name(ulong cgroup, ulong subsys)
> +{
> +    char *cgroup_path = GETBUF(MAX_CGROUP_PATH);
> +    char subsys_name[BUFSIZE];
> +
> +    /* Handle the 2 cases of cgroup_name and the kernfs one */
> +    if (MEMBER_EXISTS("cgroup", "kn")) {
> +        get_cgroup_name_kn(cgroup, cgroup_path, MAX_CGROUP_PATH);
> +    } else if (MEMBER_EXISTS("cgroup", "name")) {
> +        get_cgroup_name_old(cgroup, cgroup_path, MAX_CGROUP_PATH);
> +    }
> +
> +    get_subsys_name(subsys, subsys_name, BUFSIZE);
> +
> +    fprintf(fp, "subsys: %-20s cgroup: %s\n", subsys_name, cgroup_path);
> +
> +    FREEBUF(cgroup_path);
> +}
> +
> +
> +void show_proc_cgroups(ulong task_ctx) {
> +    int en_subsys_cnt;
> +    int i;
> +    ulong *cgroup_subsys_arr;
> +    ulong subsys_base_ptr;
> +    ulong cgroups_subsys_ptr = 0;
> +
> +
> +    /* Get address of task_struct->cgroups */
> +    readmem(task_ctx + MEMBER_OFFSET("task_struct", "cgroups"),
> +                            KVADDR, &cgroups_subsys_ptr, sizeof(void *),
> +                            "task_struct->cgroups", FAULT_ON_ERROR);
> +
> +    subsys_base_ptr = cgroups_subsys_ptr + MEMBER_OFFSET("css_set",
> "subsys");
> +    en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
> +    cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
> +
> +    /* Get the contents of the css_set->subsys array */
> +    readmem(subsys_base_ptr, KVADDR, cgroup_subsys_arr, sizeof(ulong) *
> en_subsys_cnt,
> +               "css_set->subsys", FAULT_ON_ERROR);
> +
> +    for (i = 0; i < en_subsys_cnt; i++) {
> +        ulong cgroup;
> +
> +        /* Get cgroup_subsys_state -> cgroup */
> +        readmem(cgroup_subsys_arr[i] + MEMBER_OFFSET("cgroup_subsys_state",
> "cgroup"),
> +                KVADDR, &cgroup, sizeof(void *),
> "cgroup_subsys_state->cgroup", FAULT_ON_ERROR);
> +
> +        get_cgroup_name(cgroup, cgroup_subsys_arr[i]);
> +    }
> +
> +    FREEBUF(cgroup_subsys_arr);
> +}
> +
> +
> +static void showcgrp(void) {
> +
> +    ulong value;
> +    struct task_context *tc;
> +    ulong task_struct_ptr = 0;
> +
> +    while (args[++optind]) {
> +        if (IS_A_NUMBER(args[optind])) {
> +                switch (str_to_context(args[optind], &value, &tc))
> +                {
> +                case STR_PID:
> +                    task_struct_ptr = tc->task;
> +                    ++optind;
> +                    break;
> +
> +                case STR_TASK:
> +    		        task_struct_ptr = value;
> +                    ++optind;
> +                    break;
> +
> +                case STR_INVALID:
> +                    error(FATAL, "invalid task or pid value: %s\n\n",
> +                            args[optind]);
> +                    break;
> +                }
> +        } else {
> +            if (argcnt > 1)
> +                error(FATAL, "invalid task or pid value: %s\n", args[optind]);
> +            else
> +                break;
> +        }
> +    }
> +
> +    if (!task_struct_ptr) {
> +        task_struct_ptr = CURRENT_TASK();
> +    }
> +
> +    show_proc_cgroups(task_struct_ptr);
> +}
> +
> +char *help_proc_cgroups[] = {
> +        "showcg",
> +        "Show which cgroups a process is a member of",
> +        " [task | pid]",
> +
> +        " This command prints the cgroup for each subsys that a process is a
> member of",
> +        "\nExample",
> +        "  Show the cgroup for the currently active process:\n",
> +        "       crash> showcg",
> +        "       subsys: cpuset               cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: cpu                  cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: cpuacct              cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: blkio                cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: memory               cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: devices              cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: freezer              cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: net_cls              cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: perf_event           cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: net_prio             cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "       subsys: hugetlb              cgroup:
> /user.slice/user-1000.slice/session-c1.scope",
> +        "\n  Alternatively you can pass either a pid or a task pointer to
> show the cgroup the",
> +        "  respective process is a member of e.g:\n",
> +        "       crash> showcg 1064\n   OR",
> +        "       crash> showcg ffff880405711b80",
> +
> +
> +
> +        NULL
> +};
> +
> +
> --
> 2.5.0
> 
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
> 



