[Crash-utility] [PATCHv2] Add the proccgroup extension

nborisov n.borisov.lkml at gmail.com
Thu Apr 14 16:57:49 UTC 2016



On 13.04.2016 23:27, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>> Initial version of a crash module which can be used to show which cgroups
>> a process is a member of.
>>
>> Signed-off-by: Nikolay Borisov <n.borisov.lkml at gmail.com>
>> ---
>>
>> So here is the second version of the proccgroup module. Changes since v1:
>>
>>  * Now show the full path to the cgroup (limited to 4k long paths).
>>  * Added support for passing either pid or hex address of task struct, so that
>>    cgroup info can be acquired for an arbitrary task
>>  * Added support for pre-3.15 kernels
>>  * Removed leftovers from the echo module
> 
> 
> Hello Nikolay,
> 
> While cgroups have existed since 2.6.24, it appears that cgroup.name
> was introduced in 3.10, and cgroup.kn in 3.15.  So I have only a 
> limited set of sample 3.10+ dumpfiles that I could test it on. 
> 
> I have many 3.10-based RHEL7 kernels, and the same error occurs on 
> all of them:
>   
>   crash> sys | grep RELEASE
>        RELEASE: 3.10.0-327.el7.x86_64
>   crash> showcg
>   showcg: invalid kernel virtual address: ff88046666e03060  type: "cgroup_subsys->name"
>   crash> 
> 
> The bad address looks to come from this line:
> 
>     readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
>             "cgroup_subsys_state->ss", FAULT_ON_ERROR);
> 
> because the 3.10 kernel does not have a cgroup_subsys_state.ss field, which was
> added in 4.2:

It was actually added in 3.12.

> 
>   crash> cgroup_subsys_state
>   struct cgroup_subsys_state {
>       struct cgroup *cgroup;
>       atomic_t refcnt;
>       unsigned long flags;
>       struct css_id *id;
>       struct work_struct dput_work;
>   }
>   SIZE: 64
>   crash>
> 
> Unfortunately you don't have the benefit of being able to use OFFSET(), which
> would fail immediately.  MEMBER_OFFSET() returns -1 on invalid requests, so you
> really have to verify the return value, or add it to your MEMBER_OFFSET() verifications
> during your init function. 

I guess on pre-3.12 kernels I will just skip printing the name of the
subsystem. I will take a brief look at whether I could recreate the logic
in the module rather than relying on traversing structs, but I don't
consider this high priority.
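
For the pre-3.12 case, a minimal standalone sketch of such a guard might
look like the following. It does not use the crash API:
fake_member_offset() and its layout table are hypothetical stand-ins for
MEMBER_OFFSET() (which returns -1 for a missing member), and member_addr()
is an illustrative name that is not part of the module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for crash's MEMBER_OFFSET(): returns -1 when the
 * member is unknown. The table is made up for illustration and loosely
 * mimics a kernel where cgroup_subsys_state has no "ss" member. */
struct member_info {
    const char *type;
    const char *member;
    long offset;
};

static const struct member_info fake_layout[] = {
    { "cgroup_subsys_state", "cgroup", 0 },
    { "cgroup_subsys_state", "refcnt", 8 },
    { "cgroup_subsys_state", "flags", 16 },
};

long fake_member_offset(const char *type, const char *member)
{
    size_t i;

    assert(type != NULL && member != NULL);
    for (i = 0; i < sizeof(fake_layout) / sizeof(fake_layout[0]); i++) {
        if (strcmp(fake_layout[i].type, type) == 0 &&
            strcmp(fake_layout[i].member, member) == 0)
            return fake_layout[i].offset;
    }
    return -1; /* same convention as MEMBER_OFFSET() on invalid requests */
}

/* Compute base + offset only when the member exists. Returns 1 and sets
 * *addr on success; returns 0 and leaves *addr untouched otherwise, so the
 * caller can skip the read instead of using a bogus address. */
int member_addr(unsigned long base, const char *type, const char *member,
                unsigned long *addr)
{
    long off = fake_member_offset(type, member);

    if (off < 0)
        return 0;
    *addr = base + (unsigned long)off;
    return 1;
}
```

With a check like this, a caller would only attempt the read of
cgroup_subsys_state->ss when member_addr() returns 1, and would otherwise
print the cgroup path without a subsystem name.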


> 
> And there were these oddities on later kernel versions:
>   
> All 3 of my sample 3.13-based Fedora kernels result in this output: 
> 
>   crash> sys | grep RELEASE
>        RELEASE: 3.13.0-0.rc1.git2.1.fc20.x86_64
>   crash> showcg
>   subsys: cpuset               cgroup: /
>   subsys: cpu                  cgroup: /
>   subsys: cpuacct              cgroup: /
>   subsys: memory               cgroup: /
>   subsys: devices              cgroup: /
>   subsys: freezer              cgroup: /
>   subsys: net_cls              cgroup: /
>   subsys: blkio                cgroup: /
>   subsys: perf_event           cgroup: /
>   subsys: hugetlb              cgroup: /
>   showcg: invalid kernel virtual address: 0  type: "cgroup_subsys_state->cgroup"
>   crash> 
> 
> I didn't look into why they all end that way.  Maybe there's a NULL pointer in the
> last entry in the subsys array?

I will have to test this on a 3.13 kernel.
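
If the cause does turn out to be a NULL entry in the subsys array (this
is only the guess from above, not something verified on a 3.13 kernel),
the loop could skip empty slots instead of dereferencing them. A
standalone sketch, where collect_valid_css() is a hypothetical helper
name and the array is plain memory rather than data read with readmem():

```c
#include <assert.h>
#include <stddef.h>

/* Copy the non-zero entries of a css_set->subsys style pointer array into
 * out[], preserving order, and return how many were copied. Zero entries
 * are treated as unpopulated slots and skipped. out[] must have room for
 * count entries. */
size_t collect_valid_css(const unsigned long *subsys, size_t count,
                         unsigned long *out)
{
    size_t i;
    size_t n = 0;

    assert(count == 0 || (subsys != NULL && out != NULL));
    for (i = 0; i < count; i++) {
        if (subsys[i] == 0)
            continue; /* nothing to follow for this slot */
        out[n++] = subsys[i];
    }
    return n;
}
```

In the module itself the equivalent would simply be a check on each array
entry before reading cgroup_subsys_state->cgroup from it.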

>   
> And lastly, I only have one 3.14-based kernel, which shows this:
> 
>   crash> sys | grep RELEASE
>        RELEASE: 3.14.0-rc1+
>   crash> showcg
>   showcg: zero-size memory allocation! (called from 7f3280273719)
>   crash> 
> 
> which would come from a zero-size request for cgroup_subsys_arr here:
> 
>     en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
>     cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
> 
> which depends upon CGROUP_SUBSYS_COUNT being something non-zero:
>        /*
>          * Set of subsystem states, one for each subsystem. This array is
>          * immutable after creation apart from the init_css_set during
>          * subsystem registration (at boot time).
>          */
>         struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
> 
> And in that kernel apparently CONFIG_CGROUPS was not configured and
> therefore CGROUP_SUBSYS_COUNT is 0:

But there is already logic in the initialization routine which should
handle cases where CONFIG_CGROUPS is not selected, simply by checking
whether the "cgroups" member in task_struct exists. I checked on LXR and
this member has always been protected by #ifdef CONFIG_CGROUPS. Maybe
this is Fedora-kernel specific? Can you please check in the definition
of task_struct whether the 'cgroups' member is protected by an ifdef
guard? I can easily augment the check to also consider the size of the
subsys array. I tested the code on 3.12, and with !CONFIG_CGROUPS the
extension correctly bails out.
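
A rough sketch of what the augmented check could compute, written as
standalone C. subsys_slot_count() and cgroups_usable() are hypothetical
names; member_size stands for whatever MEMBER_SIZE("css_set", "subsys")
returns, and treating any non-positive value as "no array" is an
assumption made here for safety:

```c
#include <assert.h>
#include <stddef.h>

/* Number of pointer-sized slots in an array member that is member_size
 * bytes long. A non-positive size (empty array or failed lookup) yields 0,
 * so the caller never requests a zero-size buffer. */
size_t subsys_slot_count(long member_size, size_t ptr_size)
{
    assert(ptr_size > 0);
    if (member_size <= 0)
        return 0;
    return (size_t)member_size / ptr_size;
}

/* Non-zero when there is at least one subsystem slot to walk. */
int cgroups_usable(long member_size, size_t ptr_size)
{
    return subsys_slot_count(member_size, ptr_size) > 0;
}
```

The init routine could refuse to register the command when this reports
no usable slots, in addition to the existing MEMBER_EXISTS() checks.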

> 
>   #else   /* CONFIG_CGROUPS */
>   
>   #define CGROUP_SUBSYS_COUNT 0
>   
>   static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
>   static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
>   
>   #endif  /* CONFIG_CGROUPS */
> 
> making it an empty structure:
>   
>   crash> css_set
>   struct css_set {
>       atomic_t refcount;
>       struct hlist_node hlist;
>       struct list_head tasks;
>       struct list_head cgrp_links;
>       struct cgroup_subsys_state *subsys[];
>       struct callback_head callback_head;
>   }
>   SIZE: 72
>   crash> css_set -o
>   struct css_set {
>      [0] atomic_t refcount;
>      [8] struct hlist_node hlist;
>     [24] struct list_head tasks;
>     [40] struct list_head cgrp_links;
>     [56] struct cgroup_subsys_state *subsys[];
>     [56] struct callback_head callback_head;
>   }
>   SIZE: 72
>   crash>
>   
> The other 3.18 and 4.x based kernels ran the command OK.
> 
> Another thing I might suggest if your idea is to assist in the 
> actual debugging of cgroup problems -- would be to print the
> address of key data structures as part of the command's output.
> That kind of thing is done by most crash commands, so that a user
> can quickly dump, for example, the target cgroup structure, or
> perhaps some of the other structures that would be helpful to 
> fully display.  
> 
> On the other hand, maybe all you're interested in seeing is the
> cgroup name and path?  I don't know -- that's up to you.

For now my intention is to have a quick way to know which cgroup a
process is a member of. If someone can provide a use case showing which
addresses might be useful, I will consider adding those.

> 
> Also, you don't have to post your module as a patch to the
> extensions subdirectory.  I'm not going to add the file to the 
> crash sources contained in the tar.gz or src.rpm releases, but
> rather I will post your module source file, and directions on 
> how to build it, on the extensions web page accessible from
> http://people.redhat.com/anderson/extensions.html.  So you can 
> just attach the module's C file to your email to this mailing list.

OK, I will keep this in mind for my next posting.

Thanks a lot for the detailed and helpful feedback!

> 
> Thanks,
>   Dave
> 
> 
> 
> 
>>
>>  extensions/proccgroup.c | 278 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 278 insertions(+)
>>  create mode 100644 extensions/proccgroup.c
>>
>> diff --git a/extensions/proccgroup.c b/extensions/proccgroup.c
>> new file mode 100644
>> index 0000000..aee735b
>> --- /dev/null
>> +++ b/extensions/proccgroup.c
>> @@ -0,0 +1,278 @@
>> +/*
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * Nikolay Borisov <n.borisov.lkml at gmail.com>
>> + */
>> +
>> +#include <stdbool.h>
>> +#include "defs.h"
>> +
>> +#define MAX_CGROUP_PATH 4096
>> +
>> +static void showcgrp(void);
>> +char *help_proc_cgroups[];
>> +
>> +static struct command_table_entry command_table[] = {
>> +        { "showcg", showcgrp, help_proc_cgroups, 0},
>> +        { NULL },
>> +};
>> +
>> +
>> +void __attribute__((constructor))
>> +proccgroup_init(void)
>> +{
>> +
>> +    if (!MEMBER_EXISTS("task_struct", "cgroups") ||
>> +        (!MEMBER_EXISTS("cgroup", "kn") && !MEMBER_EXISTS("cgroup", "name")))
>> +    {
>> +        fprintf(fp, "Unrecognised or disabled cgroup support\n");
>> +        return;
>> +    }
>> +
>> +    register_extension(command_table);
>> +}
>> +
>> +void __attribute__((destructor))
>> +proccgroup_finish(void) { }
>> +
>> +/* Prepends contents of cgroup_name to buf, using start as a pointer
>> + * index into buf
>> + */
>> +static void prepend_string(char *buf, char **start, char *cgroup_name) {
>> +
>> +    int len = strlen(cgroup_name);
>> +    *start -= len;
>> +
>> +    if (*start < buf) {
>> +        error(FATAL, "Cgroup too long to parse\n");
>> +    }
>> +
>> +    memcpy(*start, cgroup_name, len);
>> +
>> +    if (--*start < buf) {
>> +        error(FATAL, "Cgroup too long to parse\n");
>> +    }
>> +
>> +    **start = '/';
>> +}
>> +
>> +/* For post-3.15 kernels */
>> +static void get_cgroup_name_kn(ulong cgroup, char *buf, int buflen)
>> +{
>> +    ulong kernfs_node;
>> +    ulong cgroup_name_ptr;
>> +    ulong kernfs_parent;
>> +    bool slash_prepended = false;
>> +    char cgroup_name[BUFSIZE];
>> +    char *start = buf + buflen - 1;
>> +    *start = '\0'; //null terminate the end
>> +
>> +    /* Get cgroup->kn */
>> +    readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR, &kernfs_node, sizeof(void *),
>> +            "cgroup->kn", FAULT_ON_ERROR);
>> +
>> +    do {
>> +        /* Get kn->name */
>> +        readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
>> +                "kernfs_node->name", FAULT_ON_ERROR);
>> +        /* Get kn->parent */
>> +        readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"), KVADDR, &kernfs_parent, sizeof(void *),
>> +                "kernfs_node->parent", FAULT_ON_ERROR);
>> +
>> +        if (kernfs_parent != 0) {
>> +            read_string(cgroup_name_ptr, cgroup_name, BUFSIZE-1);
>> +            prepend_string(buf, &start, cgroup_name);
>> +            slash_prepended = true;
>> +        } else if (!slash_prepended) {
>> +            if (--start < buf) {
>> +                error(FATAL, "Cgroup too long to parse\n");
>> +            }
>> +            *start = '/';
>> +        }
>> +
>> +        kernfs_node = kernfs_parent;
>> +
>> +    } while(kernfs_parent);
>> +
>> +    memmove(buf, start, buf + buflen - start);
>> +}
>> +
>> +/* For pre-3.15 kernels */
>> +static void get_cgroup_name_old(ulong cgroup, char *buf, size_t buflen)
>> +{
>> +    ulong cgroup_name_ptr;
>> +    ulong cgroup_parent_ptr;
>> +    char cgroup_name[BUFSIZE];
>> +    char *start = buf + buflen - 1;
>> +    *start = '\0'; //null terminate the end
>> +    bool slash_prepended = false;
>> +
>> +    do {
>> +        /* Get cgroup->name */
>> +        readmem(cgroup + MEMBER_OFFSET("cgroup", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
>> +                "cgroup->name", FAULT_ON_ERROR);
>> +        /* Get cgroup->parent */
>> +        readmem(cgroup + MEMBER_OFFSET("cgroup", "parent"), KVADDR, &cgroup_parent_ptr, sizeof(void *),
>> +                "cgroup->parent", FAULT_ON_ERROR);
>> +
>> +        read_string(cgroup_name_ptr + MEMBER_OFFSET("cgroup_name", "name"), cgroup_name, BUFSIZE-1);
>> +
>> +        if (cgroup_parent_ptr) {
>> +            prepend_string(buf, &start, cgroup_name);
>> +            slash_prepended = true;
>> +        } else if (!slash_prepended) {
>> +            if (--start < buf)
>> +                break;
>> +            *start = '/';
>> +        }
>> +
>> +        cgroup = cgroup_parent_ptr;
>> +
>> +    } while(cgroup_parent_ptr);
>> +
>> +    memmove(buf, start, buf + buflen - start);
>> +}
>> +
>> +static void get_subsys_name(ulong subsys, char *buf, size_t buflen)
>> +{
>> +    ulong subsys_name_ptr;
>> +    ulong cgroup_subsys_ptr;
>> +
>> +    /* Get cgroup_subsys_state->ss */
>> +    readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
>> +            "cgroup_subsys_state->ss", FAULT_ON_ERROR);
>> +
>> +    readmem(cgroup_subsys_ptr + MEMBER_OFFSET("cgroup_subsys", "name"), KVADDR, &subsys_name_ptr, sizeof(void *),
>> +            "cgroup_subsys->name", FAULT_ON_ERROR);
>> +    read_string(subsys_name_ptr, buf, buflen-1);
>> +}
>> +
>> +static void get_cgroup_name(ulong cgroup, ulong subsys)
>> +{
>> +    char *cgroup_path = GETBUF(MAX_CGROUP_PATH);
>> +    char subsys_name[BUFSIZE];
>> +
>> +    /* Handle the 2 cases of cgroup_name and the kernfs one */
>> +    if (MEMBER_EXISTS("cgroup", "kn")) {
>> +        get_cgroup_name_kn(cgroup, cgroup_path, MAX_CGROUP_PATH);
>> +    } else if (MEMBER_EXISTS("cgroup", "name")) {
>> +        get_cgroup_name_old(cgroup, cgroup_path, MAX_CGROUP_PATH);
>> +    }
>> +
>> +    get_subsys_name(subsys, subsys_name, BUFSIZE);
>> +
>> +    fprintf(fp, "subsys: %-20s cgroup: %s\n", subsys_name, cgroup_path);
>> +
>> +    FREEBUF(cgroup_path);
>> +}
>> +
>> +
>> +void show_proc_cgroups(ulong task_ctx) {
>> +    int en_subsys_cnt;
>> +    int i;
>> +    ulong *cgroup_subsys_arr;
>> +    ulong subsys_base_ptr;
>> +    ulong cgroups_subsys_ptr = 0;
>> +
>> +
>> +    /* Get address of task_struct->cgroups */
>> +    readmem(task_ctx + MEMBER_OFFSET("task_struct", "cgroups"),
>> +                            KVADDR, &cgroups_subsys_ptr, sizeof(void *),
>> +                            "task_struct->cgroups", FAULT_ON_ERROR);
>> +
>> +    subsys_base_ptr = cgroups_subsys_ptr + MEMBER_OFFSET("css_set", "subsys");
>> +    en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
>> +    cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
>> +
>> +    /* Get the contents of the css_set->subsys array */
>> +    readmem(subsys_base_ptr, KVADDR, cgroup_subsys_arr, sizeof(ulong) * en_subsys_cnt,
>> +               "css_set->subsys", FAULT_ON_ERROR);
>> +
>> +    for (i = 0; i < en_subsys_cnt; i++) {
>> +        ulong cgroup;
>> +
>> +        /* Get cgroup_subsys_state -> cgroup */
>> +        readmem(cgroup_subsys_arr[i] + MEMBER_OFFSET("cgroup_subsys_state", "cgroup"),
>> +                KVADDR, &cgroup, sizeof(void *), "cgroup_subsys_state->cgroup", FAULT_ON_ERROR);
>> +
>> +        get_cgroup_name(cgroup, cgroup_subsys_arr[i]);
>> +    }
>> +
>> +    FREEBUF(cgroup_subsys_arr);
>> +}
>> +
>> +
>> +static void showcgrp(void) {
>> +
>> +    ulong value;
>> +    struct task_context *tc;
>> +    ulong task_struct_ptr = 0;
>> +
>> +    while (args[++optind]) {
>> +        if (IS_A_NUMBER(args[optind])) {
>> +                switch (str_to_context(args[optind], &value, &tc))
>> +                {
>> +                case STR_PID:
>> +                    task_struct_ptr = tc->task;
>> +                    ++optind;
>> +                    break;
>> +
>> +                case STR_TASK:
>> +                    task_struct_ptr = value;
>> +                    ++optind;
>> +                    break;
>> +
>> +                case STR_INVALID:
>> +                    error(FATAL, "invalid task or pid value: %s\n\n",
>> +                            args[optind]);
>> +                    break;
>> +                }
>> +        } else {
>> +            if (argcnt > 1)
>> +                error(FATAL, "invalid task or pid value: %s\n",args[optind]);
>> +            else
>> +                break;
>> +        }
>> +    }
>> +
>> +    if (!task_struct_ptr) {
>> +        task_struct_ptr = CURRENT_TASK();
>> +    }
>> +
>> +    show_proc_cgroups(task_struct_ptr);
>> +}
>> +
>> +char *help_proc_cgroups[] = {
>> +        "showcg",
>> +        "Show which cgroups a process is a member of",
>> +        " [task | pid]",
>> +
>> +        " This command prints the cgroup for each subsys that a process is a member of",
>> +        "\nExample",
>> +        "  Show the cgroup for the currently active process:\n",
>> +        "       crash> showcg",
>> +        "       subsys: cpuset               cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: cpu                  cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: cpuacct              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: blkio                cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: memory               cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: devices              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: freezer              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: net_cls              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: perf_event           cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: net_prio             cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "       subsys: hugetlb              cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> +        "\n  Alternatively you can pass either a pid or a task pointer to show the cgroup the",
>> +        "  respective process is a member of e.g:\n",
>> +        "       crash> showcg 1064\n   OR",
>> +        "       crash> showcg ffff880405711b80",
>> +
>> +
>> +
>> +        NULL
>> +};
>> +
>> +
>> --
>> 2.5.0
>>
>> --
>> Crash-utility mailing list
>> Crash-utility at redhat.com
>> https://www.redhat.com/mailman/listinfo/crash-utility
>>



