[Crash-utility] [PATCHv2] Add the proccgroup extension
nborisov
n.borisov.lkml at gmail.com
Thu Apr 14 16:57:49 UTC 2016
On 13.04.2016 23:27, Dave Anderson wrote:
>
>
> ----- Original Message -----
>> Initial version of a crash module which can be used to show which cgroups
>> a process is a member of.
>>
>> Signed-off-by: Nikolay Borisov <n.borisov.lkml at gmail.com>
>> ---
>>
>> So here is the second version of the proccgroup module. Changes since v1:
>>
>> * Now show the full path to the cgroup (limited to 4k long paths).
>> * Added support for passing either a pid or the hex address of a task struct,
>> so that cgroup info can be acquired for an arbitrary task
>> * Added support for pre-3.15 kernels
>> * Removed leftovers from the echo module
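
For illustration, the full-path construction can be pictured as filling a fixed buffer from its end toward its start while walking from the leaf cgroup up to the root. The following is a minimal standalone sketch of that idea, not the module's code: prepend_component() and build_path() are hypothetical names, and unlike the patch it checks the remaining space before moving the pointer and returns -1 instead of calling error().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prepends "/" + name in front of *start inside buf. Returns 0 on success,
 * -1 if the component does not fit (nothing is written in that case). */
int prepend_component(char *buf, char **start, const char *name)
{
    size_t len = strlen(name);

    assert(buf != NULL && *start >= buf);
    if ((size_t)(*start - buf) < len + 1)
        return -1;              /* check space before moving the pointer */
    *start -= len;
    memcpy(*start, name, len);
    *start -= 1;
    **start = '/';
    return 0;
}

/* Builds a path from components listed leaf first, e.g. {"c", "b", "a"}
 * becomes "/a/b/c". An empty list yields "/". Returns 0 or -1. */
int build_path(char *buf, size_t buflen, const char *const *leaf_first,
               size_t n)
{
    char *start;
    size_t i;

    if (buflen < 2)
        return -1;
    start = buf + buflen - 1;
    *start = '\0';              /* terminate at the very end of the buffer */
    for (i = 0; i < n; i++)
        if (prepend_component(buf, &start, leaf_first[i]) < 0)
            return -1;
    if (n == 0) {               /* root cgroup: the path is just "/" */
        start -= 1;
        *start = '/';
    }
    /* Shift the finished string, including its NUL, to the front. */
    memmove(buf, start, (size_t)(buf + buflen - start));
    return 0;
}
```

Checking the remaining space first avoids ever forming a pointer below the start of the buffer, and returning an error code leaves the choice of how to report an over-long path to the caller.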
>
>
> Hello Nikolay,
>
> While cgroups have existed since 2.6.24, it appears that cgroup.name
> was introduced in 3.10, and cgroup.kn in 3.15. So I have only a
> limited set of sample 3.10+ dumpfiles that I could test it on.
>
> I have many 3.10-based RHEL7 kernels, and the same error occurs on
> all of them:
>
> crash> sys | grep RELEASE
> RELEASE: 3.10.0-327.el7.x86_64
> crash> showcg
> showcg: invalid kernel virtual address: ff88046666e03060 type: "cgroup_subsys->name"
> crash>
>
> The bad address looks to come from this line:
>
> readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
> "cgroup_subsys_state->ss", FAULT_ON_ERROR);
>
> because the 3.10 kernel does not have a cgroup_subsys_state.ss field, which was
> added in 4.2:
It was actually added in 3.12.
>
> crash> cgroup_subsys_state
> struct cgroup_subsys_state {
> struct cgroup *cgroup;
> atomic_t refcnt;
> unsigned long flags;
> struct css_id *id;
> struct work_struct dput_work;
> }
> SIZE: 64
> crash>
>
> Unfortunately you don't have the benefit of being able to use OFFSET(), which
> would fail immediately. MEMBER_OFFSET() returns -1 on invalid requests, so you
> really have to verify the return value, or add it to your MEMBER_OFFSET() verifications
> during your init function.
I guess on pre-3.12 kernels I will just skip printing the name of the
subsystem. I will take a brief look at whether I could recreate the logic
in the module rather than relying on traversing structs, but I don't
consider this high priority.
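
A minimal standalone sketch of the kind of check described above, so that a missing member is skipped instead of being read at base - 1. It does not use crash's defs.h: member_offset() is a hypothetical stand-in for MEMBER_OFFSET(), member_address() is a made-up helper, and the table offsets are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for crash's MEMBER_OFFSET(): returns the byte
 * offset of a known member, or -1 when the struct has no such member. */
struct member_info {
    const char *strct;
    const char *member;
    long offset;
};

/* Illustrative layout of a kernel without cgroup_subsys_state.ss;
 * the offsets are made up for this sketch. */
static const struct member_info known_members[] = {
    { "cgroup_subsys_state", "cgroup", 0 },
    { "cgroup_subsys_state", "refcnt", 8 },
};

long member_offset(const char *strct, const char *member)
{
    size_t i;

    assert(strct != NULL && member != NULL);
    for (i = 0; i < sizeof(known_members) / sizeof(known_members[0]); i++) {
        if (strcmp(known_members[i].strct, strct) == 0 &&
            strcmp(known_members[i].member, member) == 0)
            return known_members[i].offset;
    }
    return -1;
}

/* Returns 1 and stores base + offset in *addr when the member exists.
 * Returns 0 and leaves *addr untouched otherwise, so the caller can skip
 * the read rather than use an address that is off by one byte. */
int member_address(unsigned long base, const char *strct,
                   const char *member, unsigned long *addr)
{
    long off = member_offset(strct, member);

    if (off < 0)
        return 0;
    *addr = base + (unsigned long)off;
    return 1;
}
```

In the module itself the same effect could presumably be had by testing the MEMBER_OFFSET() result (or MEMBER_EXISTS()) once in the init function and remembering the answer.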
>
> And there were these oddities on later kernel versions:
>
> All 3 of my sample 3.13-based Fedora kernels result in this output:
>
> crash> sys | grep RELEASE
> RELEASE: 3.13.0-0.rc1.git2.1.fc20.x86_64
> crash> showcg
> subsys: cpuset cgroup: /
> subsys: cpu cgroup: /
> subsys: cpuacct cgroup: /
> subsys: memory cgroup: /
> subsys: devices cgroup: /
> subsys: freezer cgroup: /
> subsys: net_cls cgroup: /
> subsys: blkio cgroup: /
> subsys: perf_event cgroup: /
> subsys: hugetlb cgroup: /
> showcg: invalid kernel virtual address: 0 type: "cgroup_subsys_state->cgroup"
> crash>
>
> I didn't look into why they all end that way. Maybe there's a NULL pointer in the
> last entry in the subsys array?
I will have to test this on a 3.13 kernel.
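
If the cause does turn out to be a NULL slot in the subsys array (an assumption, not yet confirmed on a 3.13 kernel), the loop could simply skip such entries. A minimal standalone sketch, with collect_valid_subsys() being a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Copies the non-NULL entries of a css_set->subsys style pointer array
 * into out[] and returns how many were kept. out[] must have room for
 * count entries. */
size_t collect_valid_subsys(const unsigned long *subsys, size_t count,
                            unsigned long *out)
{
    size_t i, kept = 0;

    assert(subsys != NULL && out != NULL);
    for (i = 0; i < count; i++) {
        if (subsys[i] == 0)
            continue;           /* unused slot: nothing to dereference */
        out[kept++] = subsys[i];
    }
    return kept;
}
```

Only the entries that survive the filter would then be passed on to the code that reads cgroup_subsys_state->cgroup.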
>
> And lastly, I only have one 3.14-based kernel, which shows this:
>
> crash> sys | grep RELEASE
> RELEASE: 3.14.0-rc1+
> crash> showcg
> showcg: zero-size memory allocation! (called from 7f3280273719)
> crash>
>
> which would come from an en_subsys_cnt value of 0 here:
>
> en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
> cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
>
> which depends upon CGROUP_SUBSYS_COUNT being something non-zero:
> /*
> * Set of subsystem states, one for each subsystem. This array is
> * immutable after creation apart from the init_css_set during
> * subsystem registration (at boot time).
> */
> struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
>
> And in that kernel apparently CONFIG_CGROUPS was not configured and
> therefore CGROUP_SUBSYS_COUNT is 0:
But there is already logic in the initialization routine which should
handle cases where CONFIG_CGROUPS is not selected, simply by checking
whether the "cgroups" member in task_struct exists. I checked on LXR and
this member has always been protected by #ifdef CONFIG_CGROUPS. Maybe
this is Fedora kernel specific? Can you please check whether the
'cgroups' member in the definition of task_struct is protected by
an ifdef guard? I can easily augment the check to also consider the size
of the subsys array. I tested the code on 3.12, and with !CONFIG_CGROUPS
the extension correctly bails out.
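
A minimal standalone sketch of such an augmented check, assuming that a non-positive member size means the subsys array is missing or empty. subsys_slot_count() and cgroups_usable() are hypothetical names. In the module the inputs would come from MEMBER_EXISTS("task_struct", "cgroups") and MEMBER_SIZE("css_set", "subsys").

```c
#include <assert.h>
#include <stddef.h>

/* Number of pointer-sized slots in an array member of member_size bytes.
 * A negative size (member not found) and an empty array both yield 0. */
size_t subsys_slot_count(long member_size)
{
    if (member_size <= 0)
        return 0;
    return (size_t)member_size / sizeof(void *);
}

/* Returns 1 only when task_struct has a cgroups member and the subsys
 * array has at least one slot, i.e. when it is safe to allocate a buffer
 * for the array and read it. */
int cgroups_usable(int has_task_cgroups, long subsys_member_size)
{
    assert(has_task_cgroups == 0 || has_task_cgroups == 1);
    return has_task_cgroups && subsys_slot_count(subsys_member_size) > 0;
}
```

With a check like this in the init function, a zero-length subsys array would be reported once at load time rather than surfacing later as a zero-size allocation.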
>
> #else /* CONFIG_CGROUPS */
>
> #define CGROUP_SUBSYS_COUNT 0
>
> static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk) {}
> static inline void cgroup_threadgroup_change_end(struct task_struct *tsk) {}
>
> #endif /* CONFIG_CGROUPS */
>
> making subsys an empty array:
>
> crash> css_set
> struct css_set {
> atomic_t refcount;
> struct hlist_node hlist;
> struct list_head tasks;
> struct list_head cgrp_links;
> struct cgroup_subsys_state *subsys[];
> struct callback_head callback_head;
> }
> SIZE: 72
> crash> css_set -o
> struct css_set {
> [0] atomic_t refcount;
> [8] struct hlist_node hlist;
> [24] struct list_head tasks;
> [40] struct list_head cgrp_links;
> [56] struct cgroup_subsys_state *subsys[];
> [56] struct callback_head callback_head;
> }
> SIZE: 72
> crash>
>
> The other 3.18 and 4.x based kernels ran the command OK.
>
> Another thing I might suggest if your idea is to assist in the
> actual debugging of cgroup problems -- would be to print the
> address of key data structures as part of the command's output.
> That kind of thing is done by most crash commands, so that a user
> can quickly dump, for example, the target cgroup structure, or
> perhaps some of the other structures that would be helpful to
> fully display.
>
> On the other hand, maybe all you're interested in seeing is the
> cgroup name and path? I don't know -- that's up to you.
For now my intention is to have a quick way to know which cgroups a
process is a member of. If someone can provide a use case showing which
addresses might be useful, I will consider adding those.
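
For reference, one possible shape for such output is sketched below. This is purely illustrative: format_cgroup_line() is a hypothetical helper, and the choice of printing the cgroup_subsys_state and cgroup addresses, as well as the column layout, are assumptions rather than anything decided in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Formats one output line that carries the cgroup_subsys_state and cgroup
 * addresses next to the names. Returns snprintf()'s result: the length
 * the full line needs, which is >= buflen if the line was truncated. */
int format_cgroup_line(char *buf, size_t buflen, const char *subsys_name,
                       unsigned long css_addr, unsigned long cgroup_addr,
                       const char *path)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "subsys: %-12s css: %lx cgroup: %lx path: %s",
                    subsys_name, css_addr, cgroup_addr, path);
}
```

A user could then paste either address straight into a "struct cgroup <addr>" style command to dump the structure.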
>
> Also, you don't have to post your module as a patch to the
> extensions subdirectory. I'm not going to add the file to the
> crash sources contained in the tar.gz or src.rpm releases, but
> rather I will post your module source file, and directions on
> how to build it, on the extensions web page accessible from
> http://people.redhat.com/anderson/extensions.html. So you can
> just attach the module's C file to your email to this mailing list.
OK, I will keep this in mind for my next posting.
Thanks a lot for the detailed and helpful feedback!
>
> Thanks,
> Dave
>
>
>
>
>>
>> extensions/proccgroup.c | 278
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 278 insertions(+)
>> create mode 100644 extensions/proccgroup.c
>>
>> diff --git a/extensions/proccgroup.c b/extensions/proccgroup.c
>> new file mode 100644
>> index 0000000..aee735b
>> --- /dev/null
>> +++ b/extensions/proccgroup.c
>> @@ -0,0 +1,278 @@
>> +/*
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * Nikolay Borisov <n.borisov.lkml at gmail.com>
>> + */
>> +
>> +#include <stdbool.h>
>> +#include "defs.h"
>> +
>> +#define MAX_CGROUP_PATH 4096
>> +
>> +static void showcgrp(void);
>> +char *help_proc_cgroups[];
>> +
>> +static struct command_table_entry command_table[] = {
>> + { "showcg", showcgrp, help_proc_cgroups, 0},
>> + { NULL },
>> +};
>> +
>> +
>> +void __attribute__((constructor))
>> +proccgroup_init(void)
>> +{
>> +
>> + if (!MEMBER_EXISTS("task_struct", "cgroups") ||
>> + (!MEMBER_EXISTS("cgroup", "kn") && !MEMBER_EXISTS("cgroup", "name")))
>> + {
>> + fprintf(fp, "Unrecognised or disabled cgroup support\n");
>> + return;
>> + }
>> +
>> + register_extension(command_table);
>> +}
>> +
>> +void __attribute__((destructor))
>> +proccgroup_finish(void) { }
>> +
>> +/* Prepends contents of cgroup_name to buf, using start as a pointer
>> + * index into buf
>> + */
>> +static void prepend_string(char *buf, char **start, char *cgroup_name) {
>> +
>> + int len = strlen(cgroup_name);
>> + *start -= len;
>> +
>> + if (*start < buf) {
>> + error(FATAL, "Cgroup too long to parse\n");
>> + }
>> +
>> + memcpy(*start, cgroup_name, len);
>> +
>> + if (--*start < buf) {
>> + error(FATAL, "Cgroup too long to parse\n");
>> + }
>> +
>> + **start = '/';
>> +}
>> +
>> +/* For post-3.15 kernels */
>> +static void get_cgroup_name_kn(ulong cgroup, char *buf, int buflen)
>> +{
>> + ulong kernfs_node;
>> + ulong cgroup_name_ptr;
>> + ulong kernfs_parent;
>> + bool slash_prepended = false;
>> + char cgroup_name[BUFSIZE];
>> + char *start = buf + buflen - 1;
>> + *start = '\0'; //null terminate the end
>> +
>> + /* Get cgroup->kn */
>> + readmem(cgroup + MEMBER_OFFSET("cgroup", "kn"), KVADDR, &kernfs_node, sizeof(void *),
>> + "cgroup->kn", FAULT_ON_ERROR);
>> +
>> + do {
>> + /* Get kn->name */
>> + readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
>> + "kernfs_node->name", FAULT_ON_ERROR);
>> + /* Get kn->parent */
>> + readmem(kernfs_node + MEMBER_OFFSET("kernfs_node", "parent"), KVADDR, &kernfs_parent, sizeof(void *),
>> + "kernfs_node->parent", FAULT_ON_ERROR);
>> +
>> + if (kernfs_parent != 0) {
>> + read_string(cgroup_name_ptr, cgroup_name, BUFSIZE-1);
>> + prepend_string(buf, &start, cgroup_name);
>> + slash_prepended = true;
>> + } else if (!slash_prepended) {
>> + if (--start < buf) {
>> + error(FATAL, "Cgroup too long to parse\n");
>> + }
>> + *start = '/';
>> + }
>> +
>> + kernfs_node = kernfs_parent;
>> +
>> + } while(kernfs_parent);
>> +
>> + memmove(buf, start, buf + buflen - start);
>> +}
>> +
>> +/* For pre-3.15 kernels */
>> +static void get_cgroup_name_old(ulong cgroup, char *buf, size_t buflen)
>> +{
>> + ulong cgroup_name_ptr;
>> + ulong cgroup_parent_ptr;
>> + char cgroup_name[BUFSIZE];
>> + char *start = buf + buflen - 1;
>> + *start = '\0'; //null terminate the end
>> + bool slash_prepended = false;
>> +
>> + do {
>> + /* Get cgroup->name */
>> + readmem(cgroup + MEMBER_OFFSET("cgroup", "name"), KVADDR, &cgroup_name_ptr, sizeof(void *),
>> + "cgroup->name", FAULT_ON_ERROR);
>> + /* Get cgroup->parent */
>> + readmem(cgroup + MEMBER_OFFSET("cgroup", "parent"), KVADDR, &cgroup_parent_ptr, sizeof(void *),
>> + "cgroup->parent", FAULT_ON_ERROR);
>> +
>> + read_string(cgroup_name_ptr + MEMBER_OFFSET("cgroup_name", "name"), cgroup_name, BUFSIZE-1);
>> +
>> + if (cgroup_parent_ptr) {
>> + prepend_string(buf, &start, cgroup_name);
>> + slash_prepended = true;
>> + } else if (!slash_prepended) {
>> + if (--start < buf)
>> + break;
>> + *start = '/';
>> + }
>> +
>> + cgroup = cgroup_parent_ptr;
>> +
>> + } while(cgroup_parent_ptr);
>> +
>> + memmove(buf, start, buf + buflen - start);
>> +}
>> +
>> +static void get_subsys_name(ulong subsys, char *buf, size_t buflen)
>> +{
>> + ulong subsys_name_ptr;
>> + ulong cgroup_subsys_ptr;
>> +
>> + /* Get cgroup_subsys_state->ss */
>> + readmem(subsys + MEMBER_OFFSET("cgroup_subsys_state", "ss"), KVADDR, &cgroup_subsys_ptr, sizeof(void *),
>> + "cgroup_subsys_state->ss", FAULT_ON_ERROR);
>> +
>> + readmem(cgroup_subsys_ptr + MEMBER_OFFSET("cgroup_subsys", "name"), KVADDR, &subsys_name_ptr, sizeof(void *),
>> + "cgroup_subsys->name", FAULT_ON_ERROR);
>> + read_string(subsys_name_ptr, buf, buflen-1);
>> +}
>> +
>> +static void get_cgroup_name(ulong cgroup, ulong subsys)
>> +{
>> + char *cgroup_path = GETBUF(MAX_CGROUP_PATH);
>> + char subsys_name[BUFSIZE];
>> +
>> + /* Handle the 2 cases of cgroup_name and the kernfs one */
>> + if (MEMBER_EXISTS("cgroup", "kn")) {
>> + get_cgroup_name_kn(cgroup, cgroup_path, MAX_CGROUP_PATH);
>> + } else if (MEMBER_EXISTS("cgroup", "name")) {
>> + get_cgroup_name_old(cgroup, cgroup_path, MAX_CGROUP_PATH);
>> + }
>> +
>> + get_subsys_name(subsys, subsys_name, BUFSIZE);
>> +
>> + fprintf(fp, "subsys: %-20s cgroup: %s\n", subsys_name, cgroup_path);
>> +
>> + FREEBUF(cgroup_path);
>> +}
>> +
>> +
>> +void show_proc_cgroups(ulong task_ctx) {
>> + int en_subsys_cnt;
>> + int i;
>> + ulong *cgroup_subsys_arr;
>> + ulong subsys_base_ptr;
>> + ulong cgroups_subsys_ptr = 0;
>> +
>> +
>> + /* Get address of task_struct->cgroups */
>> + readmem(task_ctx + MEMBER_OFFSET("task_struct", "cgroups"),
>> + KVADDR, &cgroups_subsys_ptr, sizeof(void *),
>> + "task_struct->cgroups", FAULT_ON_ERROR);
>> +
>> + subsys_base_ptr = cgroups_subsys_ptr + MEMBER_OFFSET("css_set", "subsys");
>> + en_subsys_cnt = MEMBER_SIZE("css_set", "subsys") / sizeof(void *);
>> + cgroup_subsys_arr = (ulong *)GETBUF(en_subsys_cnt * sizeof(ulong));
>> +
>> + /* Get the contents of the css_set->subsys array */
>> + readmem(subsys_base_ptr, KVADDR, cgroup_subsys_arr, sizeof(ulong) * en_subsys_cnt,
>> + "css_set->subsys", FAULT_ON_ERROR);
>> +
>> + for (i = 0; i < en_subsys_cnt; i++) {
>> + ulong cgroup;
>> +
>> + /* Get cgroup_subsys_state -> cgroup */
>> + readmem(cgroup_subsys_arr[i] + MEMBER_OFFSET("cgroup_subsys_state", "cgroup"),
>> + KVADDR, &cgroup, sizeof(void *), "cgroup_subsys_state->cgroup", FAULT_ON_ERROR);
>> +
>> + get_cgroup_name(cgroup, cgroup_subsys_arr[i]);
>> + }
>> +
>> + FREEBUF(cgroup_subsys_arr);
>> +}
>> +
>> +
>> +static void showcgrp(void) {
>> +
>> + ulong value;
>> + struct task_context *tc;
>> + ulong task_struct_ptr = 0;
>> +
>> + while (args[++optind]) {
>> + if (IS_A_NUMBER(args[optind])) {
>> + switch (str_to_context(args[optind], &value, &tc))
>> + {
>> + case STR_PID:
>> + task_struct_ptr = tc->task;
>> + ++optind;
>> + break;
>> +
>> + case STR_TASK:
>> + task_struct_ptr = value;
>> + ++optind;
>> + break;
>> +
>> + case STR_INVALID:
>> + error(FATAL, "invalid task or pid value: %s\n\n",
>> + args[optind]);
>> + break;
>> + }
>> + } else {
>> + if (argcnt > 1)
>> + error(FATAL, "invalid task or pid value: %s\n",args[optind]);
>> + else
>> + break;
>> + }
>> + }
>> +
>> + if (!task_struct_ptr) {
>> + task_struct_ptr = CURRENT_TASK();
>> + }
>> +
>> + show_proc_cgroups(task_struct_ptr);
>> +}
>> +
>> +char *help_proc_cgroups[] = {
>> + "showcg",
>> + "Show which cgroups a process is a member of",
>> + " [task | pid]",
>> +
>> + " This command prints the cgroup for each subsys that a process is a member of",
>> + "\nExample",
>> + " Show the cgroup for the currently active process:\n",
>> + " crash> showcg",
>> + " subsys: cpuset cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: cpu cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: cpuacct cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: blkio cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: memory cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: devices cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: freezer cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: net_cls cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: perf_event cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: net_prio cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + " subsys: hugetlb cgroup: /user.slice/user-1000.slice/session-c1.scope",
>> + "\n Alternatively you can pass either a pid or a task pointer to show the cgroup the",
>> + " respective process is a member of e.g:\n",
>> + " crash> showcg 1064\n OR",
>> + " crash> showcg ffff880405711b80",
>> +
>> +
>> +
>> + NULL
>> +};
>> +
>> +
>> --
>> 2.5.0
>>
>> --
>> Crash-utility mailing list
>> Crash-utility at redhat.com
>> https://www.redhat.com/mailman/listinfo/crash-utility
>>