[Crash-utility] [PATCH] runq: make tasks in throttled cfs_rqs/rt_rqs displayed
zhangyanfei
zhangyanfei at cn.fujitsu.com
Fri Nov 9 03:37:11 UTC 2012
On 2012-11-08 03:15, Dave Anderson wrote:
>
>
> ----- Original Message -----
>>
>> OK. I rewrote the patches and they tested OK on my box.
>>
>> Thanks
>> Zhang
>
> My tests weren't so successful this time, and I also have some questions
> about the runq -g output.
>
> I tested your latest patches on a sample set of 70 dumpfiles whose
> kernels all use CFS runqueues. In 7 of the 70 "runq -g" tests,
> the command caused the crash session to fail like so:
>
<snip>
>
> In a quick debugging session of your free_task_group_info_array()
> I printed out the addresses being FREEBUF()'d, and I noted that
> there were numerous instances of the same address being freed twice:
>
> static void
> free_task_group_info_array(void)
> {
> 	int i;
>
> 	for (i = 0; i < tgi_p; i++) {
> 		if (tgi_array[i]->name)
> 			FREEBUF(tgi_array[i]->name);
> 		FREEBUF(tgi_array[i]);
> 	}
> 	tgi_p = 0;
> 	FREEBUF(tgi_array);
> }
>
> I put one of the failing vmlinux/vmcore pairs here for you
> to debug:
>
> http://people.redhat.com/anderson/zhangyanfei
>
This is strange. In my tests on the vmcore you provided, 'runq -g' ran fine
the first time but caused the crash session to fail on the next run.
From the debug information above and from my own tests, I noticed that it
always failed in the same place: when FREEBUF()ing a name. So I checked the
function get_task_group_name() and changed the way it returns the name buffer.
Now the command works well on the vmcore.
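The actual change to get_task_group_name() is in the attached v5 patch and is not reproduced here. As a hedged illustration of this class of bug: one common way to end up FREEBUF()ing the same address twice is for several task-group entries to point at one shared (static or reused) name buffer. The minimal sketch below, assuming a hypothetical struct task_group_info with only a name field and using plain malloc()/free() in place of crash's GETBUF()/FREEBUF(), gives every entry its own copy of the name, so the cleanup loop frees each buffer exactly once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's per-group record (assumed layout). */
struct task_group_info {
	char *name;
};

static struct task_group_info **tgi_array;
static int tgi_p;

/* Return a private heap copy of the name so no two entries share a buffer. */
static char *
dup_task_group_name(const char *src)
{
	char *buf;

	if (!src)
		return NULL;
	buf = malloc(strlen(src) + 1);
	if (buf)
		strcpy(buf, src);
	return buf;
}

/* Append one entry; returns 0 on success, -1 on allocation failure. */
static int
add_task_group_info(const char *name)
{
	struct task_group_info *tgi, **grown;

	tgi = malloc(sizeof(*tgi));
	if (!tgi)
		return -1;
	grown = realloc(tgi_array, (tgi_p + 1) * sizeof(*tgi_array));
	if (!grown) {
		free(tgi);
		return -1;
	}
	tgi_array = grown;
	tgi->name = dup_task_group_name(name);
	tgi_array[tgi_p++] = tgi;
	return 0;
}

/* Each name is owned by exactly one entry, so each is freed exactly once. */
static void
free_task_group_info_array(void)
{
	int i;

	assert(tgi_p == 0 || tgi_array != NULL);
	for (i = 0; i < tgi_p; i++) {
		free(tgi_array[i]->name);	/* free(NULL) is a no-op */
		free(tgi_array[i]);
	}
	free(tgi_array);
	tgi_array = NULL;	/* a second call is now harmless */
	tgi_p = 0;
}
```

Resetting tgi_array to NULL in the cleanup also makes a repeated "runq -g" in the same session start from a clean state, which matters here because the failure only showed up on the second run.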
>
> Secondly, I have a question about the meaning of the command's output.
>
> First, consider this "runq" output:
>
> crash> runq
> CPU 0 RUNQUEUE: ffff8800090436c0
> CURRENT: PID: 588 TASK: ffff88007e4877a0 COMMAND: "udevd"
> RT PRIO_ARRAY: ffff8800090437c8
> [no tasks queued]
> CFS RB_ROOT: ffff880009043740
> [118] PID: 2110 TASK: ffff88007d470860 COMMAND: "check-cdrom.sh"
> [118] PID: 2109 TASK: ffff88007f1247a0 COMMAND: "check-cdrom.sh"
> [118] PID: 2114 TASK: ffff88007f20e080 COMMAND: "udevd"
>
> CPU 1 RUNQUEUE: ffff88000905b6c0
> CURRENT: PID: 2113 TASK: ffff88007e8ac140 COMMAND: "udevd"
> RT PRIO_ARRAY: ffff88000905b7c8
> [no tasks queued]
> CFS RB_ROOT: ffff88000905b740
> [118] PID: 2092 TASK: ffff88007d7a4760 COMMAND: "MAKEDEV"
> [118] PID: 1983 TASK: ffff88007e59f140 COMMAND: "udevd"
> [118] PID: 2064 TASK: ffff88007e40f7a0 COMMAND: "udevd"
> [115] PID: 2111 TASK: ffff88007e4278a0 COMMAND: "kthreadd"
> crash>
>
> In the above case, the per-cpu "rq" structure addresses are shown as:
>
> CPU 0 RUNQUEUE: ffff8800090436c0
> CPU 1 RUNQUEUE: ffff88000905b6c0
>
> And embedded in each of the rq structures above are these two rb_root
> structures:
>
> CFS RB_ROOT: ffff880009043740 (embedded in rq @ffff8800090436c0)
> CFS RB_ROOT: ffff88000905b740 (embedded in rq @ffff88000905b6c0)
>
> And starting at those rb_root structures, the tree of tasks is dumped.
>
> Now, your "runq -g" option doesn't show any "starting point" structure
> address, but rather just shows "CPU 0" and "CPU 1":
>
> crash> runq -g
> CPU 0
> CURRENT: PID: 588 TASK: ffff88007e4877a0 COMMAND: "udevd"
> RT PRIO_ARRAY: ffff8800090437c8
> [no tasks queued]
> CFS RB_ROOT: ffff880009093548
> [118] PID: 2110 TASK: ffff88007d470860 COMMAND: "check-cdrom.sh"
> [118] PID: 2109 TASK: ffff88007f1247a0 COMMAND: "check-cdrom.sh"
> [118] PID: 2114 TASK: ffff88007f20e080 COMMAND: "udevd"
>
> CPU 1
> CURRENT: PID: 2113 TASK: ffff88007e8ac140 COMMAND: "udevd"
> RT PRIO_ARRAY: ffff88000905b7c8
> [no tasks queued]
> CFS RB_ROOT: ffff880009093548
> [118] PID: 2092 TASK: ffff88007d7a4760 COMMAND: "MAKEDEV"
> [118] PID: 1983 TASK: ffff88007e59f140 COMMAND: "udevd"
> [118] PID: 2064 TASK: ffff88007e40f7a0 COMMAND: "udevd"
> [115] PID: 2111 TASK: ffff88007e4278a0 COMMAND: "kthreadd"
> crash>
>
> I would think that there might be a useful address of a per-cpu
> structure that could be shown there as well?
OK, this has been added.
>
> And secondly, I'm confused as to why the "CFS RB_ROOT" address for
> all cpus is the same address -- for example, above they are both at
> ffff880009093548. How can the two rb trees have the same rb_root?
My oversight, sorry. Fixed.
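For reference, the plain "runq" output above shows why each CPU should have its own root: the rb_root sits at a fixed offset inside that CPU's struct rq, 0x80 in this dump (0xffff880009043740 - 0xffff8800090436c0 and 0xffff88000905b740 - 0xffff88000905b6c0), with the RT PRIO_ARRAY at +0x108. Printing one address for every CPU is what you would expect if the root were computed once and then reused. A minimal sketch of the per-CPU computation, assuming those two offsets (they are specific to this one kernel build; inside crash they would come from OFFSET() lookups against debuginfo, not constants) and hard-coding the two sample rq addresses:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets derived from the sample "runq" output; kernel-build specific. */
#define RQ_CFS_RB_ROOT_OFFSET   0x80ULL   /* CFS rb_root within struct rq */
#define RQ_RT_PRIO_ARRAY_OFFSET 0x108ULL  /* RT prio_array within struct rq */

/* Per-CPU struct rq addresses taken from the sample output. */
static const uint64_t rq_addr[] = {
	0xffff8800090436c0ULL,	/* CPU 0 */
	0xffff88000905b6c0ULL,	/* CPU 1 */
};
#define NR_SAMPLE_CPUS (sizeof(rq_addr) / sizeof(rq_addr[0]))

/* Each CPU's CFS root is embedded in that CPU's rq, so it differs per CPU. */
static uint64_t
cfs_rb_root(unsigned int cpu)
{
	assert(cpu < NR_SAMPLE_CPUS);
	return rq_addr[cpu] + RQ_CFS_RB_ROOT_OFFSET;
}

static uint64_t
rt_prio_array(unsigned int cpu)
{
	assert(cpu < NR_SAMPLE_CPUS);
	return rq_addr[cpu] + RQ_RT_PRIO_ARRAY_OFFSET;
}

/* Print headers in the same shape as the plain "runq" output. */
static void
print_runq_headers(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_SAMPLE_CPUS; cpu++) {
		printf("CPU %u RUNQUEUE: %" PRIx64 "\n", cpu, rq_addr[cpu]);
		printf("  RT PRIO_ARRAY: %" PRIx64 "\n", rt_prio_array(cpu));
		printf("  CFS RB_ROOT: %" PRIx64 "\n", cfs_rb_root(cpu));
	}
}
```

For the root task group, "runq -g" should therefore report the same per-CPU CFS RB_ROOT addresses as plain "runq" does.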
Thanks
Zhang
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-add-g-option-for-runq-v5.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20121109/85dc6ffb/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0002-add-help-info-for-runq-g-v2.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20121109/85dc6ffb/attachment-0001.ksh>