[Crash-utility] patch for slight modification to runq -g command
Dave Anderson
anderson at redhat.com
Thu Nov 7 21:57:58 UTC 2013
Hi Anthony,
With respect to the nr_running and h_nr_running displays: since you
can "see" the number of tasks queued underneath each particular
group, I'm not convinced that it's worth displaying them.
In your first post you mentioned:
> Since we crash the system by messing up the nr_running and h_nr_running values,
> we also display those two fields at the same time. Here’s an example of before and after.
Are you saying that you purposely modify those two values in order to force
a crash?
Anyway, I bring this up because their display is kind of ugly, and also because
in the output logs from my test of your patch I see this particular instance,
on a 3.6.0 kernel where the crash was generated by entering
"echo c > /proc/sysrq-trigger":
crash> bt
PID: 1212 TASK: ffff880035f60000 CPU: 1 COMMAND: "bash"
#0 [ffff88007831fa20] machine_kexec at ffffffff8103e465
#1 [ffff88007831fa90] crash_kexec at ffffffff810c6658
#2 [ffff88007831fb60] oops_end at ffffffff815d5bf8
#3 [ffff88007831fb90] no_context at ffffffff815c7dae
#4 [ffff88007831fbf0] __bad_area_nosemaphore at ffffffff815c7f98
#5 [ffff88007831fc40] bad_area at ffffffff815c81f0
#6 [ffff88007831fc70] do_page_fault at ffffffff815d87d1
#7 [ffff88007831fd80] page_fault at ffffffff815d5025
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff81388986 RSP: ffff88007831fe38 RFLAGS: 00010092
RAX: 000000000000000f RBX: ffffffff8192dc20 RCX: 00000000000014ff
RDX: 000000000000332f RSI: 0000000000000046 RDI: 0000000000000063
RBP: ffff88007831fe38 R8: ffffffff81b26580 R9: 0000000000000397
R10: 0000000000000002 R11: 0000000000000396 R12: 0000000000000063
R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000007
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88007831fe40] __handle_sysrq at ffffffff813890a7
#9 [ffff88007831fe80] write_sysrq_trigger at ffffffff8138915a
#10 [ffff88007831feb0] proc_reg_write at ffffffff811ea879
#11 [ffff88007831ff00] vfs_write at ffffffff8118991c
#12 [ffff88007831ff30] sys_write at ffffffff81189c4a
#13 [ffff88007831ff80] system_call_fastpath at ffffffff815dcae9
RIP: 00007f64d1a94530 RSP: 00007fffbb0c1248 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff815dcae9 RCX: 00000000fbad2a84
RDX: 0000000000000002 RSI: 00007f64d23ab000 RDI: 0000000000000001
RBP: 00007f64d23ab000 R8: 000000000000000a R9: 00007f64d23a4740
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f64d1d61280 R14: 0000000000000002 R15: 00007f64d1d61280
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
The "runq -g" output for that CPU looks like this:
CPU 1
CURRENT: PID: 1212 CFS: ffff880035cc2f00 TASK: ffff880035f60000 COMMAND: "bash"
TASK_GROUP RT_RQ: ffff88007fa541e8
RT PRIO_ARRAY: ffff88007fa541e8
[no tasks queued]
TASK_GROUP CFS_RQ: ffff88007fa540f0
CFS RB_ROOT: ffff88007fa54118
GROUP: ffff880078af7800 CFS_RQ: ffff880035cc2f00 RB_ROOT: ffff880035cc2f28 nr_running: 4294967297 h_nr_running: 201908650262921217
[120] PID: 1212 TASK: ffff880035f60000 COMMAND: "bash"
I don't understand where those values are coming from, because if
I look at the CFS_RQ, it shows this:
crash> cfs_rq.nr_running,h_nr_running ffff880035cc2f00
nr_running = 1
h_nr_running = 1
crash>
I also see this occurring on live "snapshot" dumps -- which I understand given
that the kernel's runqueue data structures are being changed while the dump
is being created. But I don't understand why it's happening in the situation
above.
Dave
----- Original Message -----
>
>
> ----- Original Message -----
> > Hi Dave,
> >
> > I have cleaned up the code and added another change.
>
> OK thanks -- the patch runs through my sample set of vmcores with no problem.
>
> > The current running task is not in the rb tree (rb_root), so runq -g
> > displays it like:
> >
> > CURRENT: PID: 9048 TASK: ffff8808b07e4200 COMMAND: "actmain"
> > TASK_GROUP RT_RQ: ffff880002493820
> > RT PRIO_ARRAY: ffff880002493820
> > [no tasks queued]
> > TASK_GROUP CFS_RQ: ffff8800024936e0
> > CFS RB_ROOT: ffff880002493710
> > GROUP CFS RB_ROOT: ffff882d609ce830 <TDAT>
> > GROUP CFS RB_ROOT: ffff883f0bcbfa30 <User>
> > [no tasks queued]
> >
> > I can understand why the current running task is not displayed.
> > However, the "-g" option displays all the task_groups the task
> > belongs to but at the end it shows "[no tasks queued]". That is
> > just strange. The new change is to display the task that is running like:
> >
> > CURRENT: PID: 9048 CFS: ffff88039351a800 TASK: ffff8808b07e4200 COMMAND: "actmain"
> > TASK_GROUP RT_RQ: ffff880002493820
> > RT PRIO_ARRAY: ffff880002493820
> > [no tasks queued]
> > TASK_GROUP CFS_RQ: ffff8800024936e0
> > CFS RB_ROOT: ffff880002493710
> > GROUP: ffff884052bc9800 CFS_RQ: ffff882d609ce800 RB_ROOT: ffff882d609ce830 <TDAT> nr_running: 1 h_nr_running: 1
> > GROUP: ffff884058f1d000 CFS_RQ: ffff883f0bcbfa00 RB_ROOT: ffff883f0bcbfa30 <User> nr_running: 1 h_nr_running: 1
> > [120] PID: 9048 TASK: ffff8808b07e4200 COMMAND: "actmain"
>
> OK -- I guess I understand why it probably makes sense to duplicate the
> CURRENT task underneath its own GROUP list -- but if that is done, then
> why clutter the CURRENT line with the CFS_RQ address? And it's not clear
> to me why, in your example above, the CFS address of ffff88039351a800
> doesn't show up as the CFS_RQ address above the "actmain" line.
>
> Taking a simple example, I see this:
>
> crash> runq -g
> CPU 0
> CURRENT: PID: 0 CFS: ffff88000c7d6aa8 TASK: ffffffff8178ba60 COMMAND: "swapper"
> TASK_GROUP RT_RQ: ffff88000c7d6b58
> RT PRIO_ARRAY: ffff88000c7d6b58
> [no tasks queued]
> TASK_GROUP CFS_RQ: ffff88000c7d6aa8
> CFS RB_ROOT: ffff88000c7d6ad0
> [no tasks queued]
>
> CPU 1
> CURRENT: PID: 1268 CFS: ffff88000c9b5aa8 TASK: ffff88002f11c620 COMMAND: "bash"
> TASK_GROUP RT_RQ: ffff88000c9b5b58
> RT PRIO_ARRAY: ffff88000c9b5b58
> [no tasks queued]
> TASK_GROUP CFS_RQ: ffff88000c9b5aa8
> CFS RB_ROOT: ffff88000c9b5ad0
> [120] PID: 1268 TASK: ffff88002f11c620 COMMAND: "bash"
>
> crash>
>
> Here the newly-interspersed CFS address redundantly duplicates the TASK_GROUP
> CFS_RQ below. But adding the CFS address to the "swapper" line doesn't seem to
> make much sense, or help in any way, since the idle task is a special case that
> never gets queued. And since the CFS address in the "bash" line is redundant
> with the TASK_GROUP CFS_RQ below, why bother showing it?
>
> And in a more complicated example, with your patch, the "qemu-kvm" task also
> shows up underneath its group:
>
> CPU 0
> CURRENT: PID: 3144 CFS: ffff88022aab2600 TASK: ffff88022a446040 COMMAND: "qemu-kvm"
> TASK_GROUP RT_RQ: ffff880133c16148
> RT PRIO_ARRAY: ffff880133c16148
> [no tasks queued]
> TASK_GROUP CFS_RQ: ffff880133c16028
> CFS RB_ROOT: ffff880133c16058
> GROUP: ffff88012b880800 CFS_RQ: ffff88022ac8f000 RB_ROOT: ffff88022ac8f030 <libvirt> nr_running: 1 h_nr_running: 1
> GROUP: ffff88012c078000 CFS_RQ: ffff88022c075000 RB_ROOT: ffff88022c075030 <qemu> nr_running: 1 h_nr_running: 1
> GROUP: ffff88012b0fb400 CFS_RQ: ffff88022af94c00 RB_ROOT: ffff88022af94c30 <guest1> nr_running: 1 h_nr_running: 1
> GROUP: ffff88022c6bbc00 CFS_RQ: ffff88022aab2600 RB_ROOT: ffff88022aab2630 <vcpu1> nr_running: 1 h_nr_running: 1
> [120] PID: 3144 TASK: ffff88022a446040 COMMAND: "qemu-kvm"
>
> And note that its interspersed CFS address of ffff88022aab2600 is redundantly
> shown as the CFS_RQ of its GROUP down below.
>
> So I don't understand why your example shows different CFS addresses in the
> CURRENT line vs. the GROUP CFS_RQ address above the queued "actmain" task:
>
> > CURRENT: PID: 9048 CFS: ffff88039351a800 TASK: ffff8808b07e4200 COMMAND: "actmain"
> > TASK_GROUP RT_RQ: ffff880002493820
> > RT PRIO_ARRAY: ffff880002493820
> > [no tasks queued]
> > TASK_GROUP CFS_RQ: ffff8800024936e0
> > CFS RB_ROOT: ffff880002493710
> > GROUP: ffff884052bc9800 CFS_RQ: ffff882d609ce800 RB_ROOT: ffff882d609ce830 <TDAT> nr_running: 1 h_nr_running: 1
> > GROUP: ffff884058f1d000 CFS_RQ: ffff883f0bcbfa00 RB_ROOT: ffff883f0bcbfa30 <User> nr_running: 1 h_nr_running: 1
> > [120] PID: 9048 TASK: ffff8808b07e4200 COMMAND: "actmain"
>
> Am I missing something? Or is there a cut-and-paste error?
>
> Dave
>
>