[Crash-utility] FW: patch for slight modification to runq -g command

Dave Anderson anderson at redhat.com
Thu Oct 17 19:46:13 UTC 2013



----- Original Message -----
> Hi,
> 
> 
> 
> We’re debugging an in-house application that makes extensive use of hard
> limits. We ran into a lot of timing windows (all of our own making), and we
> use the runq -g command a lot. The current runq -g output displays only the
> RB_ROOT pointer, which makes it inconvenient to traverse the task_group
> hierarchy. It would be nice to have the command also display the
> corresponding task_group and cfs_rq pointers (at a minimum). Since our
> crashes come from messing up nr_running and h_nr_running, we also
> display those two fields. Here’s an example of the output before and
> after the change.
> 
> 
> 
> CPU 4
> CURRENT: PID: 0 TASK: ffff8840668c0380 COMMAND: "swapper"
> TASK_GROUP RT_RQ: ffff8800027d3820
> RT PRIO_ARRAY: ffff8800027d3820
> [no tasks queued]
> TASK_GROUP CFS_RQ: ffff8800027d36e0
> CFS RB_ROOT: ffff8800027d3710
> GROUP CFS RB_ROOT: ffff883ff69bcc30 <TDAT>
> GROUP CFS RB_ROOT: ffff884006290e30 <User>
> GROUP CFS RB_ROOT: ffff88400641c430 <TDWMVP1>
> GROUP CFS RB_ROOT: ffff88400646b030 <ServDown1:0>
> GROUP CFS RB_ROOT: ffff884006492630 <ServOrder1:1>
> GROUP CFS RB_ROOT: ffff883ff058fe30 <TDWMWD57> (THROTTLED)
> GROUP CFS RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa>
> [120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain"
> <<< more throttled groups removed >>>
> 
> 
> 
> CPU 4
> CURRENT: PID: 0 CFS: ffff8800027d36e0 TASK: ffff8840668c0380 COMMAND: "swapper"
> TASK_GROUP RT_RQ: ffff8800027d3820
> RT PRIO_ARRAY: ffff8800027d3820
> [no tasks queued]
> TASK_GROUP CFS_RQ: ffff8800027d36e0
> CFS RB_ROOT: ffff8800027d3710
> GROUP: ffff88405394d000 CFS: ffff883ff69bcc00 RB_ROOT: ffff883ff69bcc30 <TDAT> (0 0)
> GROUP: ffff88405906c400 CFS: ffff884006290e00 RB_ROOT: ffff884006290e30 <User> (0 0)
> GROUP: ffff884055081000 CFS: ffff88400641c400 RB_ROOT: ffff88400641c430 <TDWMVP1> (0 0)
> GROUP: ffff884055081c00 CFS: ffff88400646b000 RB_ROOT: ffff88400646b030 <ServDown1:0> (0 0)
> GROUP: ffff8840580fd400 CFS: ffff884006492600 RB_ROOT: ffff884006492630 <ServOrder1:1> (0 0)
> GROUP: ffff884058f58c00 CFS: ffff883ff058fe00 RB_ROOT: ffff883ff058fe30 <TDWMWD57> (7 9) (THROTTLED)
> GROUP: ffff8808fb976000 CFS: ffff88047889ee00 RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa> (1 1)
> [120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain"
> <<< more throttled groups removed >>>
> 
> 
> 
> I have attached the patch we use to display the additional information.
> Could you please take a look at my proposal and consider including this
> kind of display format?
> 
> 
> 
> Thanks,
> 
> Anthony

A couple quick observations...

Adding the CFS: address makes sense, but besides you and me and whoever
reads the patch/code, how would anybody know what the two numbers inside
the parentheses mean?  It could be documented in the help page, but I
wonder if it could be made more obvious somehow?

Anyway, I ran a test on a sample set of vmcores, and this addition
will only work with relevant kernel versions.  For example, in these
four dumpfiles (and therefore their kernel versions), the command
fails because the cfs_rq.h_nr_running member does not exist:

2.6.38.2-9.fc15:

CPU 0
  CURRENT: PID: 1180  CFS: ffff880037ef1b00 TASK: ffff88003bea2e40  COMMAND: "crash"
  TASK_GROUP RT_RQ: ffff88003fc13988
  RT PRIO_ARRAY: ffff88003fc13988
     [no tasks queued]
  TASK_GROUP CFS_RQ: ffff88003fc138b0
  CFS RB_ROOT: ffff88003fc138d8
     GROUP: ffff88003c610a00 CFS: ffff880037ef1b00 RB_ROOT: ffff880037ef1b28
runq: invalid structure member offset: cfs_rq_h_nr_running
      FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()


2.6.40.4-5.fc15:

CPU 1
  CURRENT: PID: 1341  CFS: ffff880037592f00 TASK: ffff880037409730  COMMAND: "crash"
  TASK_GROUP RT_RQ: ffff88003fc92690
  RT PRIO_ARRAY: ffff88003fc92690
     [no tasks queued]
  TASK_GROUP CFS_RQ: ffff88003fc925b0
  CFS RB_ROOT: ffff88003fc925d8
     GROUP: ffff88003a84b800 CFS: ffff880037592f00 RB_ROOT: ffff880037592f28
runq: invalid structure member offset: cfs_rq_h_nr_running
      FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()


2.6.32-131.0.15.el6:

CPU 0
  CURRENT: PID: 28263 CFS: ffff8800794b7140 TASK: ffff880037aaa040  COMMAND: "loop.ABA"
  TASK_GROUP RT_RQ: ffff88000a216098
  RT PRIO_ARRAY: ffff88000a216098
     [no tasks queued]
  TASK_GROUP CFS_RQ: ffff88000a215fe8
  CFS RB_ROOT: ffff88000a216010
     GROUP: ffff8800785bc200 CFS: ffff8800784d7c80 RB_ROOT: ffff8800784d7ca8 <A>
runq: invalid structure member offset: cfs_rq_h_nr_running
      FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()


3.1.7-1.fc16:

CPU 2
  CURRENT: PID: 1495  CFS: ffff8800277f8500 TASK: ffff880037a60000  COMMAND: "crash"
  TASK_GROUP RT_RQ: ffff88003e2532d0
  RT PRIO_ARRAY: ffff88003e2532d0
     [no tasks queued]
  TASK_GROUP CFS_RQ: ffff88003e2531f0
  CFS RB_ROOT: ffff88003e253218
     GROUP: ffff88002722b000 CFS: ffff8800277f8500 RB_ROOT: ffff8800277f8528
runq: invalid structure member offset: cfs_rq_h_nr_running
      FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()

So you'll need to check VALID_MEMBER(cfs_rq_h_nr_running) before attempting
to print it, and maybe just show the cfs_rq.nr_running value.
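One way to read that suggestion is sketched below as standalone C rather than as a patch against task.c. The offset_table struct, the VALID_MEMBER() macro and the format_group_counts() helper are simplified stand-ins modeled on crash's conventions (an unresolved member offset is stored as -1); they are assumptions for illustration, not the definitions in defs.h. The labeled output is likewise only one possible way to make the two numbers self-describing.

```c
#include <stdio.h>

/* Stand-in for crash's convention: -1 marks a structure member that
 * does not exist in the kernel being analyzed. */
#define INVALID_OFFSET (-1L)

/* Hypothetical two-entry subset of the offset table. */
struct offset_table {
    long cfs_rq_nr_running;
    long cfs_rq_h_nr_running;
};

static struct offset_table offset_table;

/* Simplified stand-in for the VALID_MEMBER() check. */
#define VALID_MEMBER(X) (offset_table.X >= 0)

/*
 * Hypothetical helper: build the counter suffix for a group header.
 * Both counters are shown when h_nr_running exists, only nr_running
 * when it does not, and nothing when neither offset is known.
 * Returns the number of characters written (as snprintf does).
 */
static int format_group_counts(char *buf, size_t len,
                               long nr_running, long h_nr_running)
{
    if (VALID_MEMBER(cfs_rq_h_nr_running))
        return snprintf(buf, len, "(nr_running: %ld h_nr_running: %ld)",
                        nr_running, h_nr_running);

    if (VALID_MEMBER(cfs_rq_nr_running))
        return snprintf(buf, len, "(nr_running: %ld)", nr_running);

    if (len > 0)
        buf[0] = '\0';
    return 0;
}
```

With the h_nr_running offset marked invalid, as it would be for the four kernels listed above, the helper falls back to the single labeled counter instead of failing on the missing member.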

When adding new items to the offset_table, they should be added to the
end of the structure so that extension modules will continue to be able
to access any offsets as expected without having to be re-compiled.  
But you can display the new item in dump_offset_table() wherever you'd
like, typically grouped with similar/associated items -- although in your
patch you did this:

+       fprintf(fp, "              cfs_rq_nr_running: %ld\n",
+               OFFSET(cfs_rq_nr_running));
+       fprintf(fp, "              cfs_rq_h_nr_running: %ld\n",
+               OFFSET(cfs_rq_h_nr_running));

The cfs_rq_nr_running display already exists, so I suggest that you
just move your new cfs_rq_h_nr_running display underneath it (and
please line up the "help -o" output of the new item as well).
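The two points about the offset_table can be illustrated with a small standalone sketch. The three struct layouts, the some_later_member field, LABEL_WIDTH and format_offset_line() are all hypothetical and exist only for this example; the real table and dump_offset_table() are much larger and use their own formatting.

```c
#include <stddef.h>
#include <stdio.h>

/* Layout that an already-compiled extension module was built against. */
struct offset_table_old {
    long cfs_rq_nr_running;
    long some_later_member;      /* hypothetical pre-existing entry */
};

/* New entry appended at the end: existing entries keep their positions. */
struct offset_table_appended {
    long cfs_rq_nr_running;
    long some_later_member;
    long cfs_rq_h_nr_running;
};

/* New entry inserted in the middle: some_later_member moves, so an
 * old module would read the wrong slot. */
struct offset_table_inserted {
    long cfs_rq_nr_running;
    long cfs_rq_h_nr_running;
    long some_later_member;
};

/* Hypothetical column width used to right-align the labels. */
#define LABEL_WIDTH 33

/*
 * Hypothetical helper: format one "name: value" line with the name
 * right-aligned, so the colons line up regardless of name length.
 */
static int format_offset_line(char *buf, size_t len,
                              const char *name, long value)
{
    return snprintf(buf, len, "%*s: %ld\n", LABEL_WIDTH, name, value);
}
```

Comparing offsetof(struct offset_table_old, some_later_member) with the same member in the appended and inserted layouts shows why only the appended form is safe for existing modules, while the display order remains a separate choice.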

Dave





