[Crash-utility] CFS runqueue list handling improvements

Wed Feb 1 19:07:46 UTC 2012

I was using a crash dump last week where the CFS runqueue for one CPU
had a loop in it. It was caused by a red/black tree node that was the
left child of one node at the same time as being the right child of a
different node. The doubly-linked node has only one parent, one of the
nodes that has it as a child here). Call the node with two links to it
X, the node that is the stored parent of X call Y and the other node
that links to X call Z. The parent of Y is Z here but X is the right
node of Z (left of Y). So, you reach Z, do all it's left branch then go
right to reach X. Go up to Z from the left not Y from the right and do
all the right of Z. Go up to Y and your last node isn't the left or
right child so you go do the whole left tree of Y again, etc.

In that case, the runq command (task.c) in crash never bails out of
dump_CFS_runqueues() because the for loop that goes from rb_first()
through all of rb_next() has no means of detecting the problem, although
you can use the runq command with -d and only get the task list (which
wasn't corrupted). The runq command alone then becomes very annoying to
use when diagnosing the actual problem.

It occurs to me that this could be improved with one or more cases to
detect as additional exit conditions with warning messages for that loop:

* Number of iterations exceeds cfs_rq_nr_running (perhaps allowing it to
be exceeded by some small amount to have a chance of seeing the nature
of a problem with it, perhaps with a runq command line switch to allow
ignoring any problems)

* At the node making the cfs_rq_nr_running iteration, note the node
address, then exit with a warning if that node is ever displayed again,
thus showing any loop that starts within the first cfs_rq_nr_running nodes.

* A more complex loop detection that handles cases where
cfs_rq_nr_running is significantly lower than the number of actual nodes
in the valid part of the tree.

* I suppose if you still have the last node processed and you are going
up then that last node should always be the left or right child of the
parent you are reaching or the tree is broken.

I understand that cfs_rq_nr_running could contain the "corruption" so
assuming it indicates the correct number of times to call rb_next()
isn't correct either, so my preference is for some form of loop
detection. Before I work on a patch, are there any opinions on the
behavior of the runq command in the case of a corrupt CFS runqueue, e.g.
one that contains a loop?

-- 
David Mair
SUSE Linux