[Crash-utility] CFS runqueue list handling improvements

Thu Feb 2 04:19:20 UTC 2012

On 02/01/2012 06:41 PM, HATAYAMA Daisuke wrote:
> Hello,
>
> From: David Mair<dmair at suse.com>
> Subject: [Crash-utility] CFS runqueue list handling improvements
> Date: Wed, 01 Feb 2012 12:07:46 -0700
>
>> I was using a crash dump last week where the CFS runqueue for one CPU
>> had a loop in it. It was caused by a red/black tree node that was the
>> left child of one node at the same time as being the right child of a
>> different node. The doubly-linked node has only one parent, one of the
>> nodes that has it as a child here). Call the node with two links to it
>> X, the node that is the stored parent of X call Y and the other node
>> that links to X call Z. The parent of Y is Z here but X is the right
>> node of Z (left of Y). So, you reach Z, do all it's left branch then go
>> right to reach X. Go up to Z from the left not Y from the right and do
>> all the right of Z. Go up to Y and your last node isn't the left or
>> right child so you go do the whole left tree of Y again, etc.
>
> So, I understand this as the following diagram, without explicit
> parent notations:
>
>              :
>              :
>              |
>              V     parent
>              Z<----------  Y
>              |              ^ |
>     rb_right |       parent | | left
>              V              | |
>              X -------------+ |
>                <--------------+
>
> , and the loop continues just like:
>
>              ->  Z ->  X ->  Y
>                 ^         |
>                 |         |
>                 +---------+
>
> Is this right?

Not completely, it's a red-black tree I see it in and X is the top of 
Z's right branch so between Z and X is all of Z's left branch and 
between X and Y is all of X's left and right branches. But the limits 
are as-shown with more between in the case I see.

> One concern I have is when this kind of situation could
> happen. Propbably only unde rtree operation?

I suspect any multi-layer tree that permits width greater than one at 
any layer and that assumes a single parent per-layer and doesn't verify 
that structure element could be corrupted in this way. From one 
perspective here, two nodes at the same layer have different parents 
such that you can bypass the end-detection of the true parent layer via 
its own child layer. The missing error handling is ultimately that each 
layer doesn't check that all its children have it as a parent (a useful 
two-step last act of rb_next() for example that would also detect going 
off into space?). Using hq_enter() with each node detects the duplicate 
from the embedded loop here though.

> Thanks.
> HATAYAMA, Daisuke

-- 
David Mair
SUSE Linux

>>
>> In that case, the runq command (task.c) in crash never bails out of
>> dump_CFS_runqueues() because the for loop that goes from rb_first()
>> through all of rb_next() has no means of detecting the problem, although
>> you can use the runq command with -d and only get the task list (which
>> wasn't corrupted). The runq command alone then becomes very annoying to
>> use when diagnosing the actual problem.
>>
>> It occurs to me that this could be improved with one or more cases to
>> detect as additional exit conditions with warning messages for that loop:
>>
>> * Number of iterations exceeds cfs_rq_nr_running (perhaps allowing it to
>> be exceeded by some small amount to have a chance of seeing the nature
>> of a problem with it, perhaps with a runq command line switch to allow
>> ignoring any problems)
>>
>> * At the node making the cfs_rq_nr_running iteration, note the node
>> address, then exit with a warning if that node is ever displayed again,
>> thus showing any loop that starts within the first cfs_rq_nr_running nodes.
>>
>> * A more complex loop detection that handles cases where
>> cfs_rq_nr_running is significantly lower than the number of actual nodes
>> in the valid part of the tree.
>>
>> * I suppose if you still have the last node processed and you are going
>> up then that last node should always be the left or right child of the
>> parent you are reaching or the tree is broken.
>>
>> I understand that cfs_rq_nr_running could contain the "corruption" so
>> assuming it indicates the correct number of times to call rb_next()
>> isn't correct either, so my preference is for some form of loop
>> detection. Before I work on a patch, are there any opinions on the
>> behavior of the runq command in the case of a corrupt CFS runqueue, e.g.
>> one that contains a loop?
>>
>> --
>> David Mair
>> SUSE Linux
>>
>> --
>> Crash-utility mailing list
>> Crash-utility at redhat.com
>> https://www.redhat.com/mailman/listinfo/crash-utility
>>
>
>