[Crash-utility] improve ps performance
panfy.fnst
panfy.fnst at cn.fujitsu.com
Sun Sep 21 08:08:02 UTC 2014
On 09/21/2014 02:57 AM, Dave Anderson wrote:
>
> ----- Original Message -----
>> On 09/20/2014 03:15 AM, Dave Anderson wrote:
>>> ----- Original Message -----
>>>> Hello Pan,
>>>>
>>>> I've updated the patch I attached yesterday with a change that
>>>> caches the most-recent tgid search result. From ~70% to ~90% of
>>>> the time, either the last tgid entry or the very next one in the
>>>> tgid_array is the one being searched for, so it's not necessary
>>>> to call bsearch() every time. "help -t" will show the cache-hit
>>>> statistics.
>>>>
>>>> Thanks,
>>>> Dave
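For reference, the lookup order described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: struct tgid_entry, struct tgid_cache and tgid_lookup() are hypothetical names, and the sketch assumes an array sorted by tgid with one entry per tgid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of the sorted tgid array. */
struct tgid_entry {
    unsigned long tgid;
    unsigned long task;
};

/* Hypothetical cache of the most recent successful search. */
struct tgid_cache {
    size_t last;             /* index of the last entry found */
    int valid;               /* nonzero once 'last' is meaningful */
    unsigned long hits;      /* lookups answered without bsearch() */
    unsigned long searches;  /* total lookups */
};

static int cmp_tgid(const void *a, const void *b)
{
    const struct tgid_entry *x = a;
    const struct tgid_entry *y = b;

    return (x->tgid > y->tgid) - (x->tgid < y->tgid);
}

/*
 * Try the cached entry first, then the one right after it, and only
 * fall back to bsearch() when both miss. Returns NULL if not found.
 */
static struct tgid_entry *
tgid_lookup(struct tgid_entry *array, size_t count,
            unsigned long tgid, struct tgid_cache *c)
{
    struct tgid_entry key = { tgid, 0 };
    struct tgid_entry *found;

    assert(c != NULL);
    c->searches++;

    if (c->valid) {
        if (array[c->last].tgid == tgid) {
            c->hits++;
            return &array[c->last];
        }
        if (c->last + 1 < count && array[c->last + 1].tgid == tgid) {
            c->hits++;
            c->last++;
            return &array[c->last];
        }
    }

    found = bsearch(&key, array, count, sizeof(*array), cmp_tgid);
    if (found) {
        c->last = (size_t)(found - array);
        c->valid = 1;
    }
    return found;
}
```

Because tasks tend to be visited in roughly tgid order, most lookups should be answered by one of the two direct comparisons, which is where the hit rate quoted above would come from.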
>>> Hello Pan,
>>>
>>> This patch as written needs to be made less restrictive for use
>>> on a live system.
>>>
>>> When running on a live system that has many tasks constantly
>>> forking/exec'ing, the "ps" command may occasionally fail like so:
>>>
>>> crash> ps
>>>     PID   PPID  CPU  TASK              ST  %MEM      VSZ     RSS  COMM
>>>       0      0    0  ffffffff81c13440  RU   0.0        0       0  [swapper/0]
>>>       0      0    1  ffff88021282d330  RU   0.0        0       0  [swapper/1]
>>> >     0      0    2  ffff88021282dac0  RU   0.0        0       0  [swapper/2]
>>>       0      0    3  ffff88021282e250  RU   0.0        0       0  [swapper/3]
>>>       1      0    1  ffff880212828000  IN   0.0    50140    3120  systemd
>>>       2      0    3  ffff880212828790  IN   0.0        0       0  [kthreadd]
>>> ... [ cut ] ...
>>>    7578  27670    0  ffff8801f45e3c80  DE   0.0        0       0  cc
>>>    7622  27668    1  ffff880210ee3c80  ZO   0.0        0       0  info
>>>    7629  27667    1  ffff8801075bd330  DE   0.0        0       0  rev
>>>    7631  27680    0  ffff8801075bf170  ZO   0.0        0       0  printenv
>>>    7635  27685    3  ffff880108bbe9e0  ZO   0.0        0       0  ypwhich
>>> ps: bsearch for tgid failed: task: ffff880210ee6250 tgid: 7654
>>> crash>
>>>
>>> Without this patch, the search for the matching tgid would not generate
>>> an error at all, but just quietly continue.
>>>
>>> The problem is that task.tgid may change on a live system, or more
>>> likely, the task itself may have been re-used.
>>>
>>> I would like to fix it by simply ignoring tgid bsearch failures on
>>> live systems, and just use the RSS stats stored in the per-tgid
>>> mm_struct.
>>>
>>> Does that work for you?
>>>
>>> Dave
>>>
>>>
>>> .
>>>
>> ok!
>> But I don't understand the meaning of:
>>
>>   "fix it by simply ignoring tgid bsearch failures on live systems,
>>   and just use the RSS stats stored in the per-tgid mm_struct."
>>
>> If the tgid may change, the tgid_array is useless on live systems.
> Well, in this case, it may be true for a particular task if the task struct
> had been re-used in between the time the arrays were created and the time
> that the "ps" command gets around to reading and displaying its various
> statistics. And so the command may read invalid data w/respect to that task.
>
> But let's be clear -- that kind of behavior is, and always has been, an
> unavoidable circumstance when running the crash utility on live systems, or
> when looking at a "live" dump.
>
> It's not just the "ps" command, but any command that displays data that
> is subject to the "shifting sands" syndrome, where the kernel data is
> constantly being modified while the crash command is running.
>
> So the idea is to not just cancel the whole command with an error(FATAL...)
> if such an anomaly occurs on a live system.
>
>> And what is the "RSS stats stored in the per-tgid mm_struct" used for?
> Sorry -- I meant to quietly skip the checking of the other tasks in the
> task group, and simply use whatever is stored in the mm_struct pointed to
> by the original task. Without your patch, if the tgid was not found, the
> command would just continue. With your patch applied, it would be OK
> to do the error(FATAL) in the case of a static dumpfile. But in the case of
> a live system (or live dump), it's not worth killing the command at that
> point.
>
> Clear?
>
> Dave
>
>> More clearly, please.
>> thanks,
>> Pan
>>
> .
>
Oh, Ok.
Can I do it as in the attached patch, which just ignores tgid bsearch failures?
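
To check my understanding of the behaviour discussed above, here is a rough, self-contained sketch (not the attached patch itself). The function rss_for_task() and all of its parameters are hypothetical, and it assumes that the contribution of the other tasks in the group is simply added to the value taken from the task's own mm_struct:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide which RSS value to report for one task.
 *
 *   mm_rss          - value read from the mm_struct of the task itself
 *   tgid_found      - whether the tgid search succeeded
 *   group_extra_rss - assumed extra contribution gathered from the other
 *                     tasks in the group (only meaningful if tgid_found)
 *   live_system     - true for a live system or a "live" dump
 *
 * Returns 0 and stores the result in *rss, or -1 when the caller should
 * abort the command (a failed search on a static dumpfile).
 */
static int
rss_for_task(long mm_rss, bool tgid_found, long group_extra_rss,
             bool live_system, long *rss)
{
    assert(rss != NULL);

    if (tgid_found) {
        *rss = mm_rss + group_extra_rss;
        return 0;
    }

    if (live_system) {
        /* Data may have shifted under us: quietly use the mm_struct value. */
        *rss = mm_rss;
        return 0;
    }

    /* A static dumpfile should never miss, so report it as an error. */
    fprintf(stderr, "tgid search failed on a static dumpfile\n");
    return -1;
}
```

So on a live system a failed search only skips the other tasks in the group for that one task, while on a static dumpfile the failure is still reported and the caller can stop.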
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-ignore-tgid-bsearch-failures.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20140921/78a686d4/attachment.ksh>