[Crash-utility] improve ps performance

panfy.fnst panfy.fnst at cn.fujitsu.com
Sun Sep 21 08:08:02 UTC 2014


On 09/21/2014 02:57 AM, Dave Anderson wrote:
>
> ----- Original Message -----
>> On 09/20/2014 03:15 AM, Dave Anderson wrote:
>>> ----- Original Message -----
>>>> Hello Pan,
>>>>
>>>> I've updated the patch I attached yesterday with a change that
>>>> caches the most-recent tgid search result.  From ~70% to ~90% of
>>>> the time, either the last tgid entry or the very next one in the
>>>> tgid_array is the one being searched for, so it's not necessary
>>>> to call bsearch() every time.  "help -t" will show the cache-hit
>>>> statistics.
>>>>
>>>> Thanks,
>>>>     Dave
>>> Hello Pan,
>>>
>>> This patch as written needs to be made less restrictive for use
>>> on a live system.
>>>
>>> When running on a live system that has many tasks constantly
>>> forking/exec'ing, the "ps" command may occasionally fail like so:
>>>
>>>     crash>   ps
>>>          PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>>>           0      0   0  ffffffff81c13440  RU   0.0       0      0
>>>           [swapper/0]
>>>           0      0   1  ffff88021282d330  RU   0.0       0      0
>>>           [swapper/1]
>>>     >       0      0   2  ffff88021282dac0  RU   0.0       0      0
>>>     >       [swapper/2]
>>>           0      0   3  ffff88021282e250  RU   0.0       0      0
>>>           [swapper/3]
>>>           1      0   1  ffff880212828000  IN   0.0   50140   3120  systemd
>>>           2      0   3  ffff880212828790  IN   0.0       0      0
>>>           [kthreadd]
>>>     ... [ cut ] ...
>>>        7578  27670   0  ffff8801f45e3c80  DE   0.0       0      0  cc
>>>        7622  27668   1  ffff880210ee3c80  ZO   0.0       0      0  info
>>>        7629  27667   1  ffff8801075bd330  DE   0.0       0      0  rev
>>>        7631  27680   0  ffff8801075bf170  ZO   0.0       0      0  printenv
>>>        7635  27685   3  ffff880108bbe9e0  ZO   0.0       0      0  ypwhich
>>>     ps: bsearch for tgid failed: task: ffff880210ee6250 tgid: 7654
>>>     crash>
>>>
>>> Without this patch, the search for the matching tgid would not generate
>>> an error at all, but just quietly continue.
>>>
>>> The problem is due to the task.tgid may change on a live system, or more
>>> likely, the task itself may have been re-used.
>>>
>>> I would like to fix it simply ignoring tgid bsearch failures on live
>>> systems,
>>> and just use the RSS stats stored in the per-tgid mm_struct.
>>>
>>> Does that work for you?
>>>
>>> Dave
>>>
>>>
>>> .
>>>
>> ok!
>> But I don't understand the meaning of "
>>
>> fix it simply ignoring tgid bsearch failures on live systems,
>> and just use the RSS stats stored in the per-tgid mm_struct.
>>
>> ", if tgid may be changed, the tgid_array is useless on live systems.
> Well, in this case, it may be true for a particular task if the task struct
> had been re-used in between the time the arrays were created and the time
> that the "ps" command gets around to reading and displaying its various
> statistics.  And so the command may read invalid data w/respect to that task.
>
> But let's be clear -- that kind of behavior is, and always has been, an
> unavoidable circumstance when running the crash utility on live systems, or
> when looking at a "live" dump.
>
> It's not just the "ps" command, but any command that displays data that
> is subject to the "shifting sands" syndrome, where the kernel data is
> constantly being modified while the crash command is running.
>
> So the idea is to not just cancel the whole command with an error(FATAL...)
> if such an anomoly occurs on a live system.
>
>> And what is the "RSS stats stored in the per-tgid mm_struct" used for?
> Sorry -- I meant to quietly skip the checking of the other tasks in the
> task group, and simply use whatever is stored in the mm_struct pointed to
> by the original task.  Without your patch, if the tgid was not found, the
> command would just continue.  With your patch applied, it would be OK
> do the error(FATAL) in the case of a static dumpfile.  But in the case of
> a live system (or live dump), it's not worth killing the command at that
> point.
>
> Clear?
>
> Dave
>
>> More clearly, please.
>>     thanks,
>>        Pan
>>
> .
>
Oh, Ok.
Can I do it like this patch which just ignore tgid besarch failures?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20140921/78a686d4/attachment.htm>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-ignore-tgid-bsearch-failures.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20140921/78a686d4/attachment.ksh>


More information about the Crash-utility mailing list