[Crash-utility] crash CPU bound waiting for user response
Dave Anderson
anderson at redhat.com
Tue Jul 10 18:25:26 UTC 2007
D. Hugh Redelmeier wrote:
> On Thu, 5 Jul 2007, Dave Anderson wrote:
>
> | From: Dave Anderson <anderson at redhat.com>
>
> | D. Hugh Redelmeier wrote:
> | > | From: Dave Anderson <anderson at redhat.com>
> | >
> | > | D. Hugh Redelmeier wrote:
> | >
> | > | > ==> Worse: while it is awaiting my RETURN, it is burning 100% of the
> | > | > CPU!
>
> | Again, what exactly do you do to reproduce it? I just cannot get the 100%
> | cpu-time waiting on the "less" sub-shell.
>
> The simplest example, in a 24-line xterm:
> $ su
> # crash
> crash> help set
>
> (I think that the su is necessary because crash is examining the live
> kernel.)
>
> This behaviour comes up whenever crash is using more for more than a page.
> Except "crash --help" which seems to be different.
Interesting, yeah, I can see it with "help set", although not necessarily
with most other help commands that use more than a page. Strange...
>
> This machine is an Athlon X2 running an up-to-date x86_64 Fedora 7.
> Mind you, only one core is enabled (because of a kernel bug that is
> the motivation for using crash).
>
> The CPU goes to 100%. I presume that most of it is in the kernel,
> handling the waitpid.
>
Right -- it essentially becomes an alternate idle loop for that cpu...
Harmless, but annoying.
> This seems really easy for me to reproduce. What happens when you try
> this? In what environment?
>
> | Anyway, I'm going to have to be able to reproduce it and test any
> | changes thoroughly before potentially re-introducing the hangs I
> | used to see.
>
> Sure. And my suggestion was not tested even by me. It was only part
> of an argument showing that the current code is wrong.
>
> If you have a version of crash waiting for the pager to finish (as in my
> example), put a gdb on it to find out just where in crash it is
> waiting. (I've told you where mine is waiting.)
>
> I suspect that your crash would not be in a waitpid loop because that
> would be a busy wait and you say you don't see a busy wait.
I wasn't able to reproduce the 99% cpu usage (until now).
But yes, it's in the same place in restore_sanity() that
you were seeing.
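
For illustration only, here is a minimal C sketch of why a polling
waitpid loop burns the cpu while a blocking one does not. This is not
the code in restore_sanity(); it assumes the loop polls with WNOHANG,
which nothing in this thread confirms. spawn_fake_pager(),
wait_polling() and wait_blocking() are hypothetical names made up for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the pager: the child sleeps 0.2 s, as if
 * waiting for the user to press RETURN, then exits. */
pid_t spawn_fake_pager(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec ts = { 0, 200 * 1000 * 1000L };
        nanosleep(&ts, NULL);
        _exit(0);
    }
    return pid;   /* child's pid in the parent, or -1 if fork failed */
}

/* Busy wait (assumed shape of the problem): with WNOHANG, waitpid()
 * returns 0 at once while the child is still running, so the loop
 * spins until the child exits. Returns the number of waitpid() calls. */
long wait_polling(pid_t pid)
{
    int status;
    long calls = 0;
    pid_t r;

    assert(pid > 0);
    do {
        r = waitpid(pid, &status, WNOHANG);
        calls++;
    } while (r == 0);
    return calls;
}

/* Blocking wait: without WNOHANG the kernel puts the caller to sleep
 * until the child changes state. The call is retried only if a signal
 * interrupts it. Returns the number of waitpid() calls. */
long wait_blocking(pid_t pid)
{
    int status;
    long calls = 0;
    pid_t r;

    assert(pid > 0);
    do {
        r = waitpid(pid, &status, 0);
        calls++;
    } while (r == -1 && errno == EINTR);
    return calls;
}
```

In the polling version the call count grows for as long as the child
lives, and each call is a trip into the kernel, which would match the
cpu time showing up as system time. The blocking version normally
makes a single call.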
Dave