[Crash-utility] crash CPU bound waiting for user response

Tue Jul 10 18:25:26 UTC 2007

D. Hugh Redelmeier wrote:
> On Thu, 5 Jul 2007, Dave Anderson wrote:
> 
> | From: Dave Anderson <anderson at redhat.com>
> 
> | D. Hugh Redelmeier wrote:
> | > | From: Dave Anderson <anderson at redhat.com>
> | > 
> | > | D. Hugh Redelmeier wrote:
> | > 
> | > | > ==> Worse: while it is awaiting my RETURN, it is burning 100% of the
> | > | > CPU!
> 
> | Again, what exactly do you do to reproduce it?  I just cannot get the 100%
> | cpu-time waiting on the "less" sub-shell.
> 
> The simplest example, in a 24-line xterm:
>     $ su
>     # crash
>     crash> help set
> 
> (I think that the su is necessary because crash is examining the live 
> kernel.)
> 
> This behaviour comes up whenever crash pipes more than a page of output
> through "more".  Except "crash --help", which seems to be different.

Interesting, yeah, I can see it with "help set", although not necessarily
with most other help commands that use more than a page.  Strange...

> 
> This machine is an Athlon X2 running an up-to-date x86_64 Fedora 7.
> Mind you, only one core is enabled (because of a kernel bug that is
> the motivation for using crash).
> 
> The CPU goes to 100%.  I presume that most of it is in the kernel,
> handling the waitpid.
> 

Right -- it essentially becomes an alternate idle loop for that cpu...
Harmless, but annoying.
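
For illustration only, here is a minimal C sketch of why a waitpid loop
can peg the cpu.  It is not the code in crash's restore_sanity(); it
assumes the loop polls with WNOHANG, which is one common way such a busy
wait arises.  The helper names (spawn_sleeper, wait_busy, wait_blocking)
are hypothetical, and the sleeping child merely stands in for the pager.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that stands in for the pager.
 * It sleeps for about 200 ms and then exits. */
static pid_t spawn_sleeper(void)
{
    pid_t pid = fork();
    assert(pid >= 0);                 /* fork must not fail in this sketch */
    if (pid == 0) {
        struct timespec ts = { 0, 200L * 1000L * 1000L };
        nanosleep(&ts, NULL);
        _exit(0);
    }
    return pid;
}

/* Busy wait (assumed pattern): WNOHANG makes waitpid return 0 at once
 * while the child is still running, so the loop spins through the
 * kernel until the child exits.  Returns the number of waitpid calls. */
static long wait_busy(pid_t pid)
{
    long calls = 0;
    int status;

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        calls++;
        if (r != 0)                   /* r == pid (reaped) or -1 (error) */
            break;
    }
    return calls;
}

/* Blocking wait: with options == 0 the caller sleeps in the kernel
 * until the child changes state; retry only if a signal interrupts. */
static long wait_blocking(pid_t pid)
{
    long calls = 0;
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
        calls++;
    } while (r == -1 && errno == EINTR);
    return calls;
}
```

Calling wait_busy(spawn_sleeper()) typically makes a very large number
of waitpid calls during the child's 200 ms sleep, while
wait_blocking(spawn_sleeper()) normally makes exactly one and uses no
cpu time while it waits.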

> This seems really easy for me to reproduce.  What happens when you try
> this?  In what environment?
> 
> | Anyway, I'm going to have to be able to reproduce it and test any
> | changes thoroughly before potentially re-introducing the hangs I
> | used to see.
> 
> Sure.  And my suggestion was not tested even by me.  It was only part
> of an argument showing that the current code is wrong.
> 
> If you have a version of crash waiting for the pager to finish (as in my
> example), put a gdb on it to find out just where in crash it is
> waiting.  (I've told you where mine is waiting.)
> 
> I suspect that your crash would not be in a waitpid loop because that
> would be a busy wait and you say you don't see a busy wait.

I wasn't able to reproduce the 99% cpu usage (until now).
But yes, it's in the same place in restore_sanity() that
you were seeing.

Dave