RHEL AS 4 U2 Slow

Wed Feb 8 02:28:23 UTC 2006

Rick Stevens wrote:

>On Tue, 2006-02-07 at 17:29 -0500, Brenda Radford wrote:
>>Rick Stevens wrote:
>
>I'm cleaning up the message a bit...there's a lot of cruft we don't
>need to deal with anymore.
>
>>>There's nothing obvious there.  What you really need to do is run
>>>"top" and look at the top few processes listed there (you can usually
>>>ignore the "init", "top" and "X" processes) and see what's sucking up
>>>the CPU time.  Watch the "%CPU" and "%MEM" columns and find the process
>>>that's got the highest "%CPU" bit.  That's the one we need to look at.
>>>
>>>Also pay attention to the bit that looks like this:
>>>
>>>Cpu(s):  4.6% us,  0.0% sy,  0.0% ni, 94.4% id,  0.0% wa,  1.0% hi,  0.0% si
>>>
>>>as it shows a summary of where the CPU is spending its time:
>>>
>>>"us" = user state
>>>"sy" = system state
>>>"ni" = nice (time spent running niced user processes)
>>>"id" = idle
>>>"wa" = I/O wait state
>>>"hi" = hardware interrupts
>>>"si" = software interrupts
>>>
>>>Even if you don't see a process sucking up a lot of CPU, but you see
>>>the CPU spending a lot of time in the "wa" state, then you have a disk
>>>problem.  Look in the process list for processes in the "D" state.
>>>
>>Rick,
>>
>>From top, on two different days:
>>
>>[brenda@localhost ~]$ top
>>
>>top - 20:40:49 up 11 min,  2 users,  load average: 0.03, 0.05, 0.06
>>Tasks:  83 total,   1 running,  82 sleeping,   0 stopped,   0 zombie
>>Cpu(s):  2.3% us,  0.0% sy,  0.0% ni, 97.7% id,  0.0% wa,  0.0% hi,  0.0% si
>>Mem:    905760k total,   306536k used,   599224k free,    18996k buffers
>>Swap:  1799232k total,        0k used,  1799232k free,   180724k cached
>>
>>[brenda@localhost ~]$ top
>>
>>top - 15:31:16 up 35 min,  4 users,  load average: 0.24, 0.07, 0.02
>>Tasks:  89 total,   1 running,  87 sleeping,   0 stopped,   1 zombie
>>Cpu(s):  1.3% us,  0.0% sy,  0.0% ni, 98.7% id,  0.0% wa,  0.0% hi,  0.0% si
>>Mem:    905760k total,   396168k used,   509592k free,    23224k buffers
>>Swap:  1799232k total,        0k used,  1799232k free,   244368k cached
>>
>>The only time the CPU showed any activity in the I/O wait state was
>>when top was first started, at the 1-4% level, and only for an
>>instant.  It immediately went back to 0.0%.  The only other processes
>>that showed up at the top of the list (besides those you mentioned)
>>were gnome-terminal, hald, and rhn-applet-gui, but they only used tiny
>>amounts of CPU and MEM, even with 4 or 5 terminal windows open (hence
>>the 4 users).
>>
>>The more times I ran top, the more memory it reported as used (in the
>>top header).  It went up to more than 4XX,XXX used before I was
>>finished, after running top and ps ax numerous times.
>
>That's normal and nothing to worry about.  Start worrying if you see
>the "Swap: 0k used," thing start to go non-zero.  That means you're out
>of memory and the system has to swap things from memory to disk and back
>to run them.  That slows the machine down a LOT.
>
>>I didn't have any processes in the "D" state.
>
>Ok, so we don't seem to have an I/O wait state issue.  I did notice a
>zombie process there in the second top report.  I'm curious as to which
>process that is and what its parent is.  Try doing a "ps ax" and find
>the process that's in a "Z" state.
>
>>Any ideas?
>
>Well, it doesn't seem to be process or memory related.  It could be
>context switching issues.  Try doing a "vmstat 5" for, say a minute,
>then CTRL-C to get out of it.  Look at the "cs" column towards the
>right.  If that gets to 5 digits, we have something that's causing
>context switch problems and there's a bit more investigation we need
>to do.
>
>Also look at the output of "dmesg" for clues.
>
Rick,

The zombie was a "[netstat] <defunct>".  I don't know where that came
from, but it eventually went away.  I did a kill PID on it.  (I have
forgotten the best way to get rid of zombies.)
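
On the zombie question: a defunct process is already dead, so "kill PID"
has no effect on it.  It disappears once its parent collects its exit
status (or the parent exits and init reaps it), which is why the parent
is the process worth finding.  A minimal sketch for listing zombies with
their parent PIDs, assuming the procps "ps" that ships with RHEL; the
helper name list_zombies is made up for illustration:

```shell
# List zombie (defunct) processes as "PID PPID COMMAND".
# Expects `ps -eo pid,ppid,stat,comm` style output on stdin.
# (list_zombies is a hypothetical helper name, not a standard tool.)
list_zombies() {
    # Column 3 is the process state; zombies have a state starting with "Z".
    # The header line is skipped naturally because "STAT" starts with "S".
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use on a live system:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

The second column of the output is the parent to look at.  If the parent
never reaps its children, restarting or killing the parent is what clears
the zombie.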

The "vmstat 5" cs value was 281, then 60-70 unless I moved the mouse,
opened another terminal window (397), or piped the vmstat man page to
the printer, but the highest it got then was 4685.  Nowhere near 5
digits.
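
Rather than watching the "cs" column by eye, the peak can be pulled out
of the vmstat output with awk.  This is only a sketch: the helper name
max_cs is made up, and it assumes the usual vmstat layout where a header
row names the columns.  Note that the first data line vmstat prints is an
average since boot, not a live sample.

```shell
# Print the highest value seen in the "cs" (context switches per second)
# column of vmstat output read from stdin.
# (max_cs is a hypothetical helper name, not a standard tool.)
max_cs() {
    awk '
        # Until the header row naming the columns is seen, look for "cs".
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "cs") col = i; next }
        # Data rows: track the largest numeric value in that column.
        # Repeated header rows are ignored because they are not numeric.
        $col ~ /^[0-9]+$/ && $col + 0 > max { max = $col + 0 }
        END { print max + 0 }
    '
}

# Typical use: 12 samples, 5 seconds apart (about a minute):
#   vmstat 5 12 | max_cs
```

A result in the tens of thousands would be the "5 digits" case described
above; anything in the hundreds or low thousands is not.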

From dmesg in the GUI, I have 3 errors besides the error you get when
you don't have a CD in the CD-ROM (open failed).  With a few lines of
context for the first one (I have SELinux disabled):

> Security Scaffold v1.0.0 initialized
> SELinux:  Initializing.
> SELinux:  Starting in permissive mode
> There is already a security framework initialized, register_security failed.       <<<==This is the error
> selinux_register_security:  Registering secondary module capability
> Capability LSM initialized as secondary

And the two errors I told you about before:

> shpchp: acpi_shpchprm:\_SB_.PCI0 evaluate _BBN fail=0x5
> shpchp: acpi_shpchprm:get_device PCI ROOT HID fail=0x5

And not an error, but the line from grub.conf we edited is in there:

> Kernel command line: ro root=LABEL=/1 rhgb quiet noacpi

If you want to see all of /var/log/dmesg, just say so. I'll post the 
whole thing.
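
Short of posting the whole log, the suspicious lines can be picked out
with grep.  A rough sketch; the helper name dmesg_problems and the
"fail|error" pattern are just illustrative choices and will miss problems
worded differently:

```shell
# Print kernel log lines (with line numbers) that mention a failure or
# an error.  Reads the log text on stdin.
# (dmesg_problems is a hypothetical helper name, not a standard tool.)
dmesg_problems() {
    # -i: ignore case, -n: show line numbers, -E: extended regex
    grep -inE 'fail|error'
}

# Typical use:
#   dmesg | dmesg_problems
#   dmesg_problems < /var/log/dmesg
```

The line numbers make it easy to go back and quote a few lines of context
around each hit.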

RHEL AS 3 didn't run slow like this on this machine.  I upgraded to AS 4
because that is what we're using at school in the Red Hat Academy.  I
would ask Red Hat why it is so slow, but you don't get any support with
an academic subscription.  The machines at school are running ES 4, also
with SELinux disabled, and they aren't slow like mine at home.

When I told my instructor my machine at home was slow, he asked me how
fast my CPU was and how much memory I had.  I told you all that when I
first posted.  Then I told him it had improved (which didn't last), and
we haven't talked about it again.

Thanks for your time and efforts,

Brenda