RHELv4 and v5 - So slow as to be unusable.
Mohammad Zakaria
myz_sa at yahoo.com
Wed Oct 13 12:44:22 UTC 2010
Hello Grey,
Are those machines having the same brand or HW??
and if you reconnect to your NTP server does the clock start counting right or at the same 7.8 rate??
try to reset one of the machines BIOS to its defaults and check the results??
have you tried to disable one of your processors cores and work with a single processor??
--- On Sat, 10/9/10, Mohammad Zakaria <myz_sa at yahoo.com> wrote:
From: Mohammad Zakaria <myz_sa at yahoo.com>
Subject: Re: RHELv4 and v5 - So slow as to be unusable.
To: "General Red Hat Linux discussion list" <redhat-list at redhat.com>
Date: Saturday, October 9, 2010, 12:58 PM
If you have one piece of RAM try to replace it and check your box status, or if
it is a combination of 2 sets, try the system performance with each RAM
separately, if there is any problem with your RAM HW you should detect that
easily and fix it.
________________________________
From: Gary E Barnes <gebarnes at us.ibm.com>
To: redhat-list at redhat.com
Sent: Thu, October 7, 2010 9:17:01 PM
Subject: Re: RHELv4 and v5 - So slow as to be unusable.
> From: Laszlo Beres <laszlo at beres.me>
> Subject: Re: RHELv4 and v5 - So slow as to be unusable.
>
> On Wed, Oct 6, 2010 at 9:22 PM, Gary E Barnes <gebarnes at us.ibm.com>
wrote:
>
> > "top" says that nothing is going on although the load average is 3+.
> > "sar" also says that nothing is going on.
>
> There's no such thing "nothing is going on". You should see CPU
> status, process status, etc. vmstat also can give you some hints about
> the system health.
Oh but there is such a thing. I have one of the machines in this weird
slowdown state right at this moment. It started around 4:45pm yesterday,
after running perfectly for about 3 hours 15 minutes, and I left it
overnight to see if maybe it would get "over it" by itself. Hasn't
happened though.
Here is the very first header from the "top" display of a top I started
just for this example.
top - 19:18:20 up 4:33, 4 users, load average: 3.56, 3.58, 3.54
Tasks: 159 total, 16 running, 143 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3% us, 0.4% sy, 2.9% ni, 95.1% id, 0.3% wa, 0.0% hi, 0.0%
si
Mem: 2586400k total, 1880032k used, 706368k free, 193036k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220324k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5813 root 16 0 164m 31m 6124 R 0.2 1.3 25:05.54 X
6650 geb 16 0 4048 2116 1332 S 0.2 0.1 0:00.86 xalarm
27229 geb2 16 0 3008 960 696 R 0.2 0.0 0:00.01 top
1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init
And here is the first refresh of that display (I'm capturing this in an
Emacs buffer if you're curious).
Tasks: 161 total, 3 running, 158 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9% us, 0.5% sy, 0.1% ni, 98.5% id, 0.0% wa, 0.0% hi, 0.0%
si
Mem: 2586400k total, 1885680k used, 700720k free, 193036k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220584k cached
Here is the second, notice the 100% idle value.
Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0%
si
Mem: 2586400k total, 1885696k used, 700704k free, 193044k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220576k cached
There is memory available. There is swap available.
Idle occasionally drops to 99.9%.
Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.1% sy, 0.0% ni, 99.8% id, 0.0% wa, 0.0% hi, 0.0%
si
Mem: 2586400k total, 1885464k used, 700936k free, 193076k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220544k cached
The processes that show up in the first line or two of top are things such
as:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27231 geb 16 0 28316 11m 8412 S 9.8 0.5 0:00.41
gnome-terminal
5813 root 16 0 165m 31m 6316 S 3.8 1.3 25:05.70 X
6697 geb 16 0 22280 10m 7644 S 1.9 0.4 0:18.69 wnck-applet
3817 rpc 15 0 2336 592 484 S 0.2 0.0 0:01.60 portmap
6472 geb 16 0 3544 1472 876 S 0.2 0.1 0:02.08 gam_server
6650 geb 16 0 4048 2116 1332 S 0.2 0.1 0:00.87 xalarm
3805 root 16 0 2492 312 220 S 0.2 0.0 0:00.55 irqbalance
4867 root 16 0 2800 844 624 D 0.2 0.0 0:03.32 rpc.mountd
1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init
6451 geb 16 0 12764 7512 1688 S 0.1 0.3 0:01.66 gconfd-2
27229 geb2 16 0 3012 1044 772 R 0.1 0.0 0:00.05 top
27229 geb2 16 0 3012 1044 772 R 0.5 0.0 0:00.07 top
4996 root 16 0 4772 3104 1536 S 0.2 0.1 0:02.47 hald
6472 geb 16 0 3544 1472 876 S 0.2 0.1 0:02.09 gam_server
1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init
27229 geb2 16 0 3012 1044 772 R 0.5 0.0 0:00.09 top
23574 geb 16 0 145m 68m 26m S 0.2 2.7 2:44.86 firefox-bin
1 root 16 0 2724 512 436 S 0.0 0.0 0:00.68 init
As you can see, there is essentially "nothing going on".
An yet the machine is very unresponsive. If I run a command that hasn't
been run in a while (don't know the time frame, but it seems to be only
minutes) then the command takes >30 seconds to execute. For example, I
just did the "date" command and when it finally responded I did the
hwclock command. Both took >30 seconds to run. Now if I repeat those
commands they execute immediately. I'm presuming that this is due to
executable file caching in the operating system. If I wait a while then
the >30 second wait will reappear for those same commands. Presumably
they've left that cache.
This behavior is observable both in xterm's on the console and also
through ssh connections from another machine.
Programs that are already loaded and running seem to be pretty much ok, at
least until they need to go read some new file or write some new file,
then they hang for a while and eventually get going again.
If I run sar (sysstat package) I get essentially the same picture. From a
"sar -A 30 4" here are the averages for two minutes. Load average 3+ and
>99% idle. Nearly no I/O of any sort; not 0 but very low amounts for two
minutes.
Average: proc/s
Average: 0.12
Average: cswch/s
Average: 48.90
Average: CPU %user %nice %system %iowait %idle
Average: all 0.29 0.01 0.12 0.07 99.51
Average: 0 0.25 0.01 0.13 0.00 99.60
Average: 1 0.33 0.01 0.11 0.13 99.42
Average: INTR intr/s
Average: sum 11.87
Average: pgpgin/s pgpgout/s fault/s majflt/s
Average: 0.01 8.55 34.76 0.00
Average: pswpin/s pswpout/s
Average: 0.00 0.00
Average: tps rtps wtps bread/s bwrtn/s
Average: 1.35 0.00 1.35 0.02 17.11
Average: frmpg/s bufpg/s campg/s
Average: -0.73 0.02 -0.02
Average: CPU i000/s i001/s i008/s i009/s i012/s i014/s i015/s
i177/s i185/s i193/s i201/s i209/s
Average: 0 0.02 0.00 0.00 0.00 0.00 0.00 0.09
7.76 0.00 0.00 0.00 0.00
Average: 1 0.01 2.65 0.00 0.00 0.00 1.35 0.00
0.00 0.00 0.00 0.00 0.00
Average: IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s
txcmp/s rxmcst/s
Average: lo 0.00 0.00 0.12 0.12 0.00
0.00 0.00
Average: eth0 3.35 2.25 364.83 196.51 0.00
0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00
0.00 0.00
Average: IFACE rxerr/s txerr/s coll/s rxdrop/s txdrop/s
txcarr/s rxfram/s rxfifo/s txfifo/s
Average: lo 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00
Average: eth0 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00
Average: DEV tps rd_sec/s wr_sec/s
Average: dev1-0 0.00 0.00 0.00
Average: dev1-1 0.00 0.00 0.00
Average: dev1-2 0.00 0.00 0.00
Average: dev1-3 0.00 0.00 0.00
Average: dev1-4 0.00 0.00 0.00
Average: dev1-5 0.00 0.00 0.00
Average: dev1-6 0.00 0.00 0.00
Average: dev1-7 0.00 0.00 0.00
Average: dev1-8 0.00 0.00 0.00
Average: dev1-9 0.00 0.00 0.00
Average: dev1-10 0.00 0.00 0.00
Average: dev1-11 0.00 0.00 0.00
Average: dev1-12 0.00 0.00 0.00
Average: dev1-13 0.00 0.00 0.00
Average: dev1-14 0.00 0.00 0.00
Average: dev1-15 0.00 0.00 0.00
Average: dev3-0 1.35 0.02 17.11
Average: dev3-1 0.00 0.00 0.00
Average: dev3-2 0.13 0.02 3.35
Average: dev3-3 0.00 0.00 0.00
Average: dev3-4 0.00 0.00 0.00
Average: dev3-5 1.22 0.01 13.75
Average: dev22-64 0.00 0.00 0.00
Average: dev22-65 0.00 0.00 0.00
Average: dev22-0 0.00 0.00 0.00
Average: dev2-0 0.00 0.00 0.00
Average: dev9-0 0.00 0.00 0.00
Average: kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree
kbswpused %swpused kbswpcad
Average: 696608 1889792 73.07 193202 1220938 4192956
0 0.00 0
Average: dentunusd file-sz inode-sz super-sz %super-sz dquot-sz
%dquot-sz rtsig-sz %rtsig-sz
Average: 216815 3285 185128 0 0.00 0
0.00 0 0.00
Average: totsck tcpsck udpsck rawsck ip-frag
Average: 343 56 8 0 0
Average: runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
Average: 14 186 3.35 3.61 3.59
The machine entered this state at about 4:45pm yesterday afternoon. It is
now 12:00 noon the next day.
The "date" command says that the system thinks that the time is 7:26PM
yesterday.
In the last 47 minutes the system clock has gained only 6 minutes. A rate
of somewhere around 7.8.
Another interesting little symptom, when this slowdown is in effect the
keyboard autorepeat on keys stops working.
If this was the only machine doing this I'd think it was a hardware
problem. But (a) it isn't the only machine and (b) while it seems to
always happen to these machines, it is only after running for at least a
few hours without problems.
Gary
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
More information about the redhat-list
mailing list