Weird high load on server
nodata
fedora at nodata.co.uk
Tue Jul 19 15:15:49 UTC 2005
> On 7/19/05, nodata <fedora at nodata.co.uk> wrote:
>> > On 7/19/05, nodata <fedora at nodata.co.uk> wrote:
>> >> > Hi Guys,
>> >> >
>> >> > Hope you experts can help me out here.
>> >> >
>> >> > Basically I have a server running at a very high load, 2.44, although
>> >> > nothing is noticeably high when using top. There aren't any processes
>> >> > running on the box except the standard Linux OS tools. This box is
>> >> > used for backup, and only becomes active during the night.
>> >> >
>> >> > It's a Compaq DL380 with a RAID 5 configuration.
>> >> >
>> >> > Can anyone suggest what I can do to find out why the load is high?
>> >> >
>> >> > Thanks for your help in advance.
>> >> >
>> >> > Dan
>> >> >
>> >>
>> >> I bet you have hanging NFS mounts.
>> >> If the box is constantly at a load of around 2.44, and isn't sluggish, I
>> >> wouldn't worry.
>> >>
>> >> Look at iostat, sar, etc. to find out why the load is like that.
>> >>
>> >
>> >
>> > Hi
>> >
>> > I've looked at these but can't see anything. The server doesn't mount
>> > or export any filesystems using nfs or any other protocol. If it helps
>> > here are the various outputs:
>> >
>> > uptime
>> > 14:45:49 up 62 days, 43 min, 2 users, load average: 1.46, 1.57, 1.59
>> >
>> > sar 5 10
>> > Linux 2.4.21-27.0.4.ELsmp (orion.gs.moneyextra.com) 19/07/05
>> >
>> > 14:46:02 CPU %user %nice %system %idle
>> > 14:46:07 all 0.00 0.00 0.00 100.00
>> > 14:46:12 all 0.00 0.00 0.10 99.90
>> > 14:46:17 all 0.00 0.00 0.10 99.90
>> > 14:46:22 all 0.00 0.00 0.00 100.00
>> > 14:46:27 all 0.00 0.00 0.00 100.00
>> > 14:46:32 all 0.00 0.00 0.10 99.90
>> > 14:46:37 all 0.00 0.00 0.00 100.00
>> > 14:46:42 all 0.10 0.00 0.31 99.59
>> > 14:46:47 all 0.00 0.00 0.00 100.00
>> > 14:46:52 all 0.00 0.00 0.00 100.00
>> > Average: all 0.01 0.00 0.06 99.93
>> >
>> > vmstat -a
>> > procs           memory              swap        io        system       cpu
>> >  r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy wa id
>> >  0  0      0  15404 189668 202836    0    0     3     1    0     2  3  4  1  3
>> >
>> > free -m
>> >              total       used       free     shared    buffers     cached
>> > Mem:           498        483         15          0        128        301
>> > -/+ buffers/cache:         53        445
>> > Swap:         1027          0       1027
>> >
>> > iostat
>> > Linux 2.4.21-27.0.4.ELsmp (orion.gs.moneyextra.com) 19/07/05
>> >
>> > avg-cpu: %user %nice %sys %idle
>> > 3.11 0.00 3.72 93.17
>> >
>> > Device:            tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
>> > /dev/ida/c0d0    19.68      427.93      279.15  2147483647  1400883506
>> > /dev/ida/c0d0p1   0.00        0.22        0.00     1087144        8986
>> > /dev/ida/c0d0p2   0.65        3.72       10.24    18680778    51401528
>> > /dev/ida/c0d0p3   0.00        0.00        0.00         248           0
>> > /dev/ida/c0d0p4   0.00        0.00        0.00           0           0
>> > /dev/ida/c0d0p5   0.74        3.90        6.88    19570498    34517568
>> > /dev/ida/c0d0p6   0.00        0.00        0.00         168           0
>> > /dev/ida/c0d0p7   0.00        0.00        0.00         168           0
>> > /dev/ida/c0d0p8  18.29      427.93      262.03  2147483647  1314955424
>> >
>> > top
>> > 14:47:51 up 62 days, 45 min, 2 users, load average: 1.73, 1.61, 1.59
>> > 61 processes: 60 sleeping, 1 running, 0 zombie, 0 stopped
>> > CPU states:   cpu    user    nice  system    irq  softirq  iowait    idle
>> >             total    0.4%    0.0%    0.0%   0.0%     0.0%    0.0%   99.5%
>> >             cpu00    0.9%    0.0%    0.0%   0.0%     0.0%    0.0%   99.0%
>> >             cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
>> > Mem:   510400k av,  495224k used,   15176k free,       0k shrd,  132000k buff
>> >        203040k actv, 182824k in_d,    6852k in_c
>> > Swap: 1052592k av,       0k used, 1052592k free                  308668k cached
>> >
>> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>> > 13100 root      20   0  1092 1092   888 R     0.4  0.2   0:00   0 top
>> >     1 root      15   0   512  512   452 S     0.0  0.1   1:18   0 init
>> >     2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
>> >     3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
>> >     4 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 keventd
>> >     5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
>> >     6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
>> >     9 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
>> >     7 root      15   0     0    0     0 SW    0.0  0.0  70:21   0 kswapd
>> >     8 root      15   0     0    0     0 SW    0.0  0.0  23:07   1 kscand
>> >    10 root      15   0     0    0     0 SW    0.0  0.0   3:30   0 kupdated
>> >    11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
>> >    18 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 ahc_dv_0
>> >    19 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 scsi_eh_0
>> >    23 root      15   0     0    0     0 SW    0.0  0.0   2:30   1 kjournald
>> >   192 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
>> >   193 root      15   0     0    0     0 SW    0.0  0.0  13:57   1 kjournald
>> >   194 root      15   0     0    0     0 SW    0.0  0.0   4:18   0 kjournald
>> >   568 root      15   0   576  576   492 S     0.0  0.1   0:57   0 syslogd
>> >   572 root      15   0   472  472   408 S     0.0  0.0   0:00   1 klogd
>> >   582 root      15   0   452  452   388 S     0.0  0.0   5:33   1 irqbalance
>> >   599 rpc       15   0   600  600   524 S     0.0  0.1   0:22   0 portmap
>> >   618 rpcuser   25   0   720  720   644 S     0.0  0.1   0:00   0 rpc.statd
>> >   629 root      15   0   400  400   344 S     0.0  0.0   0:18   0 mdadm
>> >   712 root      15   0  3160 3160  2024 S     0.0  0.6   3:22   1 snmpd
>> >   713 root      25   0  3160 3160  2024 S     0.0  0.6   0:00   0 snmpd
>> >   722 root      15   0  1576 1576  1324 S     0.0  0.3   4:58   1 sshd
>> >
>> > Does anyone have any ideas? The box is literally sitting there, not doing
>> > anything that has been scheduled.
>> >
>> > This happens occasionally, then the load spontaneously goes down. Do
>> > you reckon it has something to do with the RAID 5?
>> >
>> > Thanks
>> > Dan
>> >
>>
>> ps auxw | grep " D "
>>
> Hi,
>
> I get the following:
>
> ps auxw | grep " D "
> root 15802 0.0 0.1 3688 660 pts/0 S 16:06 0:00 grep D
>
> Dan
>
Then it's probably not a problem of processes waiting for IO.
Here are the other state codes from the ps man page; you might want to try S or T:
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers
(header "STAT" or "S") will display to describe the state of a process.
   D    Uninterruptible sleep (usually IO)
   R    Running or runnable (on run queue)
   S    Interruptible sleep (waiting for an event to complete)
   T    Stopped, either by a job control signal or because it is being traced.
   W    paging (not valid since the 2.6.xx kernel)
   X    dead (should never be seen)
   Z    Defunct ("zombie") process, terminated but not reaped by its parent.
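
As far as I know, the Linux load average counts processes in state R as well as D, so those are the two I'd tally first. Rather than grepping for one code at a time, here is a minimal shell sketch that tallies processes by state code and lists the R and D ones. It is only an illustration: the function names (count_states, load_contributors, sample_load_contributors) are made up, and it assumes a ps that accepts -e and -o stat= as procps does. It matches on the first character of STAT only, because modifiers such as W or N are appended to the code (the top output above shows "SW" and "SWN"), so a plain grep " D " could miss something like "DW".

```shell
#!/bin/sh
# Hypothetical helpers, not part of any standard tool.

# Tally process states from ps-style STAT values read on stdin.
# Only the first character is the state code; the rest are modifiers.
count_states() {
    awk 'NF { n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# List processes currently in state D (uninterruptible sleep) or
# R (running or runnable). Assumes ps supports -e and -o name= (no header).
load_contributors() {
    ps -e -o stat= -o pid= -o comm= | awk '$1 ~ /^[DR]/'
}

# Take several samples one second apart, since a process may sit
# in D state only briefly and a single snapshot can miss it.
sample_load_contributors() {
    samples=${1:-5}
    i=0
    while [ "$i" -lt "$samples" ]; do
        load_contributors
        i=$((i + 1))
        sleep 1
    done
}
```

To use it, run "ps -e -o stat= | count_states" for the tally, or "sample_load_contributors 10" to take ten snapshots. Expect the ps process itself to show up as R in each sample.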