Looking for job which is causing a large work load
Alan A
alan.zg at gmail.com
Tue Feb 16 18:03:33 UTC 2010
Install atop. It is the best tool for tracking runaway processes, user
abuse, network utilization, etc.
On Tue, Feb 16, 2010 at 10:18 AM, Stainforth, Matthew (SD/DS) <
Matthew.Stainforth at gnb.ca> wrote:
> Memory doesn't appear to be a problem. Run "free" and look at the amount
> of free memory on the "-/+ buffers/cache" line.
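To put numbers on this without logging in, the figures in the top header quoted further down can be plugged in directly. A minimal sketch in Python: the variable names are mine, the values are copied from the Mem: and Swap: lines of the original post, and the arithmetic is the usual "used minus buffers minus cached" that free reports on its buffers/cache line.

```python
# Values in kB, copied from the "Mem:" and "Swap:" lines of the quoted top output.
total_kb = 16099528
used_kb = 16063880
free_kb = 35648
buffers_kb = 487200
cached_kb = 12683800

# Memory actually held by applications: "used" minus what the kernel
# is only borrowing for buffers and page cache.
used_by_apps_kb = used_kb - buffers_kb - cached_kb

# Memory applications could still get: truly free plus reclaimable cache.
available_kb = free_kb + buffers_kb + cached_kb

print(f"used by applications: {used_by_apps_kb} kB "
      f"({used_by_apps_kb / total_kb:.0%} of RAM)")
print(f"free + buffers/cache: {available_kb} kB "
      f"({available_kb / total_kb:.0%} of RAM)")
```

That works out to about 2.8 GiB (roughly 18% of RAM) used by applications. The rest is buffers and page cache, which the kernel gives back on demand.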
>
> Top is reporting 3419 processes total, and the load average suggests 600+
> of them are runnable or in uninterruptible sleep. What does "ps auwwx"
> tell you?
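With several thousand processes the ps listing is long, so it may help to summarize it by state first. On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R), so a load of 600+ on idle CPUs often points at many D-state processes. A rough sketch in Python, assuming a procps-style ps that accepts `-eo stat=,pid=,user=,comm=`. The function names are made up for illustration.

```python
import subprocess
from collections import Counter


def parse_ps(text):
    """Turn 'STAT PID USER COMMAND' lines into (state, pid, user, command) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        stat, pid, user, comm = parts
        # Only the first letter of STAT is the state; the rest are flags.
        rows.append((stat[0], int(pid), user, comm.strip()))
    return rows


def summarize(rows):
    """Count processes per state and pick out those that feed the load average."""
    counts = Counter(state for state, _, _, _ in rows)
    loaded = [row for row in rows if row[0] in ("R", "D")]
    return counts, loaded


def main():
    try:
        out = subprocess.run(
            ["ps", "-eo", "stat=,pid=,user=,comm="],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run ps:", exc)
        return
    counts, loaded = summarize(parse_ps(out))
    print(dict(counts))
    for state, pid, user, comm in loaded:
        print(state, pid, user, comm)


if __name__ == "__main__":
    main()
```

If most of the R/D entries belong to one user or one command, that is the place to look. Zombies (state Z) do not count toward the load average, so a single zombie is unlikely to be the cause.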
>
> -----Original Message-----
> From: redhat-list-bounces at redhat.com [mailto:
> redhat-list-bounces at redhat.com] On Behalf Of Margaret Doll
> Sent: Tuesday, February 16, 2010 11:54 AM
> To: General Red Hat Linux discussion list
> Subject: Looking for job which is causing a large work load
>
> We have an eight processor system, running 2.6.18-128.1.6.el5xen
> Redhat.
>
> We noticed the other day that sendmail was just queuing jobs and not
> sending them.
> mqueue, however, is empty.
>
> That led us to look at the load average as a possible reason for the
> failure of sendmail. The QueueLA on sendmail is set to "8" as it should be.
>
> w and top show that we have a high load average and that most of the
> memory on the system is being used. However, no job shows up in top
> using a lot of memory.
>
> top - 10:50:52 up 232 days, 15:18, 20 users, load average: 619.06, 619.04, 618.98
> Tasks: 3419 total, 1 running, 3417 sleeping, 0 stopped, 1 zombie
> Cpu(s): 0.3%us, 0.9%sy, 0.0%ni, 98.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 16099528k total, 16063880k used, 35648k free, 487200k buffers
> Swap: 6127608k total, 105920k used, 6021688k free, 12683800k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 11917 user1     16   0 13424 3624  784 S  3.8  0.0   0:04.16 top
> 11922 root      16   0 13360 3624  776 R  3.8  0.0   0:00.39 top
>  8187 user1     16   0 13356 3620  780 S  3.5  0.0  44:48.71 top
> 11895 user1     16   0 13452 3648  780 R  3.5  0.0   0:11.35 top
>     1 root      15   0 10348  632  540 S  0.0  0.0   0:01.75 init
>     2 root      RT  -5     0    0    0 S  0.0  0.0   0:07.51 migration/0
>     3 root      34  19     0    0    0 S  0.0  0.0   0:24.56 ksoftirqd/0
>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
>     5 root      RT  -5     0    0    0 S  0.0  0.0   0:03.77 migration/1
>     6 root      34  19     0    0    0 S  0.0  0.0   0:04.96 ksoftirqd/1
>
> This machine is running long jobs from time to time and is hosting
> large databases, so we don't want to reboot it.
>
> How can we find the "job" that is using all the memory and bringing
> the work load up to such a high level? Is it the zombie that is
> reported in top?
>
>
> Thanks
>
> w
> 10:57:27 up 232 days, 15:25, 18 users, load average: 619.19, 619.28, 619.13
> USER     TTY      FROM              LOGIN@    IDLE   JCPU   PCPU WHAT
> user1    pts/2    lfps             15Jan10   4days  0.10s  0.10s -tcsh
> user1    pts/3    lfps               Thu16  17:45m  44:55  44:54 top
> user1    pts/4    lfps             15Jan10  25days  0.10s  0.10s -tcsh
> user2    pts/5    gc166-mm.geo.bro   Thu16   4days  0.02s  0.01s sshd: user2 [priv]
> crism    pts/8    molybdenum         Fri13   3days   1:27   1:27 /usr/local/itt/idl70/bin/bin.linux.x8
> root     pts/9    :0.0             23Oct09 116days  0.00s  0.00s ssh -l user1 moly
> wjuser1  pts/10   porter2.geo.brow   Mon10    6:01  0.11s  0.11s -tcsh
> user2    pts/12   gc166-mm.geo.bro   Fri14   0.00s  0.07s  0.00s sshd: user2 [priv]
> root     :0       -                23Oct09   ?xdm?  2:24m  0.03s /usr/bin/gnome-session
> user1    pts/16   lfps               Mon14    3:47 10.30s 10.24s top
> user1    pts/14   quahog2.geo.brow   Mon15    8:22 17.54s 17.48s top
> root     pts/15   :0.0             23Oct09 116days  0.01s  0.01s -bin/tcsh
> user1    pts/17   quahog2.geo.brow   Mon14  18:19m  0.11s  0.11s -tcsh
> root     pts/23   :0.0             23Oct09 116days  0.01s  0.01s -bin/tcsh
> root     pts/24   :0.0             23Oct09 116days  0.01s  0.01s -bin/tcsh
> user1    pts/28   lfps             15Jan10    4:08  0.12s  0.12s -tcsh
> user1    pts/30   lfps             15Jan10    6:01  0.39s  0.00s sshd: user1 [priv]
> root     pts/7    :0.0             23Oct09 116days  5.78s  0.00s -bin/tcsh
>
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>
--
Alan A.