Looking for job which is causing a large work load
Margaret Doll
Margaret_Doll at brown.edu
Tue Feb 16 18:56:49 UTC 2010
Thanks. I will look at atop for my systems.
On Feb 16, 2010, at 1:03 PM, Alan A wrote:
> Install atop: the best tool for tracking runaway processes, user abuse,
> network utilization, etc.
>
> On Tue, Feb 16, 2010 at 10:18 AM, Stainforth, Matthew (SD/DS) <Matthew.Stainforth at gnb.ca> wrote:
>
>> Memory doesn't appear to be a problem. Run "free" and look at the amount
>> of free memory on the "-/+ buffers/cache" line.
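The arithmetic behind the buffers/cache figure can be checked against the Mem line of the top output quoted further down. The kernel can reclaim memory it uses for buffers and page cache when applications need it, so older versions of free subtract that memory from "used" and add it to "free" on their "-/+ buffers/cache" line. The Python sketch below repeats that calculation. The function name is made up for illustration, and the numbers are copied from that top output (in KiB).

```python
def buffers_cache_adjusted(used_kb, free_kb, buffers_kb, cached_kb):
    """Return (used, free) in KiB with buffers and page cache counted as free,
    as on the '-/+ buffers/cache' line of older free(1) output."""
    reclaimable = buffers_kb + cached_kb
    return used_kb - reclaimable, free_kb + reclaimable

# Figures from the quoted top output, in KiB.
total_kb, used_kb, free_kb = 16099528, 16063880, 35648
buffers_kb = 487200
cached_kb = 12683800  # printed at the end of top's Swap line, but it is page cache in RAM

app_used, app_free = buffers_cache_adjusted(used_kb, free_kb, buffers_kb, cached_kb)
assert app_used + app_free == total_kb  # sanity check: the split still adds up to total RAM

print(f"held by processes:     {app_used} KiB (~{app_used / 1024**2:.1f} GiB)")
print(f"available to programs: {app_free} KiB (~{app_free / 1024**2:.1f} GiB)")
```

With these figures, processes hold roughly 2.8 GiB and about 12.6 GiB remains available. That supports the view that memory is not the bottleneck.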
>>
>> Top is reporting 3419 processes total with 600+ in a runnable state.
>> What does "ps auwwx" tell you?
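On Linux, the load average counts tasks that are runnable (STAT R) plus tasks in uninterruptible sleep (STAT D, often blocked on disk or NFS I/O). top's summary line counts D-state tasks as "sleeping". The top output below shows just 1 task running and 98.8% idle CPU, so a load near 619 more likely points to several hundred tasks stuck in D state than to 600 busy ones. The Python sketch below tallies the STAT column of `ps auwwx` output and picks out the R and D lines. It assumes the usual column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND). The sample input is invented for illustration and is not taken from the server in this thread.

```python
import subprocess
from collections import Counter

def run_ps():
    """Capture `ps auwwx` output (call this on the affected Linux host)."""
    return subprocess.run(["ps", "auwwx"], capture_output=True,
                          text=True, check=True).stdout

def _rows(ps_text):
    """Yield (stat, line) pairs, locating the STAT column from the header row."""
    lines = ps_text.strip().splitlines()
    stat_idx = lines[0].split().index("STAT")
    for line in lines[1:]:
        # COMMAND may contain spaces, so split only as far as needed to reach STAT
        fields = line.split(None, stat_idx + 1)
        if len(fields) > stat_idx:
            yield fields[stat_idx], line

def count_states(ps_text):
    """Tally processes by the first letter of STAT (R, S, D, Z, T, ...)."""
    return Counter(stat[0] for stat, _ in _rows(ps_text))

def load_contributors(ps_text):
    """Lines for tasks in R or D state, the two states Linux counts in the load average."""
    return [line for stat, line in _rows(ps_text) if stat[0] in "RD"]

# Invented sample in ps auwwx layout, for illustration only.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  10348   632 ?        Ss    2009   0:01 init [5]
user1     8187  3.5  0.0  13356  3620 pts/3    S+   Feb11  44:48 top
root     20001  0.0  0.0      0     0 ?        D    10:40   0:00 [example-task]
root     20002  0.0  0.0      0     0 ?        D    10:40   0:00 [example-task]
user1    20003  0.0  0.0  13360  3624 pts/9    R+   10:50   0:00 ps auwwx
"""

print(count_states(SAMPLE))
for line in load_contributors(SAMPLE):
    print(line)
```

On the affected host, `count_states(run_ps())` gives the real tally. The COMMAND field of the D-state lines shows which programs are blocked.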
>>
>> -----Original Message-----
>> From: redhat-list-bounces at redhat.com [mailto:redhat-list-bounces at redhat.com] On Behalf Of Margaret Doll
>> Sent: Tuesday, February 16, 2010 11:54 AM
>> To: General Red Hat Linux discussion list
>> Subject: Looking for job which is causing a large work load
>>
>> We have an eight-processor system running Red Hat with kernel
>> 2.6.18-128.1.6.el5xen.
>>
>> We noticed the other day that sendmail was just queuing jobs and not
>> sending them. mqueue, however, is empty.
>>
>> That led us to look at the load average as a possible reason for the
>> failure of sendmail. The QueueLA option in sendmail is set to 8, as it
>> should be.
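sendmail's QueueLA option makes it queue messages instead of attempting delivery once the load average reaches the configured value. With a load average around 619 against a QueueLA of 8, mail piling up in the queue is the expected result. The Python sketch below shows only that threshold comparison and reads /proc/loadavg, which is Linux-specific. It is a simplification: sendmail also weighs each message's priority against its QueueFactor option. The function names are illustrative.

```python
QUEUE_LA = 8  # mirrors the QueueLA value configured on this server

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen

def read_loadavg(path="/proc/loadavg"):
    """Read the current load averages (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())

def would_queue(load_1min, queue_la=QUEUE_LA):
    """Simplified QueueLA test: True once the load average is at or above queue_la.

    sendmail's real decision also compares the message priority with
    QueueFactor; that part is deliberately left out of this sketch.
    """
    return load_1min >= queue_la

# Load averages as printed by top in the message quoted here.
one_min, _, _ = parse_loadavg("619.06 619.04 618.98")
print("queue instead of deliver:", would_queue(one_min))
```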
>>
>> w and top show that we have a high load average and most of the memory
>> on the system is being used. However, no job shows up in top using a lot
>> of memory.
>>
>> top - 10:50:52 up 232 days, 15:18, 20 users, load average: 619.06, 619.04, 618.98
>> Tasks: 3419 total, 1 running, 3417 sleeping, 0 stopped, 1 zombie
>> Cpu(s): 0.3%us, 0.9%sy, 0.0%ni, 98.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 16099528k total, 16063880k used, 35648k free, 487200k buffers
>> Swap: 6127608k total, 105920k used, 6021688k free, 12683800k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>> 11917 user1     16   0 13424 3624  784 S  3.8  0.0   0:04.16 top
>> 11922 root      16   0 13360 3624  776 R  3.8  0.0   0:00.39 top
>>  8187 user1     16   0 13356 3620  780 S  3.5  0.0  44:48.71 top
>> 11895 user1     16   0 13452 3648  780 R  3.5  0.0   0:11.35 top
>>     1 root      15   0 10348  632  540 S  0.0  0.0   0:01.75 init
>>     2 root      RT  -5     0    0    0 S  0.0  0.0   0:07.51 migration/0
>>     3 root      34  19     0    0    0 S  0.0  0.0   0:24.56 ksoftirqd/0
>>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
>>     5 root      RT  -5     0    0    0 S  0.0  0.0   0:03.77 migration/1
>>     6 root      34  19     0    0    0 S  0.0  0.0   0:04.96 ksoftirqd/1
>>
>> This machine is running long jobs from time to time and is hosting
>> large databases, so we don't want to reboot it.
>>
>> How can we find the "job" that is using all the memory and bringing
>> the work load up to such a high level? Is it the zombie that is
>> reported in top?
>>
>>
>> Thanks
>>
>> w
>> 10:57:27 up 232 days, 15:25, 18 users, load average: 619.19, 619.28, 619.13
>> USER     TTY     FROM              LOGIN@   IDLE     JCPU    PCPU    WHAT
>> user1    pts/2   lfps              15Jan10  4days    0.10s   0.10s   -tcsh
>> user1    pts/3   lfps              Thu16    17:45m   44:55   44:54   top
>> user1    pts/4   lfps              15Jan10  25days   0.10s   0.10s   -tcsh
>> user2    pts/5   gc166-mm.geo.bro  Thu16    4days    0.02s   0.01s   sshd: user2 [priv]
>> crism    pts/8   molybdenum        Fri13    3days    1:27    1:27    /usr/local/itt/idl70/bin/bin.linux.x8
>> root     pts/9   :0.0              23Oct09  116days  0.00s   0.00s   ssh -l user1 moly
>> wjuser1  pts/10  porter2.geo.brow  Mon10    6:01     0.11s   0.11s   -tcsh
>> user2    pts/12  gc166-mm.geo.bro  Fri14    0.00s    0.07s   0.00s   sshd: user2 [priv]
>> root     :0      -                 23Oct09  ?xdm?    2:24m   0.03s   /usr/bin/gnome-session
>> user1    pts/16  lfps              Mon14    3:47     10.30s  10.24s  top
>> user1    pts/14  quahog2.geo.brow  Mon15    8:22     17.54s  17.48s  top
>> root     pts/15  :0.0              23Oct09  116days  0.01s   0.01s   -bin/tcsh
>> user1    pts/17  quahog2.geo.brow  Mon14    18:19m   0.11s   0.11s   -tcsh
>> root     pts/23  :0.0              23Oct09  116days  0.01s   0.01s   -bin/tcsh
>> root     pts/24  :0.0              23Oct09  116days  0.01s   0.01s   -bin/tcsh
>> user1    pts/28  lfps              15Jan10  4:08     0.12s   0.12s   -tcsh
>> user1    pts/30  lfps              15Jan10  6:01     0.39s   0.00s   sshd: user1 [priv]
>> root     pts/7   :0.0              23Oct09  116days  5.78s   0.00s   -bin/tcsh
>>
>>
>> --
>> redhat-list mailing list
>> unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
>> https://www.redhat.com/mailman/listinfo/redhat-list
>>
>
>
>
> --
> Alan A.