head node has an extremely high load average.
Jonathan Billings
jsbillin at umich.edu
Wed Jun 26 19:35:53 UTC 2013
Hello,
Is your head node an NFS server, and are the jobs writing to the NFS share?
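The 91.7%wa figure in the top output quoted below suggests the CPUs are mostly waiting on I/O rather than computing. On Linux the load average counts tasks in uninterruptible (D) sleep as well as runnable ones, so heavy I/O wait can push it well past the CPU count. As a rough sketch, not a definitive diagnostic, the following Python reads the same kernel counters that top uses. The function names are my own, chosen for illustration; the field order of the cpu line follows proc(5).

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    return 100.0 * deltas["iowait"] / total if total else 0.0


def state_of(stat_text):
    """Extract (command, state) from the contents of /proc/<pid>/stat.

    The command sits in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    command = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return command, state


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice and return the iowait percentage in between."""
    def read():
        with open(path) as handle:
            return parse_cpu_line(handle.readline())
    first = read()
    time.sleep(interval)
    return iowait_percent(first, read())
```

Calling sample_iowait() on the head node, and applying state_of to the contents of each /proc/<pid>/stat, would show whether the nfsd threads are the ones sitting in D state.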
On Wed, Jun 26, 2013 at 3:27 PM, Doll, Margaret Ann <margaret_doll at brown.edu> wrote:
> I have a computer cluster running Rocks 5.2, CentOS 6.
>
> The head node is overloaded. There are 2 CPUs on the head node.
>
> top - 14:27:49 up 1 day, 6:11, 6 users, load average: 13.65, 14.12, 13.92
> Tasks: 168 total, 3 running, 163 sleeping, 0 stopped, 2 zombie
> Cpu(s): 1.2%us, 1.9%sy, 0.0%ni, 0.0%id, 91.7%wa, 1.0%hi, 4.1%si, 0.0%st
> Mem: 2053088k total, 2001464k used, 51624k free, 74476k buffers
> Swap: 1020116k total, 388k used, 1019728k free, 1638076k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2515 nobody    15   0  218m 3176 1048 S  2.3  0.2   8:46.23 gmetad
>  2967 root      15   0     0    0    0 S  2.0  0.0   0:20.31 nfsd
>  2970 root      15   0     0    0    0 R  1.0  0.0   0:20.60 nfsd
>  3110 nobody    15   0  198m  20m 3360 S  0.3  1.0   4:22.71 gmond
> 29788 mad       15   0 90736 2336 1084 S  0.3  0.1   0:02.91 sshd
>     1 root      15   0 10372  684  572 S  0.0  0.0   0:00.51 init
>     2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
>     3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
>
> I have everyone logged off of the head node. Four jobs are running on the
> compute nodes, but I believe they are non-parallel jobs, which cause no
> traffic on the head node. The load_avg on each of the compute nodes is
> less than 8. Each compute node has 8 CPUs.
>
> How can I find the problem? I have seen the zombies go as high as 2 on
> the head node; most of the time there are 0 zombies.
>
> I did reboot the head node, but the problem comes back fairly quickly.
--
Jonathan Billings <jsbillin at umich.edu>
College of Engineering - CAEN - Unix and Linux Support