head node has an extremely high load average.

Jonathan Billings jsbillin at umich.edu
Wed Jun 26 19:35:53 UTC 2013


Hello,

Is your head node an NFS server, and are the jobs writing to the NFS share?
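
I ask because the top output below shows 91.7%wa, and on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on disk or NFS I/O) as well as runnable ones. As a rough sketch only, something like the following Python would list the tasks that are in state D at the moment it runs. The function names are my own, it is a single snapshot rather than a monitor, and I'm assuming a Python interpreter is available on the head node:

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for tasks in uninterruptible sleep (state D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while being read, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it a few times while the load is high should show whether the same tasks (for example the nfsd threads) keep turning up blocked on I/O.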


On Wed, Jun 26, 2013 at 3:27 PM, Doll, Margaret Ann <margaret_doll at brown.edu> wrote:

> I have a computer cluster running Rocks 5.2, CentOS 6.
>
> The head node is overloaded.  There are 2 CPUs on the head node.
>
> top - 14:27:49 up 1 day,  6:11,  6 users,  load average: 13.65, 14.12, 13.92
> Tasks: 168 total,   3 running, 163 sleeping,   0 stopped,   2 zombie
> Cpu(s):  1.2%us,  1.9%sy,  0.0%ni,  0.0%id, 91.7%wa,  1.0%hi,  4.1%si,  0.0%st
> Mem:   2053088k total,  2001464k used,    51624k free,    74476k buffers
> Swap:  1020116k total,      388k used,  1019728k free,  1638076k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2515 nobody    15   0  218m 3176 1048 S  2.3  0.2   8:46.23 gmetad
>  2967 root      15   0     0    0    0 S  2.0  0.0   0:20.31 nfsd
>  2970 root      15   0     0    0    0 R  1.0  0.0   0:20.60 nfsd
>  3110 nobody    15   0  198m  20m 3360 S  0.3  1.0   4:22.71 gmond
> 29788 mad       15   0 90736 2336 1084 S  0.3  0.1   0:02.91 sshd
>     1 root      15   0 10372  684  572 S  0.0  0.0   0:00.51 init
>     2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
>     3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
>
> I have everyone logged off of the head node.  Four jobs are running on the
> compute nodes, but I believe they are non-parallel jobs, which should cause
> no traffic on the head node.  The load average on each of the compute nodes
> is less than 8.  Each compute node has 8 CPUs.
>
> How can I find the problem?   I have seen the zombies go as high as 2 on
> the head node; most of the time there are 0 zombies.
>
> I did reboot the head node, but the problem comes back fairly quickly.



-- 
Jonathan Billings <jsbillin at umich.edu>
College of Engineering - CAEN - Unix and Linux Support


