head node has an extremely high load average.

Thu Jun 27 13:31:02 UTC 2013

On Thu, Jun 27, 2013 at 8:54 AM, Miner, Jonathan W (US SSA) <
jonathan.w.miner at baesystems.com> wrote:

>
> > I installed the iozone program and ran ./iozone -a.
>
> iozone allows you to benchmark disk performance and gives you objective
> measurements.
>
> > How does this information help me find the offending program?
>
> Not sure you're looking for a "program"... I think you know what program
> is doing the IO on your client machines,

Users are running Gaussian or their own programs on the compute nodes.
How does one determine which node the massive I/O requests are coming from?
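
One possible way to narrow this down, sketched here under assumptions: Linux NFS clients normally expose per-mount byte counters in /proc/self/mountstats, so the same small script could be run on each compute node and the totals compared. The function name nfs_bytes_by_mount is made up for illustration, and the field positions follow the usual "bytes:" line layout (normal read, normal write, direct read, direct write, server read, server write, read pages, write pages), which is worth checking against the kernel actually in use.

```python
def nfs_bytes_by_mount(mountstats_text):
    """Return {mount_point: (server_read_bytes, server_write_bytes)} for NFS mounts.

    mountstats_text is assumed to be the contents of /proc/self/mountstats
    on an NFS client (a compute node in this thread).
    """
    totals = {}
    mount_point = None
    for line in mountstats_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            # Typical line:
            # device head:/home mounted on /home with fstype nfs statvers=1.1
            mount_point = None
            if "fstype" in fields:
                idx = fields.index("fstype") + 1
                if idx < len(fields) and fields[idx].startswith("nfs") and len(fields) > 4:
                    mount_point = fields[4]
        elif fields[0] == "bytes:" and mount_point is not None and len(fields) >= 7:
            # The 5th and 6th counters are bytes read from and written to the server.
            totals[mount_point] = (int(fields[5]), int(fields[6]))
    return totals
```

Calling nfs_bytes_by_mount(open("/proc/self/mountstats").read()) twice on a node, a minute or so apart, and subtracting the results should show how much NFS traffic that node generated in the interval. The node with the largest difference is the likely source.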

> and we know that "nfsd" is doing the IO on the server, and we know from
> your previous output that you have high IO wait times.  So... you should be
> looking at which disks are involved, and why the wait times are so high.
>
> Are you using single drives, software raid, hardware raid? What type of
> bus?
>

The head node has a single disk that serves most users.  A second disk is
owned by a single research group that was not involved in the problem.

Most of the compute nodes have a single disk.  Two compute nodes have a
second 700 GB drive for use with Gaussian calculations.  The user who
caused the I/O problem was using one of these compute nodes and evidently
was not using the scratch space on that node.
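
To see which of the head node's disks is accumulating the wait time, one rough approach is to compare two snapshots of /proc/diskstats and compute the average time per completed I/O for each device, similar in spirit to the "await" column of iostat. This is only a sketch: the function names are hypothetical, and it assumes the standard 14-field /proc/diskstats layout of 2.6 and later kernels.

```python
def parse_diskstats(text):
    """Map device name -> (ios_completed, ms_spent_on_ios) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip short or malformed lines
        # f[2] is the device name; f[3]/f[7] are reads/writes completed;
        # f[6]/f[10] are milliseconds spent reading/writing.
        reads, ms_reading = int(f[3]), int(f[6])
        writes, ms_writing = int(f[7]), int(f[10])
        stats[f[2]] = (reads + writes, ms_reading + ms_writing)
    return stats


def average_wait_ms(before_text, after_text):
    """Average milliseconds per completed I/O, per device, between two snapshots."""
    before = parse_diskstats(before_text)
    after = parse_diskstats(after_text)
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        ios = ios_after - ios_before
        if ios > 0:  # idle devices have no meaningful average
            result[name] = float(ms_after - ms_before) / ios
    return result
```

Reading /proc/diskstats once, sleeping for a few seconds, reading it again, and passing both strings to average_wait_ms gives one number per disk. A disk whose average is far above the others is the one to look at first.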

>
>
> A lot goes into performance monitoring:
>
> http://www.thegeekstuff.com/2011/03/linux-performance-monitoring-intro
>