head node has an extremely high load average.

Thu Jun 27 13:21:41 UTC 2013

The new top output showed the same problem after the reboot.  The queued
jobs kept running while the head node rebooted, and they triggered the I/O
problem again as soon as the head node was back up.  Killing one of the
queued jobs stopped the problem.

I need a way to determine which queued job is causing the I/O problem, so
that I can kill it.  (And hopefully train the user to have their program
write to the compute node rather than the head node during execution.)
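
One possible way to narrow this down, offered as a rough sketch rather than
a tested procedure: sample the per-process counters in /proc/<pid>/io twice
and rank processes by how many bytes they pushed through read/write calls in
between.  This assumes a Linux kernel with task I/O accounting enabled and
root access (needed to read other users' counters).  It also assumes the
script is run on the compute nodes where the jobs execute, since on the head
node the NFS traffic would most likely be attributed to the nfsd threads and
not to the job itself.  The function names and the 10-second interval are
illustrative choices, not anything from the thread.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def snapshot(proc_root="/proc"):
    """Return {pid: (command line, counters)} for every readable process."""
    snap = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                counters = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "cmdline")) as f:
                # Arguments are NUL-separated; kernel threads have none.
                name = f.read().replace("\0", " ").strip() or "[kernel thread]"
        except (OSError, IOError):
            continue  # process exited, or we lack permission
        snap[int(entry)] = (name, counters)
    return snap


def rank_by_io(before, after, keys=("rchar", "wchar")):
    """Rank processes by bytes transferred between two snapshots, busiest first."""
    rows = []
    for pid, (name, now) in after.items():
        if pid not in before:
            continue  # started after the first snapshot; no baseline
        then = before[pid][1]
        delta = sum(now.get(k, 0) - then.get(k, 0) for k in keys)
        rows.append((delta, pid, name))
    rows.sort(reverse=True)
    return rows


def report(interval=10, top=10):
    """Print the processes that moved the most bytes during `interval` seconds."""
    first = snapshot()
    time.sleep(interval)
    for delta, pid, name in rank_by_io(first, snapshot())[:top]:
        print("{:>15d} bytes  pid {:>7d}  {}".format(delta, pid, name))
```

Calling report() as root on each compute node while the load is high should
list the heaviest readers and writers first; the pid can then be matched to
a queued job through the scheduler's own tools.  Note that rchar and wchar
count bytes passed to read and write system calls, so I/O done through
memory-mapped files would not show up here.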

Thanks to everyone for their help.

On Thu, Jun 27, 2013 at 9:15 AM, Yixin Luo <luoyixin at gmail.com> wrote:

> It is true: %wa is too high if only non-parallel jobs are running. Ann, what
> does the new top output show after the reboot?
>
> Yixin
>
>
> On Thu, Jun 27, 2013 at 7:31 AM, Doll, Margaret Ann
> <margaret_doll at brown.edu> wrote:
>
> > I installed the iozone program and ran ./iozone -a.
> > How does this information help me find the offending program?
> >
> > Sorry for my ignorance.
> >
> > I do have 10 nfsd processes running.  I only have four jobs on the queues,
> > none of which are running parallel code.
> >
> >
> > On Thu, Jun 27, 2013 at 7:42 AM, Miner, Jonathan W (US SSA)
> > <jonathan.w.miner at baesystems.com> wrote:
> >
> > >
> > > > From: redhat-list-bounces at redhat.com [redhat-list-bounces at redhat.com]
> > > > on behalf of Yixin Luo [luoyixin at gmail.com]
> > > > Sent: Wednesday, June 26, 2013 17:56
> > > > To: General Red Hat Linux discussion list
> > > > Subject: Re: head node has an extremely high load average.
> > > >
> > > > NFS may hang up. Have you tried running autofs?
> > >
> > > Can you explain why "autofs" would be better than NFS?   I have not
> > > managed any NFS-based systems for nearly a decade, but from what I
> > > remember, autofs simplifies the management aspect of network filesystems;
> > > but NFS is still the underlying protocol.  Without autofs, things were
> > > mounted all the time, and you'd have to push changes out to all the
> > > clients' /etc/fstab files.
> > >
> > > As for Margaret's original problem, her system looks very I/O bound.  Like
> > > someone else suggested, I'd start looking at the local disk performance and
> > > see if one disk, or one bus was in contention for most of the traffic.
> > > Then look at the number of nfsd processes and make sure they're
> > > appropriate for the expected load. The iozone program should help you with
> > > this task.
> > >
> > > http://www.thegeekstuff.com/2011/05/iozone-examples/
> > >
> > > - Jon
> > >
> > >
> > >
> > > --
> > > redhat-list mailing list
> > > unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> > > https://www.redhat.com/mailman/listinfo/redhat-list