Defining load thresholds for Nagios

Tue Nov 4 13:56:53 UTC 2008

On Tue, Nov 4, 2008 at 6:49 AM, Erling Ringen Elvsrud <erlingre at gmail.com> wrote:

> Hello list,
>
> I have been reading and thinking about proper thresholds for
> the check_load plugin in Nagios.
>
> My current understanding of load in Linux:
>
> The load average over 1, 5, and 15 min in Linux is the average number of
> processes in running, runnable, and uninterruptible sleep states
> (according to the load entry in Wikipedia).
> According to the same Wikipedia page, processes in the uninterruptible state
> usually wait for I/O, so both CPU-bound and I/O-bound processes
> can contribute to the load average.
> So if we have a server with many I/O-bound processes the
> CPU utilization can be low and the load average can be high.
> The number of cores or CPUs also determines the impact of the load.
> A load of 8 can therefore mean that all cores in a 2 x 4-core server are
> utilized.
>
> To determine where to set warning and critical thresholds, the impact the
> load has on the services running must also be taken into account. For
> instance, on a system running large batch jobs a high load can be less
> of a problem than on a system running a web server where users want a
> quick response.
>
> So if you had a server where you had little knowledge of the services,
> how would you pick thresholds for 1, 5, and 15 min warning and 1, 5, and
> 15 min critical?
>
> Thanks,
>
> Erling

While it is not a Linux rule, the rule of thumb for Solaris/SPARC is 5 for all.
We use that for Solaris/x86 and Linux as well.
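
To make the per-core point from the original question concrete, here is a rough Python sketch of how a flat threshold such as "5 for all" compares against a load that has been divided by the core count. Everything in it is an assumption for illustration: the helper names per_core_load and check are hypothetical, the critical values of 10 are made up, and the "any of the three averages exceeds its threshold" rule is just one simple choice, not necessarily what check_load does.

```python
import os


def per_core_load(loadavg, cores):
    """Divide each load average by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return tuple(value / cores for value in loadavg)


def check(loadavg, warn, crit, cores=1):
    """Return a Nagios-style status code: 0 OK, 1 WARNING, 2 CRITICAL.

    loadavg, warn and crit are (1 min, 5 min, 15 min) triples.
    The status is raised if ANY of the three averages exceeds its
    threshold (an assumed rule, chosen for simplicity).
    """
    scaled = per_core_load(loadavg, cores)
    if any(load > limit for load, limit in zip(scaled, crit)):
        return 2
    if any(load > limit for load, limit in zip(scaled, warn)):
        return 1
    return 0


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    cores = os.cpu_count() or 1
    current = os.getloadavg()
    # warn uses the "5 for all" rule; crit values are illustrative only.
    status = check(current, warn=(5, 5, 5), crit=(10, 10, 10), cores=cores)
    print("load", current, "cores", cores, "status", status)
```

With the example from the question, a load of 8 on a 2 x 4-core box trips a raw warning threshold of 5, but scaled per core it is only 1.0 and stays OK. Which of the two behaviours you want depends on whether the box is CPU-bound or mostly waiting on I/O.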