[Linux-cluster] Monitoring services with Nagios
Chris St. Pierre
stpierre at NebrWesleyan.edu
Fri Jul 11 14:55:26 UTC 2008
I've attached my (very basic) check_rhcs script that I use with
Nagios. HTH.
Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
On Fri, 11 Jul 2008, Finnur Örn Guðmundsson - TM Software wrote:
> Hi,
>
> I was planning on monitoring the status of a service from clustat (run clustat, grab the output).
>
> As I am running an x86_64 system, I cannot seem to load the correct library for snmpd to read any data from it:
>
> nmpd[30150]: dlopen failed: /usr/lib64/cluster-snmp/libClusterMonitorSnmp.so: undefined symbol: _ZN17ClusterMonitoring7Cluster15runningServicesEv
>
> How do you monitor your cluster with Nagios or other open-source solutions? (What scripts do you use, etc.?)
>
> Kær kveðja / Best Regards,
>
> Finnur Örn Guðmundsson
> Network Engineer - Network Operations
> fog at t.is
>
> TM Software
> Urðarhvarf 6, IS-203 Kópavogur, Iceland
> Tel: +354 545 3000 - fax +354 545 3610
> http://www.tm-software.is/
-------------- next part --------------
#! /usr/bin/perl -w
#
# $Id: check_rhcs 11710 2008-06-25 19:50:44Z stpierre $
#
# check_rhcs
#
# Nagios host script to check a Red Hat Cluster Suite cluster
require 5.004;
use strict;
use lib qw(/usr/lib/nagios/plugins /usr/lib64/nagios/plugins /usr/local/nagios/libexec);
use utils qw($TIMEOUT %ERRORS &print_revision &support &usage);
use XML::Simple;
sub cleanup($$);
my $PROGNAME = "check_rhcs";
my $clustat = "/usr/sbin/clustat";
if (!-e $clustat) {
    cleanup("UNKNOWN", "$clustat not found");
} elsif (!-x $clustat) {
    cleanup("UNKNOWN", "$clustat not executable");
}
# Just in case of problems, let's not hang Nagios
$SIG{'ALRM'} = sub {
    cleanup("UNKNOWN", "clustat timed out");
};
alarm($TIMEOUT);
my $output = `$clustat -x`;
my $retval = $?;
# Turn off alarm
alarm(0);
if ($output =~ /cman is not running/) {
    cleanup("CRITICAL", $output);
} else {
    # Force 'node' as well as 'group' into arrays so that XML::Simple folds
    # them into hashes keyed by name even when only one element is present
    my $status = XMLin($output, ForceArray => ['node', 'group']);
    # check quorum
    if (!$status->{'quorum'}->{'quorate'}) {
        cleanup("CRITICAL", "Cluster is not quorate");
    }
    # check nodes
    my %nodes = %{$status->{'nodes'}->{'node'}};
    foreach my $node (keys(%nodes)) {
        if (!$nodes{$node}->{'state'}) {
            cleanup("WARNING", "Node $node is down");
        } elsif (!$nodes{$node}->{'rgmanager'}) {
            cleanup("WARNING", "rgmanager is not running on node $node");
        }
    }
    # check services
    my %svcs = %{$status->{'groups'}->{'group'}};
    foreach my $svc (keys(%svcs)) {
        if ($svcs{$svc}->{'state_str'} ne 'started') {
            cleanup("CRITICAL", "$svc is in state " . $svcs{$svc}->{'state_str'});
        }
    }
    # check return value
    if ($retval) {
        cleanup("UNKNOWN",
                "Cluster appeared okay, but clustat returned $retval");
    }
}
cleanup("OK", "Cluster is sound");
##############################
# Subroutines start here #
##############################
sub cleanup ($$) {
    my ($state, $answer) = @_;
    print "Cluster $state: $answer\n";
    exit $ERRORS{$state};
}
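
For anyone who wants to try the check logic on a machine without clustat, here is a rough sketch of the same quorum, node and service tests applied to a hand-built data structure. The helper name evaluate_status is hypothetical and not part of check_rhcs; the hash is only my assumption of what XMLin() returns for "clustat -x" on a two-node cluster, and the node and service names are made up. It returns the state and message instead of exiting, and sorts the keys so the first problem reported is predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of check_rhcs): run the same checks as the
# script on an already-parsed status structure and return (state, message)
# instead of exiting, so the logic can be exercised without clustat.
sub evaluate_status {
    my ($status) = @_;

    # check quorum
    return ("CRITICAL", "Cluster is not quorate")
        unless $status->{quorum}{quorate};

    # check nodes (sorted so the result is deterministic)
    my $nodes = $status->{nodes}{node} || {};
    foreach my $node (sort keys %$nodes) {
        return ("WARNING", "Node $node is down")
            unless $nodes->{$node}{state};
        return ("WARNING", "rgmanager is not running on node $node")
            unless $nodes->{$node}{rgmanager};
    }

    # check services
    my $svcs = $status->{groups}{group} || {};
    foreach my $svc (sort keys %$svcs) {
        my $state = $svcs->{$svc}{state_str};
        return ("CRITICAL", "$svc is in state $state")
            if $state ne 'started';
    }

    return ("OK", "Cluster is sound");
}

# Assumed shape of the parsed clustat output; names are illustrative only.
my $sample = {
    quorum => { quorate => 1 },
    nodes  => { node => {
        'node1' => { state => 1, rgmanager => 1 },
        'node2' => { state => 1, rgmanager => 0 },
    } },
    groups => { group => {
        'service:web' => { state_str => 'started' },
    } },
};

my ($state, $answer) = evaluate_status($sample);
print "Cluster $state: $answer\n";
```

With the sample data above, rgmanager is marked as stopped on node2, so the sketch reports a WARNING for that node. Flipping the values in the hash is a quick way to see which state each condition maps to.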