[Linux-cluster] Monitoring Failovers

Fri Feb 20 22:55:22 UTC 2009

Hi List,

I've been toying with the idea of writing an init script resource which will send an alert to <type your favorite network/host monitoring system here> every time it gets called with a "start" or "stop" argument.
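A minimal sketch of such a script resource in Python. Everything specific in it is an assumption, not something from this thread: the Nagios service description `cluster-failover`, using the node's hostname as the Nagios host name, and flagging every start/stop as WARNING. It only builds a Nagios `PROCESS_SERVICE_CHECK_RESULT` external-command line; delivering that line to Nagios (NSCA, the Nagios command file, or something else) is left out.

```python
#!/usr/bin/env python3
"""Hypothetical rgmanager script resource that reports start/stop to Nagios.

Assumptions (not from the original post): the Nagios service description,
using this node's hostname as the Nagios host name, and leaving delivery of
the generated line (NSCA, the Nagios command file, ...) to the reader.
"""
import socket
import sys
import time
from typing import Optional

NAGIOS_SERVICE = "cluster-failover"  # hypothetical Nagios service description
WARNING = 1                          # Nagios plugin return code for WARNING


def build_result(action: str, host: str, now: Optional[float] = None) -> Optional[str]:
    """Return a PROCESS_SERVICE_CHECK_RESULT line for start/stop, else None."""
    if action not in ("start", "stop"):
        return None
    timestamp = int(time.time() if now is None else now)
    # A start or stop means the service just arrived on or left this node.
    output = f"clustered service {action} on {host}"
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{NAGIOS_SERVICE};{WARNING};{output}")


def main(argv: list) -> int:
    line = build_result(argv[1], socket.gethostname())
    if line is not None:
        print(line)  # hand this to your delivery mechanism of choice
    # Always report success, including for "status", so the alert hook itself
    # never causes rgmanager to mark the service as failed.
    return 0


# rgmanager always passes an action (start, stop, status); skip when run bare.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The script always exits 0 on purpose: rgmanager treats a non-zero exit from a script resource as a resource failure, and an alerting hook should never be what triggers a failover.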

Another way is to make it send "alive" messages every time it's called with "status", and then configure your monitoring app to sound the sirens when it stops getting those messages, or if the source of those messages changes.
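On the receiving end, the decision logic for that "alive" approach might look like the following sketch. The `Heartbeat` record, the `max_age` threshold, and the way heartbeats reach the monitor are all hypothetical; the return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Nagios plugin return codes
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Heartbeat:
    source: str       # node that sent the "alive" message (hypothetical record)
    received: float   # UNIX time at which the monitor received it


def evaluate(last: Optional[Heartbeat], expected_source: Optional[str],
             max_age: float, now: float) -> Tuple[int, str]:
    """Classify the most recent heartbeat as a Nagios-style (code, message)."""
    if last is None:
        return CRITICAL, "no heartbeat received yet"
    age = now - last.received
    if age > max_age:
        # Messages stopped: the service is down or in the middle of a failover.
        return CRITICAL, f"no heartbeat for {age:.0f}s (last from {last.source})"
    if expected_source is not None and last.source != expected_source:
        # Messages still arrive, but from a different node: a failover happened.
        return WARNING, f"heartbeat source changed: {expected_source} -> {last.source}"
    return OK, f"alive on {last.source}"


hb = Heartbeat(source="node2", received=1000.0)
print(evaluate(hb, expected_source="node1", max_age=90, now=1030))
# prints (1, 'heartbeat source changed: node1 -> node2')
```

Setting `max_age` to a few multiples of the interval at which rgmanager runs the status check keeps a single delayed check from sounding the sirens.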

One then simply has to include this script resource with a clustered service.

-eric

--- On Sat, 2/21/09, Martin Fuerstenau <martin.fuerstenau at oce.com> wrote:

> From: Martin Fuerstenau <martin.fuerstenau at oce.com>
> Subject: Re: [Linux-cluster] Monitoring Failovers
> To: "linux clustering" <linux-cluster at redhat.com>
> Date: Saturday, February 21, 2009, 12:41 AM
> It is a little bit hard to do. It is on my to-do list too. The problem
> is determining the old state. For example, if you switch an IP address
> and have a service bound to that address, you have nearly no chance of
> monitoring it from the Nagios side.
> 
> I have tested using the MAC address and ARP, but this is awkward if you
> have bonding: if the MAC switches, the cause may be either the bonding
> or a cluster failover. But hardcoding MAC addresses in the monitoring
> script is not a good idea.
> 
> Too much trouble in maintenance.
> 
> If anyone has a good idea, I will write the plugin and post it on
> Nagiosexchange.
> 
> Martin Fuerstenau
> 
> On Fri, 2009-02-20 at 11:04 -0500, Burton Simonds wrote:
> > I am in the process of setting up Nagios for system monitoring, and I
> > would like to have a way to know if a failover has occurred. If
> > everything works as it should, there will be minimal impact on the
> > services. Right now it looks like my best bet is to scrape the logs,
> > look for the failover messages there, and trigger an alarm.
> > 
> > I was wondering if anyone else has done anything like this. I found a
> > check_rhcs script in an archive that I am going to employ (it looks
> > pretty cool), but it just looks at the status of the services. I want
> > to either compare the current status to the previous status or have
> > something that monitors the cluster and pushes the alert to Nagios.
> > 
> > Thanks,
> > B
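Burton's first option, comparing the current status with the previous one, can be sketched as a Nagios-style check that remembers which node owned each service on the last run. The state-file path and function names are hypothetical, and producing the `current` mapping (service name to owning node, e.g. by parsing `clustat` output or reusing the check_rhcs script) is deliberately left out rather than guessing at an output format.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical location; it must survive between Nagios check runs.
STATE_FILE = Path("/var/tmp/cluster_owners.json")


def moved_services(previous: Dict[str, str],
                   current: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (service, old_owner, new_owner) for every service whose owner changed."""
    moves = []
    for service, owner in sorted(current.items()):
        old = previous.get(service)
        # Services seen for the first time are not counted as failovers.
        if old is not None and old != owner:
            moves.append((service, old, owner))
    return moves


def check_failover(current: Dict[str, str],
                   state_file: Path = STATE_FILE) -> Tuple[int, str]:
    """Compare current owners with the saved ones, save the new state, and
    return a Nagios-style (code, message) pair."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    state_file.write_text(json.dumps(current))
    moves = moved_services(previous, current)
    if not moves:
        return 0, "OK - no failover since last check"
    detail = ", ".join(f"{svc}: {old} -> {new}" for svc, old, new in moves)
    return 1, f"WARNING - failover detected: {detail}"
```

Because the saved state is overwritten on every run, the WARNING lasts for a single check interval; that is a choice made in this sketch, and you may prefer to keep the old state until someone acknowledges the alert.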