[Linux-cluster] Monitoring Failovers

Tue Feb 24 20:23:11 UTC 2009

Just as a follow-up, I took a look at the output of clustat -x, and
one of the values is "last transition".  I wrote a check that looks at
a given service and calculates the difference between the current
time and the last transition.  If that difference is below a given
threshold, it alarms.  It is kind of a hack, but it will do until I can
get scripts and log-parsing checks in place for a more proactive
approach.
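A minimal Python sketch of that kind of check follows. It assumes that clustat -x reports each service as a <group> element carrying a "name" attribute and a "last_transition" attribute in epoch seconds. Verify both attribute names against your own rgmanager version. The service name "service:web", the 300-second threshold and the script name are hypothetical. The function returns the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import sys
import time
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_last_transition(xml_text, service, threshold, now=None):
    """Return (exit_code, message) for one clustered service.

    Assumes each service appears as <group name="..." last_transition="EPOCH"/>
    somewhere in the clustat -x output (an assumption; check your version).
    """
    now = time.time() if now is None else now
    root = ET.fromstring(xml_text)
    for group in root.iter("group"):
        if group.get("name") != service:
            continue
        raw = group.get("last_transition")
        if raw is None:
            return UNKNOWN, f"UNKNOWN - {service} has no last_transition attribute"
        age = int(now - int(raw))
        # A recent transition suggests a start, stop, restart or failover.
        if age < threshold:
            return CRITICAL, f"CRITICAL - {service} changed state {age}s ago"
        return OK, f"OK - {service} stable for {age}s"
    return UNKNOWN, f"UNKNOWN - {service} not found in clustat output"


if __name__ == "__main__":
    # Hypothetical usage: check_transition.py service:web 300
    svc, limit = sys.argv[1], int(sys.argv[2])
    out = subprocess.run(["clustat", "-x"], capture_output=True, text=True).stdout
    code, msg = check_last_transition(out, svc, limit)
    print(msg)
    sys.exit(code)
```

Because the alarm clears on its own once the threshold has passed, the Nagios check interval needs to be shorter than the threshold. Otherwise a transition can be missed entirely.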

B

On Fri, Feb 20, 2009 at 7:17 PM, Burton Simonds
<burton at simondsfamily.com> wrote:
> I was actually searching Google for something like that earlier
> today.  That would work, but it still has the issue of tracking the
> previous state.  From what I have read about the clustered-service
> checks, they will see whether the service is running somewhere, but
> will not notify if the service has changed state.  I am running NRPE
> on the clustered hosts and using it to check the processes on each
> of the hosts.
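One way to handle the "previous state" problem is for the check to remember what it saw on its last run. The Python sketch below does this with a small JSON cache file. The file path and its layout are hypothetical and belong to the check, not to the cluster software. The caller is assumed to supply the current owner and state, for example from clustat output.

```python
import json
import os

# Nagios plugin exit codes used below.
OK, WARNING = 0, 1


def compare_with_previous(state_file, service, owner, state):
    """Compare a service's current owner/state with the previous run.

    state_file is a hypothetical local JSON cache written by this check.
    """
    previous = {}
    if os.path.exists(state_file):
        with open(state_file) as fh:
            previous = json.load(fh)

    current = {"owner": owner, "state": state}
    last = previous.get(service)

    # Remember the current view for the next invocation.
    previous[service] = current
    with open(state_file, "w") as fh:
        json.dump(previous, fh)

    if last is None:
        return OK, f"OK - first run for {service}, owner {owner}"
    if last != current:
        return WARNING, (f"WARNING - {service} moved from {last['owner']} "
                         f"({last['state']}) to {owner} ({state})")
    return OK, f"OK - {service} unchanged on {owner}"
```

The warning is raised only on the run that first notices the change. Setting up Nagios to notify on that state change, rather than expecting the check to stay non-OK, keeps the cache logic simple.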
>
> I am looking at setting up the cluster-snmp stuff, and I will see if
> that will provide me with the information I need.  Otherwise, I might
> just go with log scraping.
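If it comes to log scraping, the core job is to read only the lines appended since the last run and match them against failover messages. The Python sketch below does that. The two regular expressions are placeholder assumptions about how rgmanager words its messages. Replace them with the exact lines that appear in your own logs during a test failover. Persisting the byte offset between runs is left to the caller, for example by using a cache file like the one in a previous-state check.

```python
import re

# Placeholder patterns (assumptions): replace with the exact failover
# messages your rgmanager version writes to syslog.
FAILOVER_PATTERNS = [
    re.compile(r"[Rr]elocating service"),
    re.compile(r"[Tt]aking over service"),
]


def find_failovers(lines, patterns=FAILOVER_PATTERNS):
    """Return the log lines that match any failover pattern."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in patterns)]


def read_new_lines(path, offset):
    """Read lines appended since byte `offset`; return (lines, new_offset).

    If the file is now smaller than `offset` (e.g. it was rotated),
    start again from the beginning.
    """
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the current size
        size = fh.tell()
        if size < offset:
            offset = 0
        fh.seek(offset)
        data = fh.read()
    return data.decode("utf-8", errors="replace").splitlines(), offset + len(data)
```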
>
> B
> On Fri, Feb 20, 2009 at 5:55 PM, eric rosel <neuroticimbecile at yahoo.com> wrote:
>> Hi List,
>>
>> I've been toying with the idea of writing an init script resource that will send an alert to <type your favorite network/host monitoring system here> every time it gets called with a "start" or "stop" argument.
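The script-resource idea could look roughly like the Python sketch below. It assumes that alerts travel to Nagios as passive check results through send_nsca, which reads tab-separated "host, service, return code, output" lines on stdin. The Nagios host name and the "cluster-failover" service description are hypothetical. The script always exits 0 so that a monitoring outage can never make rgmanager consider the clustered service failed.

```python
import socket
import subprocess
import sys

# Hypothetical values: adjust to your Nagios server and service description.
ALERT_CMD = ["send_nsca", "-H", "nagios.example.com"]
NAGIOS_SERVICE = "cluster-failover"

OK, WARNING = 0, 1


def nsca_line(host, service, code, text):
    """Format one passive check result as tab-separated send_nsca input."""
    return f"{host}\t{service}\t{code}\t{text}\n"


def message_for(action, host):
    """Map a resource-script action to (code, text), or None to stay quiet."""
    if action == "start":
        return WARNING, f"clustered service started on {host}"
    if action == "stop":
        return WARNING, f"clustered service stopped on {host}"
    if action == "status":
        return OK, f"alive on {host}"
    return None


def main(action):
    host = socket.gethostname()
    msg = message_for(action, host)
    if msg is not None:
        line = nsca_line(host, NAGIOS_SERVICE, msg[0], msg[1])
        try:
            subprocess.run(ALERT_CMD, input=line, text=True, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            pass  # never fail the cluster service because monitoring is down
    # Always report success so the alert itself cannot trigger a failover.
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else ""))
```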
>>
>> Another way is to make it send "alive" messages every time it's called with "status", and then configure your monitoring app to sound the sirens when it stops getting those messages, or if the source of those messages changes.
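On the monitoring side, the "alive" approach reduces to two tests: has a message arrived recently, and did it come from the expected node? The Python sketch below covers both. It assumes the caller already knows when the last heartbeat arrived and which node sent it. The parameter names are hypothetical.

```python
import time

OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_heartbeat(last_seen, last_source, expected_source, max_age, now=None):
    """Decide whether a stream of "alive" messages looks healthy.

    last_seen: epoch seconds of the latest alive message (None if never).
    last_source: node that sent it; expected_source: node it normally
    comes from (hypothetical inputs supplied by the caller).
    """
    now = time.time() if now is None else now
    # No message at all, or none recently: the service may be down.
    if last_seen is None or now - last_seen > max_age:
        return CRITICAL, "CRITICAL - no alive message received recently"
    # Messages still arrive, but from another node: likely a failover.
    if last_source != expected_source:
        return WARNING, f"WARNING - alive messages now come from {last_source}"
    return OK, f"OK - alive from {last_source}"
```

Inside Nagios itself, the "stopped getting messages" half corresponds to freshness checking on a passive service (check_freshness and freshness_threshold). That leaves only the change-of-source test for custom code.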
>>
>> One then simply has to include this script resource with a clustered service.
>>
>> -eric
>>
>>
>> --- On Sat, 2/21/09, Martin Fuerstenau <martin.fuerstenau at oce.com> wrote:
>>
>>> From: Martin Fuerstenau <martin.fuerstenau at oce.com>
>>> Subject: Re: [Linux-cluster] Monitoring Failovers
>>> To: "linux clustering" <linux-cluster at redhat.com>
>>> Date: Saturday, February 21, 2009, 12:41 AM
>>> It is a little bit hard to do. It is on my todo list too. The problem
>>> is to determine the old state. For example, if you switch an IP address
>>> and you have a service bound to that address, you have nearly no chance
>>> to monitor it from the Nagios side.
>>>
>>> I have tested using the MAC address and ARP, but this is troublesome if
>>> you have bonding, because if the MAC switches it may be the bonding or
>>> the cluster that switched. But hardcoding MAC addresses in the monitor
>>> script would not be a good idea.
>>>
>>> Too much trouble in maintenance.
>>>
>>> If anyone has a good idea, I will write the plugin and post it on
>>> NagiosExchange.
>>>
>>> Martin Fuerstenau
>>>
>>> On Fri, 2009-02-20 at 11:04 -0500, Burton Simonds wrote:
>>> > I am in the process of setting up Nagios for system monitoring, and I
>>> > would like to have a way to know if a failover has occurred.  If
>>> > everything works as it should, there will be minimal impact on the
>>> > services.  Right now it looks like my best bet is basically to scrape
>>> > the logs, look for the failover messages there, and trigger an alarm.
>>> >
>>> > I was wondering if anyone else has done anything.  I found in an
>>> > archive a check_rhcs script that I am going to employ (which looks
>>> > pretty cool), but that just looks at the status of the services.  I
>>> > want to either compare the current status to the previous status or
>>> > have something monitoring the cluster that pushes the alert to Nagios.
>>> >
>>> > Thanks,
>>> > B
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>