[Spacewalk-list] Monitoring and notifications
David Nutter
davidn at bioss.sari.ac.uk
Wed Oct 21 15:33:58 UTC 2009
Ever since upgrading to spacewalk 0.6, notifications have been piling
up in /var/lib/notification/queue/alert_queue/ but are not getting
picked up by the system that emails out alerts. Eventually of course I
get the infamous "notification meltdown" emails and have to
clean out the queue manually.
Some observations:
- The spacewalk web interface shows monitoring data, status etc just
fine so the problem appears to be confined to the notification
subsystem.
- show-queue and monitor-queue pick up the fact that there are
notifications waiting.
- The notification method is a simple email to the Spacewalk
administrator and is called "Email2"
- From the command line I can send email from the spacewalk server.
- /usr/bin/notifier has active SMTP sessions with localhost so it can
send mail if it wants to. However, the sendmail log shows no
activity, and these connections then time out.
- Nothing of use is logged in
/var/log/nocpulse/{NotifEscalator-error,notif-escalator,Notifier-error,notifier,NotifLauncher-error,notif-launcher}.log
All I get is the keepalive messages (variants on "polling", "waiting
for new sends" etc.). It's almost as if the queue of notifications is
not being read, as I don't see any attempts to escalate or send
notifications.
- I tried increasing the logging by editing the scripts
/usr/bin/{notifier,notif-escalator,notif-launcher} then restarting
monitoring. No joy. Notif-launcher logs an empty message once per
second at loglevel 9.
- All alerts generated are "legit", from probes that have gone to
Warning or Critical and mention "Email2" when
listed by show-alert. An example is attached.
- I can't find any way to get spacewalk to test the notification
method, other than the script test_alert which inserts a message
into the alert queue. This message is then just ignored like all the
others.
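For anyone wanting to put numbers on the backlog described above, here is a minimal Python sketch. It is not part of Spacewalk: the function name queue_backlog is hypothetical, and it assumes that each queued alert is stored as one regular file directly in the queue directory. It only reads file metadata and does not touch the queue contents.

```python
import os
import time


def queue_backlog(queue_dir, now=None):
    """Return (file_count, oldest_age_in_seconds) for regular files in queue_dir.

    Hypothetical helper; assumes one regular file per queued alert.
    """
    now = time.time() if now is None else now
    with os.scandir(queue_dir) as entries:
        # Collect modification times of regular files only (skip subdirectories).
        mtimes = [e.stat().st_mtime for e in entries if e.is_file()]
    if not mtimes:
        return 0, 0.0
    return len(mtimes), now - min(mtimes)


if __name__ == "__main__":
    # Queue path as given in this post.
    path = "/var/lib/notification/queue/alert_queue/"
    if os.path.isdir(path):
        count, age = queue_backlog(path)
        print(f"{count} queued files, oldest is {age:.0f}s old")
    else:
        print(f"{path} not found")
```

Run periodically, a count that only grows while the oldest age keeps increasing would support the suspicion that nothing is reading the queue.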
Any thoughts on further debugging steps? I'm rather confused about the
relationships between the three notifier scripts, which isn't
helping. Any insight gratefully received :)
Regards,
--
David Nutter Tel: +44 (0)131 650 4888
BioSS, JCMB, King's Buildings, Mayfield Rd, EH9 3JZ. Scotland, UK
Biomathematics and Statistics Scotland (BioSS) is formally part of The
Scottish Crop Research Institute (SCRI), a registered Scottish charity
No. SC006662
-------------- next part --------------
$VAR1 = bless( {
  'satcluster' => '1',
  'commandLongName' => 'Memory Usage',
  'time' => '1256136322',
  'checkCommand' => '25',
  'current_time' => 1256136329,
  'probeType' => 'ServiceProbe',
  'state' => 'WARNING',
  'hostAddress' => '192.168.15.145',
  'probeDescription' => 'Linux: Memory Usage',
  'clusterId' => '1',
  'mac' => '00:16:3E:25:C5:19',
  'probeId' => '81',
  'version' => '1.0',
  'subject' => '',
  'groupId' => '21',
  'message' => 'RAM free 11.26 MB (below warning threshold of 20.00 MB)
Notification #1 for RAM free
',
  'hostName' => 'bcsossg.bioss.sari.ac.uk',
  'snmp' => '',
  'customerId' => '1',
  'probeGroupName' => 'linux',
  'hostProbeId' => '',
  'osName' => 'Linux System',
  'groupName' => 'Email2',
  'type' => 'service',
  'physicalLocationName' => 'Generic All-Encompassing Location',
  'clusterDesc' => 'RHN Monitoring Satellite',
  'snmpPort' => '',
  'ticket_id' => '01_1256136329_002233_001'
}, 'NOCpulse::Notif::Alert' );