[Spacewalk-list] Monitoring and notifications

David Nutter davidn at bioss.sari.ac.uk
Wed Oct 21 15:33:58 UTC 2009


Ever since upgrading to spacewalk 0.6, notifications have been piling
up in /var/lib/notification/queue/alert_queue/ but are not getting
picked up by the system that emails out alerts. Eventually of course I
get the infamous "notification meltdown" emails and have to
clean out the queue manually. 

Some observations:

- The spacewalk web interface shows monitoring data, status etc just
  fine so the problem appears to be confined to the notification
  subsystem.

- show-queue and monitor-queue pick up the fact that there are
  notifications waiting. 

- The notification method is a simple email to the Spacewalk
  administrator and is called "Email2" 

- From the command line I can send email from the spacewalk server.

- /usr/bin/notifier has active SMTP sessions with localhost so it can
  send mail if it wants to. However, the sendmail log shows no
  activity, and these connections then time out.

- Nothing of use is logged in
  /var/log/nocpulse/{NotifEscalator-error,notif-escalator,Notifier-error,notifier,NotifLauncher-error,notif-launcher}.log
  All I get are the keepalive messages (variants on "polling", "waiting
  for new sends" etc.). It's almost as if the queue of notifications is
  not being read, as I don't see any attempts to escalate or send
  notifications.

- I tried increasing the logging by editing the scripts
  /usr/bin/{notifier,notif-escalator,notif-launcher} then restarting
  monitoring. No joy. Notif-launcher logs an empty message once per
  second at loglevel 9.

- All alerts generated are "legit", from probes that have gone to
  Warning or Critical and mention "Email2" when
  listed by show-alert. An example is attached. 

- I can't find any way to get spacewalk to test the notification
  method, other than the script test_alert which inserts a message
  into the alert queue. This message is then just ignored like all the
  others. 
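
For watching whether the queue drains after a test_alert run, here is
a minimal sketch. It assumes each queued alert is one regular file
directly inside the alert_queue directory; that layout is an
assumption, and the function name queue_summary and the 60-minute
cutoff are made up for illustration. It relies on find's -maxdepth
and -mmin options, as provided by GNU find.

```shell
# Hypothetical helper: summarise files waiting in a queue directory.
# Assumes one regular file per queued alert (not confirmed anywhere).
queue_summary() {
    dir="${1:-/var/lib/notification/queue/alert_queue}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Total number of regular files directly inside the directory.
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    echo "queued files: $count"
    # Files not modified for over 60 minutes; the cutoff is arbitrary.
    stale=$(find "$dir" -maxdepth 1 -type f -mmin +60 | wc -l | tr -d ' ')
    echo "older than 60 min: $stale"
}
```

Running queue_summary before and a few minutes after test_alert
should show whether the count ever goes down.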

Any thoughts on further debugging steps? I'm rather confused about the
relationships between the three notifier scripts, which isn't
helping. Any insight gratefully received :)

Regards,

-- 
David Nutter  				Tel: +44 (0)131 650 4888
BioSS, JCMB, King's Buildings, Mayfield Rd, EH9 3JZ. Scotland, UK 

Biomathematics and Statistics Scotland (BioSS) is formally part of The
Scottish Crop Research Institute (SCRI), a registered Scottish charity
No. SC006662
-------------- next part --------------
$VAR1 = bless( {
                 'satcluster' => '1',
                 'commandLongName' => 'Memory Usage',
                 'time' => '1256136322',
                 'checkCommand' => '25',
                 'current_time' => 1256136329,
                 'probeType' => 'ServiceProbe',
                 'state' => 'WARNING',
                 'hostAddress' => '192.168.15.145',
                 'probeDescription' => 'Linux: Memory Usage',
                 'clusterId' => '1',
                 'mac' => '00:16:3E:25:C5:19',
                 'probeId' => '81',
                 'version' => '1.0',
                 'subject' => '',
                 'groupId' => '21',
                 'message' => 'RAM free 11.26 MB (below warning threshold of 20.00 MB)
Notification #1 for RAM free
',
                 'hostName' => 'bcsossg.bioss.sari.ac.uk',
                 'snmp' => '',
                 'customerId' => '1',
                 'probeGroupName' => 'linux',
                 'hostProbeId' => '',
                 'osName' => 'Linux System',
                 'groupName' => 'Email2',
                 'type' => 'service',
                 'physicalLocationName' => 'Generic All-Encompassing Location',
                 'clusterDesc' => 'RHN Monitoring Satellite',
                 'snmpPort' => '',
                 'ticket_id' => '01_1256136329_002233_001'
               }, 'NOCpulse::Notif::Alert' );



