[Linux-cluster] rgmanager ceases to send syslog messages
Robert Hurst
rhurst at bidmc.harvard.edu
Tue Aug 14 18:46:39 UTC 2007
Oddly, rgmanager (clurgmgrd) on one member node has stopped sending syslog
messages, in particular the periodic 'status' message for a service it is
running. This is a problem for us because a centralized server monitors these
syslog messages to track which services are running on which node.
Is there a signal or event that can trigger clurgmgrd to restart its
monitoring and logging of its running service?
The last log entries showing 'WATSON status' follow. I realize there was a
problem with this particular cluster.conf change, but the change had nothing
to do with the WATSON service, and all other nodes are still sending their
'service status' syslog messages. Why would 'WATSON status' just stop?
Aug 6 14:38:35 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status
Aug 6 14:39:05 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status
Aug 6 14:39:20 db5 ccsd[13802]: Update of cluster.conf complete (version 187 -> 188).
Aug 6 14:39:25 db5 clurgmgrd[16354]: <notice> Reconfiguring
Aug 6 14:39:25 db5 clurgmgrd[16354]: <info> Loading Service Data
Aug 6 14:39:25 db5 clurgmgrd[16354]: <err> Error storing ip: Duplicate
Aug 6 14:39:26 db5 clurgmgrd[16354]: <err> Unique attribute collision. type=clusterfs attr=device value=/dev/VGCCC1/lvol0
Aug 6 14:39:26 db5 clurgmgrd[16354]: <err> Error storing clusterfs resource
Aug 6 14:39:26 db5 clurgmgrd[16354]: <err> Unique attribute collision. type=clusterfs attr=device value=/dev/VGCCC1/lvol1
Aug 6 14:39:26 db5 clurgmgrd[16354]: <err> Error storing clusterfs resource
Aug 6 14:39:26 db5 clurgmgrd[16354]: <info> Stopping changed resources.
Aug 6 14:39:26 db5 clurgmgrd[16354]: <info> Restarting changed resources.
Aug 6 14:39:26 db5 clurgmgrd[16354]: <info> Starting changed resources.
Aug 6 14:39:26 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/syslogger stop
Aug 6 14:39:27 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/luci stop
Aug 6 14:39:27 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/webmin stop
Aug 6 14:39:27 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/nagios stop
I continue to get messages from clurgmgrd, but only for Magma Event
changes, e.g.:
Aug 7 16:09:03 db5 clurgmgrd[16354]: <info> Magma Event: Membership Change
Aug 7 16:09:03 db5 clurgmgrd[16354]: <info> State change: db1 UP
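
For anyone wanting to catch this kind of silence on a central syslog host, here is a minimal Python sketch of a staleness check. It is an illustration only, not part of rgmanager: it assumes each log entry arrives as a single line (the wrapping in this message comes from the mail client), that the 'status' entries have the format shown above, and that month names are in English. The function names, the year argument, and the 120-second threshold are all hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Matches entries such as:
#   Aug 6 14:38:35 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status
# (format assumed from the log excerpts in this message).
STATUS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<node>\S+) "
    r"clurgmgrd: \[\d+\]: <info>\s+"
    r"Executing /etc/init\.d/(?P<service>\S+) status$"
)


def parse_status(line, year=2007):
    """Return (node, service, timestamp) for a 'status' entry, else None.

    Syslog timestamps carry no year, so the caller supplies one.
    """
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    return m.group("node"), m.group("service"), ts


def last_seen(lines, year=2007):
    """Map (node, service) to the time of its most recent 'status' entry."""
    seen = {}
    for line in lines:
        parsed = parse_status(line, year)
        if parsed is not None:
            node, service, ts = parsed
            seen[(node, service)] = ts
    return seen


def stale(seen, now, max_age=timedelta(seconds=120)):
    """List (node, service) pairs whose last 'status' entry is older than max_age."""
    return sorted(key for key, ts in seen.items() if now - ts > max_age)


if __name__ == "__main__":
    log = [
        "Aug 6 14:38:35 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status",
        "Aug 6 14:39:05 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status",
        "Aug 7 16:09:03 db5 clurgmgrd[16354]: <info> Magma Event: Membership Change",
    ]
    seen = last_seen(log)
    # Checked at the time of the Magma event, WATSON on db5 is long overdue.
    print(stale(seen, now=datetime(2007, 8, 7, 16, 9, 3)))
```

The threshold should be a few multiples of the status interval (the entries above are 30 seconds apart), so that one delayed check does not raise a false alarm.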