[Linux-cluster] rgmanager ceases to send syslog messages

Robert Hurst rhurst at bidmc.harvard.edu
Tue Aug 14 18:46:39 UTC 2007


Oddly, rgmanager (clurgmgrd) on one member node has stopped sending some
syslog messages, in particular the 'status' message for a service it is
running. This is a problem for us because a centralized server monitors
those syslog messages to track which services are running on which node.

Is there a signal or event that can trigger clurgmgrd to restart its
monitoring and logging of its running service?

The last log entries showing 'WATSON status' follow. I realize there was
a problem with this particular cluster.conf change, but the change had
nothing to do with the WATSON service, and all other nodes are still
sending their 'service status' syslog messages.  Why would 'WATSON
status' just stop?

Aug  6 14:38:35 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status
Aug  6 14:39:05 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status
Aug  6 14:39:20 db5 ccsd[13802]: Update of cluster.conf complete (version 187 -> 188).
Aug  6 14:39:25 db5 clurgmgrd[16354]: <notice> Reconfiguring
Aug  6 14:39:25 db5 clurgmgrd[16354]: <info> Loading Service Data
Aug  6 14:39:25 db5 clurgmgrd[16354]: <err> Error storing ip: Duplicate
Aug  6 14:39:26 db5 clurgmgrd[16354]: <err> Unique attribute collision. type=clusterfs attr=device value=/dev/VGCCC1/lvol0
Aug  6 14:39:26 db5 clurgmgrd[16354]: <err> Error storing clusterfs resource
Aug  6 14:39:26 db5 clurgmgrd[16354]: <err> Unique attribute collision. type=clusterfs attr=device value=/dev/VGCCC1/lvol1
Aug  6 14:39:26 db5 clurgmgrd[16354]: <err> Error storing clusterfs resource
Aug  6 14:39:26 db5 clurgmgrd[16354]: <info> Stopping changed resources.
Aug  6 14:39:26 db5 clurgmgrd[16354]: <info> Restarting changed resources.
Aug  6 14:39:26 db5 clurgmgrd[16354]: <info> Starting changed resources.
Aug  6 14:39:26 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/syslogger stop
Aug  6 14:39:27 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/luci stop
Aug  6 14:39:27 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/webmin stop
Aug  6 14:39:27 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/nagios stop

I still get messages from clurgmgrd on this node, but only for Magma
events such as membership changes, e.g.:

Aug  7 16:09:03 db5 clurgmgrd[16354]: <info> Magma Event: Membership Change
Aug  7 16:09:03 db5 clurgmgrd[16354]: <info> State change: db1 UP
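
For reference, here is a rough sketch of the kind of check our central
syslog server could run to notice this gap sooner. It is not our actual
monitoring code. It assumes the log lines arrive unwrapped, one entry per
line, in the format shown above. The function names, the explicit year
argument (syslog timestamps carry none), and the five-minute threshold
are all made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches entries such as:
#   Aug  6 14:38:35 db5 clurgmgrd: [16354]: <info> Executing /etc/init.d/WATSON status
# Both "clurgmgrd: [pid]:" and "clurgmgrd[pid]:" prefixes are accepted.
STATUS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<node>\S+)\s+"
    r"clurgmgrd:?\s*\[\d+\]:\s+<info>\s+"
    r"Executing\s+/etc/init\.d/(?P<service>\S+)\s+status\s*$"
)


def last_status_seen(lines, year):
    """Return {(node, service): datetime of the most recent status check}.

    `year` is supplied by the caller because syslog timestamps omit it
    (an assumption of this sketch).
    """
    seen = {}
    for line in lines:
        m = STATUS_RE.match(line)
        if not m:
            continue  # not a clurgmgrd 'status' entry
        # Collapse the padded day field ("Aug  6") before parsing.
        ts = " ".join(m.group("ts").split())
        when = datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")
        key = (m.group("node"), m.group("service"))
        if key not in seen or when > seen[key]:
            seen[key] = when
    return seen


def stale_services(seen, now, max_gap):
    """List (node, service) pairs whose last status check is older than max_gap."""
    return sorted(key for key, when in seen.items() if now - when > max_gap)
```

Fed the excerpt above, last_status_seen() would record 14:39:05 on Aug 6
as the last WATSON check on db5, and stale_services() would report
(db5, WATSON) once the chosen threshold has passed without a new
'status' entry.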
