[Linux-cluster] Nodes leaving and re-joining intermittently

Matthew Painter matthew.painter at kusiri.com
Sat Dec 10 20:32:05 UTC 2011


Hi all,

We are trying to get to the bottom of some odd intermittent behavior on a
cluster. Nodes intermittently leave and rejoin the cluster without being
fenced. Further, the gap between leaving and re-joining is about 8
minutes. We are monitoring the latency between boxes, and it is acceptable
(<5ms).

How can nodes exhibit this behavior? There seems to be no impact on the
services running on the box, just this leaving and re-joining. The SNMP
messages are below.

All help decoding this gratefully received! :)

Thanks,

Matt


Sat Dec 10 15:22:00 GMT 2011: cluster3.localdomain
DISMAN-EVENT-MIB::sysUpTimeInstance = 3:2:52:23.35,
SNMPv2-MIB::snmpTrapOID.0 = COROSYNC-MIB::corosyncNoticesNodeStatus,
COROSYNC-MIB::corosyncObjectsNodeName.0 = "cluster1.localdomain",
COROSYNC-MIB::corosyncObjectsNodeID.0 = 1,
COROSYNC-MIB::corosyncObjectsNodeAddress.0 = "10.79.202.1",
COROSYNC-MIB::corosyncObjectsNodeStatus.0 = "left"

Sat Dec 10 15:30:25 GMT 2011: cluster3.localdomain
DISMAN-EVENT-MIB::sysUpTimeInstance = 3:3:00:48.75,
SNMPv2-MIB::snmpTrapOID.0 = COROSYNC-MIB::corosyncNoticesNodeStatus,
COROSYNC-MIB::corosyncObjectsNodeName.0 = "cluster1.localdomain",
COROSYNC-MIB::corosyncObjectsNodeID.0 = 1,
COROSYNC-MIB::corosyncObjectsNodeAddress.0 = "10.79.202.1",
COROSYNC-MIB::corosyncObjectsNodeStatus.0 = "joined"
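
For anyone wanting to measure the gap between the two traps above precisely,
here is a minimal sketch in Python. It is not part of the SNMP output; the
function names are made up for illustration, and it assumes an English/C
locale, that both wall-clock stamps are in the same zone (GMT here), and
that sysUpTimeInstance is printed as days:hours:minutes:seconds.

```python
from datetime import datetime

# Values copied from the two traps reported by cluster3.localdomain.
LEFT_STAMP = "Sat Dec 10 15:22:00 GMT 2011"
JOINED_STAMP = "Sat Dec 10 15:30:25 GMT 2011"
LEFT_UPTIME = "3:2:52:23.35"
JOINED_UPTIME = "3:3:00:48.75"


def parse_trap_time(stamp: str) -> datetime:
    """Parse a `date`-style stamp, ignoring the zone token (assumed identical)."""
    parts = stamp.split()
    del parts[4]  # drop "GMT"; %Z parsing is platform-dependent
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def uptime_seconds(uptime: str) -> float:
    """Convert a days:hours:minutes:seconds uptime string to seconds."""
    days, hours, minutes, seconds = uptime.split(":")
    return int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def wallclock_gap(left: str, joined: str) -> float:
    """Seconds between the 'left' and 'joined' wall-clock stamps."""
    return (parse_trap_time(joined) - parse_trap_time(left)).total_seconds()


def uptime_gap(left: str, joined: str) -> float:
    """Seconds between the two sysUpTimeInstance readings."""
    return uptime_seconds(joined) - uptime_seconds(left)


if __name__ == "__main__":
    print(wallclock_gap(LEFT_STAMP, JOINED_STAMP))
    print(round(uptime_gap(LEFT_UPTIME, JOINED_UPTIME), 2))
```

Both measurements come out at roughly 8 minutes 25 seconds, so the wall-clock
stamps and the sysUpTime values on the reporting node agree with each other.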