[Linux-cluster] Node won't rejoin after reboot
jakub.suchy at enlogit.cz
Wed Sep 24 14:34:59 UTC 2008
We are currently trying to diagnose a problem in our cluster setup. We
are seeing two problems, which are related to each other:
1) During failover, the surviving node reports "waiting for node to be
fenced" and no failover is done...
2) When the failing node rejoins the cluster, it is killed with a
message: "Killing node node2 because it has rejoined the cluster with existing state"
Both seem to be network related, specifically to our Cisco infrastructure (65xx and 35xx), and both
disappear when we move to non-Cisco infrastructure.
Please let me emphasize that I AM aware of this document:
We have configured the Cisco equipment according to it (although I
believe it applies only to multi-switch infrastructure, and our nodes
are both connected to a single switch).
We are also aware of:
But this is not our problem either: again, ours is a single-switch scenario. We have
tried to turn IGMP snooping off and our engineers reported that it
I have captured the traffic on the surviving node using tcpdump,
including all layer headers, and it seems that there is no IGMP Join
message from the second node. I suspect this may be the problem. Does
anybody know of any details I can check, or a fix for this bug?
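
One way to narrow this down independently of the cluster software might be to join a multicast group by hand on the rejoining node and watch on the other node (for example with a tcpdump filter of "igmp") whether a membership report appears. Below is a minimal sketch in Python, using only the standard socket module. The group address 239.192.0.1, the port 5405 and the function names are placeholders of mine, not values taken from our configuration; substitute the multicast address and port your cluster actually uses.

```python
import socket

# Hypothetical values for illustration only; use your cluster's real
# multicast address and port.
GROUP = "239.192.0.1"
PORT = 5405


def build_mreq(group, iface_ip="0.0.0.0"):
    """Pack an ip_mreq structure: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def join_group(group, port, iface_ip="0.0.0.0"):
    """Open a UDP socket and join the multicast group.

    Setting IP_ADD_MEMBERSHIP asks the kernel to join the group, which
    should make it send an IGMP membership report on the chosen interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_ip))
    return sock


def wait_for_packet(sock, timeout=30.0):
    """Return (data, sender) for the first datagram received, or None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(65535)
    except socket.timeout:
        return None
```

Calling join_group(GROUP, PORT) on one node while capturing on the other should show whether the join reaches the wire at all; wait_for_packet() on the returned socket then shows whether multicast traffic sent to that group actually arrives. If the report is visible with this test but not when the cluster starts, the switch is less likely to be the culprit.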