[Linux-cluster] node fails to join cluster after it was fenced
frederik.ferner at diamond.ac.uk
Wed Feb 14 13:05:04 UTC 2007
I've recently run into the problem that in one of my clusters the second
node doesn't join the cluster anymore.
First some background on my setup here. I have a couple of two-node
clusters, each connected to common storage. They're essentially
identical setups, running basically RHEL4U4 and the corresponding
cluster suite.
Everything was running fine until yesterday, when one node
(i04-storage2) in one of the clusters was fenced. Since then it can't
seem to join the cluster anymore. All I could find were messages in the
log files of i04-storage2 telling me "kernel: CMAN: sending membership
request" over and over again. On the node still in the cluster
(i04-storage1) I could see nothing in any of the log files.
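
To put a number on "over and over again", the message can be counted in
the log of the node that fails to join. This is only a minimal sketch:
the function names are hypothetical, and only the message text itself is
taken from the log on i04-storage2.

```python
# Message text as it appears in the log on i04-storage2.
MESSAGE = "kernel: CMAN: sending membership request"


def count_membership_requests(lines):
    """Return how many of the given log lines contain the CMAN message."""
    return sum(1 for line in lines if MESSAGE in line)


def count_in_file(path):
    """Count the CMAN message in a log file (the path is supplied by the caller)."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as fh:
        return count_membership_requests(fh)
```

Calling count_in_file with the path of the syslog file on i04-storage2
(for example /var/log/messages, assuming the default syslog location)
gives the number of join attempts logged so far.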
To get i04-storage2 back into my cluster, I tried to fence it again
using fence_tool on i04-storage1, without success. The node gets fenced,
as I can see in the log on i04-storage1. When I increased the version of
the cluster config on the working node, the join request was rejected
immediately, but the same timeout occurred when I copied the new
configuration over and tried to start the cluster suite again.
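
Because a version mismatch led to the join request being rejected
directly, it can help to confirm that both nodes really hold the same
config version before starting the cluster suite. A minimal sketch,
assuming the usual cluster.conf layout in which the root <cluster>
element carries a config_version attribute; the function names are
hypothetical.

```python
import xml.etree.ElementTree as ET


def config_version(xml_text):
    """Extract config_version from the text of a cluster.conf file."""
    root = ET.fromstring(xml_text)
    # Assumption: the root element is <cluster config_version="N" ...>.
    if root.tag != "cluster":
        raise ValueError("root element is not <cluster>")
    return int(root.attrib["config_version"])


def versions_match(xml_a, xml_b):
    """True if two copies of the cluster config carry the same version."""
    return config_version(xml_a) == config_version(xml_b)
```

The two arguments would be the contents of the config file from each
node (typically /etc/cluster/cluster.conf, which is an assumption about
the install location), for example after copying one of them across.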
There's no firewall on any computer involved, and both nodes are
connected to the same switch. Using Wireshark I can see UDP packets with
source and destination port 6809 going from i04-storage2 to i04-storage1
and from i04-storage1 to the network broadcast address. No other network
traffic seems to be going between these two hosts.
The same setup used to work fine. All other clusters are supposed to be
identical to that one and I don't see that kind of behaviour. If there's
a difference, I can't spot it.
Does anyone have any suggestions on what else I could look for? What
could be wrong here?
If you need any other bits of information that I haven't supplied,
please let me know.
Many thanks in advance,
Systems Administrator Phone: +44 (0)1235-778624
Diamond Light Source Fax: +44 (0)1235-778468