[Linux-cluster] node fails to join cluster after it was fenced

Frederik Ferner frederik.ferner at diamond.ac.uk
Thu Feb 15 11:36:03 UTC 2007


On Thu, 2007-02-15 at 09:07 +0000, Patrick Caulfield wrote:
> Frederik Ferner wrote:
> > On Wed, 2007-02-14 at 16:33 +0000, Patrick Caulfield wrote:
> >> Frederik Ferner wrote:
> >>> I've just discovered that I seem to have the same problem on one more
> >>> cluster, so maybe I've changed something that causes this but did not
> >>> affect a running cluster. I'll append the cluster.conf for the original
> >>> cluster as well.
> >>>
> >> Looking at the tcpdump it seems that the existing node isn't seeing the joinreq
> >> message from the fenced one - there are no responses to it at all. You haven't
> >> enabled any iptables filtering have you ?
> > 
> > But they seem to reach the network card at least, correct? So I don't
> > have to start looking at the switch, do I?
> 
> Well, they are reaching tcpdump - more than that is hard to say ;-)

> You could try running tcpdump on the two machines to see if the packets are the
> same on both...if so then it could be some strange bug in cman that we've not
> seen before that's preventing it seeing incoming packets (I have no idea what
> that might be though, off hand)

I ran tcpdump on both machines at the same time and compared the
captures. The packets look identical to me. I've attached the two capture
files in case someone can spot a difference that I'm missing.
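
For anyone who wants to repeat the comparison mechanically rather than
by eye, here is a minimal sketch. It assumes both captures are classic
pcap files (the format "tcpdump -w" writes by default, not pcapng) and
that the frames should be byte-identical on both hosts, which is only
plausible when the two nodes sit on the same Ethernet segment. The file
names and the function names are hypothetical, not anything from cman.

```python
import hashlib
import struct
import sys
from collections import Counter

# Magic numbers of a classic pcap file with microsecond timestamps,
# as they appear on disk for little- and big-endian writers.
PCAP_MAGIC_LE = b"\xd4\xc3\xb2\xa1"
PCAP_MAGIC_BE = b"\xa1\xb2\xc3\xd4"


def read_packets(data):
    """Yield the captured bytes of each packet in a classic pcap file."""
    magic = data[:4]
    if magic == PCAP_MAGIC_LE:
        endian = "<"
    elif magic == PCAP_MAGIC_BE:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len.
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def digest_counts(data):
    """Count packets by SHA-1 digest, ignoring timestamps and ordering."""
    return Counter(hashlib.sha1(p).hexdigest() for p in read_packets(data))


def compare_captures(a, b):
    """Return (digests seen only in a, digests seen only in b)."""
    count_a, count_b = digest_counts(a), digest_counts(b)
    return count_a - count_b, count_b - count_a


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_pcap.py node1.pcap node2.pcap
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        only_a, only_b = compare_captures(fa.read(), fb.read())
    print("packets only in first capture: ", sum(only_a.values()))
    print("packets only in second capture:", sum(only_b.values()))
```

Two empty results would only confirm that the same frames reached both
capture points. It says nothing about whether cman itself sees them,
which is the open question here.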

> It would be interesting to know - though you may not want to do it - if the
> problem persists when the still-running node is rebooted.

I can't do that at the moment, but I have a maintenance window coming up
soon when I might be able to. I'll let you know the result.

Thanks for looking into that,
Frederik
-- 
Frederik Ferner 
Systems Administrator                  Phone: +44 (0)1235-778624
Diamond Light Source                   Fax:   +44 (0)1235-778468
