[Linux-cluster] cman kicking out nodes for no good reason

Patrick Caulfield pcaulfie at redhat.com
Tue Apr 11 07:47:52 UTC 2006


Olivier Crête wrote:
> On Thu, 2006-06-04 at 12:34 -0400, Olivier Crête wrote:
>> I have a strange problem where cman suddenly starts kicking out members
>> of the cluster with "Inconsistent cluster view" when I join a new node
>> (sometimes).  It takes a few minutes between each kicking. I'm using a
>> snapshot from March 12th of the STABLE branch on 2.6.16. The cluster is
>> in transition state at that point and I can't stop/start services or do
>> anything else. It did not do that with a snapshot I took a few months
>> ago.
> 
> It's still happening: the node that joins says "Transition master
> unknown", while all of the other nodes know who the master is, then the
> master gets kicked out. Then a new master is selected, all of the nodes
> seem to know who the master is, but refuse to act on it. After a while,
> the new master is kicked out and the process restarts. I guess it's
> related to the changes with the timestamps to prevent master desync; I
> don't see any other recent change that could have caused it.
> 

That's very peculiar behaviour, and it's going to be hard to pin down. How
consistently does it happen?

It could be caused by extreme network packet loss, or by something blocking the
progress of the cman processes. Are the already joined nodes very busy when you
bring the new node into the cluster (and if so, doing what)?

I think the best way to try and track this down is to get a tcpdump of the
cluster traffic (port 6809/udp) happening at the time of the join. Make sure
that all nodes are included in the dump and that each packet is captured in
full rather than truncated.
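
As a rough sketch of such a capture, the helper below builds a tcpdump
command line for one node. The interface name (eth0) and the output file name
are assumptions for illustration, not values from this thread; -s 0 requests
full-length packets and -w writes the raw packets to a file. It would need to
be run as root on every node, starting before the new node joins.

```python
import shlex


def build_capture_command(interface, output_file, port=6809):
    """Return a tcpdump argv that records cman traffic on one node.

    interface and output_file are placeholders chosen by the caller;
    6809/udp is the cluster port mentioned above.
    """
    return [
        "tcpdump",
        "-i", interface,      # interface carrying the cluster traffic (assumed)
        "-s", "0",            # do not truncate packets
        "-w", output_file,    # save raw packets for later analysis
        "udp", "port", str(port),
    ]


if __name__ == "__main__":
    # Hypothetical values: adjust the interface and file name per node.
    cmd = build_capture_command("eth0", "cman-node1.pcap")
    print(shlex.join(cmd))
```

Using a distinct output file name on each node makes it easier to line the
dumps up afterwards.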

-- 

patrick



