[Linux-cluster] cman bad generation number
pcaulfie at redhat.com
Wed Jan 5 09:00:44 UTC 2005
On Tue, Jan 04, 2005 at 02:46:17PM -0800, Daniel McNeil wrote:
> One thing I do not understand is that I am leaving the nodes in the
> cluster and just doing mounting and umounting, so the generation number
> should not be changing.
> I think you are saying that the lock traffic is so high that the heartbeats
> are lost, so the node being kicked out is seeing the new heartbeats
> from the other nodes and doesn't know they are not receiving its
> heartbeat messages. This node must be seeing the other nodes'
> heartbeat messages or it would have started a membership transition
> without the other nodes. Do I have this right?
Yes, I think so. It's all a bit vague; if it weren't, I might have an answer by now.
> Shouldn't the heartbeat messages have higher priority
> over the lock traffic messages?
They do. That's why I am puzzled. I'm currently investigating if the heartbeat
thread is being starved of CPU time by either the DLM or GFS.
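
To make the suspected failure mode concrete, here is a rough sketch of how a starved heartbeat sender could produce the asymmetry described above. It is a toy simulation, not cman's membership code: the names `Peer` and `simulate`, the 1-second interval and the 5-second `dead_after` timeout are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Records when this node last saw a heartbeat from each other node."""
    name: str
    dead_after: float  # hypothetical timeout: seconds of silence before suspicion
    last_seen: dict = field(default_factory=dict)

    def receive(self, sender, now):
        self.last_seen[sender] = now

    def suspects(self, now):
        # Nodes whose most recent heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.dead_after)


def simulate(send_times, dead_after, end):
    """send_times maps node name -> times at which it actually sent a heartbeat.

    Returns, for each node, the list of nodes it suspects at time `end`.
    """
    peers = {n: Peer(n, dead_after) for n in send_times}
    # Assume every node has seen every other node at t=0.
    for p in peers.values():
        for n in send_times:
            if n != p.name:
                p.last_seen[n] = 0.0
    # Deliver every heartbeat, in time order, to all nodes except the sender.
    events = sorted((t, n) for n, ts in send_times.items() for t in ts if t <= end)
    for t, sender in events:
        for p in peers.values():
            if p.name != sender:
                p.receive(sender, t)
    return {n: p.suspects(end) for n, p in peers.items()}


# A and B send every second; C's sender is starved after t=3.
every_second = [float(t) for t in range(1, 11)]
result = simulate({"A": every_second, "B": every_second, "C": [1.0, 2.0, 3.0]},
                  dead_after=5.0, end=10.0)
print(result)
```

In this toy model A and B both end up suspecting C, while C, which still receives their heartbeats, suspects nobody. That matches the symptom in the question: the node being kicked out has no local reason to start a transition of its own.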
> Shouldn't there be a way of throttling back the lock traffic and seeing
> if heartbeat connection can be re-established before starting a
> membership transition?
DLM & CMAN are not that tightly coupled.