[Linux-cluster] totem token & post_fail_delay question

Vasil Valchev vasil.val at gmail.com
Tue Aug 26 06:56:29 UTC 2014


Hello,

I have a cluster that sometimes has intermittent network issues on the
heartbeat network.
Unfortunately improving the network is not an option, so I am looking for a
way to tolerate longer interruptions.

Previously I thought the post_fail_delay option would be suitable, but
after some research it may not be what I am looking for.

If I am correct, when a member leaves (due to a token timeout), the cluster
will wait for post_fail_delay seconds before fencing it. If the member rejoins
before that, will it still be fenced because it rejoins with existing state?
A recent fencing on this cluster also logged a strange message:

Aug 24 06:20:45 node2 openais[29048]: [MAIN ] Not killing node node1cl despite it rejoining the cluster with existing state, it has a lower node ID

What does this mean?

And lastly, is increasing the totem token timeout the way to go?
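
For concreteness, this is roughly where the two settings in question live in
cluster.conf on a cman-based cluster. It is only a sketch: the values are
illustrative placeholders, not taken from this cluster and not a
recommendation. Both elements are children of the top-level <cluster> element.

```xml
<!-- Fragment of /etc/cluster/cluster.conf; both elements sit directly under <cluster>. -->

<!-- Totem token timeout, in milliseconds. 30000 is an assumed example value. -->
<totem token="30000"/>

<!-- post_fail_delay, in seconds: how long fenced waits after a node failure
     before fencing it. 10 is an assumed example value. -->
<fence_daemon post_fail_delay="10"/>
```

As with any cluster.conf change, config_version would need to be incremented
and the new configuration propagated to all nodes.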


Thanks,
Vasil Valchev

