[Linux-cluster] Heartbeat time outs in rhel4 understanding
Elias, Michael
EliasM at dnb.com
Tue May 5 17:48:09 UTC 2009
I am trying to understand how these timers interact with each other.
In a RHEL4 cluster the heartbeat defaults are:
hello_timer:5
max_retries:5
deadnode_timeout:21
My reading is that a heartbeat message is sent every 5 seconds. If it fails
to receive a response, a deadnode counter starts at 21 seconds, and up to 5
more heartbeat requests are sent. What is the interval of those retries?
If none of those requests receives a response and 5 seconds pass, there are
16 seconds left on the deadnode timer, and we again try up to 5 times to get
a response. This goes on until the 4th iteration of the hello_timer, which
tries up to 5 times again and fails. We then hit 21 seconds on the deadnode
timer, fenced takes over, and wham: reboot.
Is my understanding of this correct?
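To make the arithmetic concrete, here is a minimal Python sketch of the
timeline as described in this post. It models only the reading above (the
deadnode counter starts when the first heartbeat goes unanswered), not
necessarily what cman really does. The function name is hypothetical, and the
retries are left out because their interval is the open question.

```python
def deadnode_timeline(hello_timer=5, deadnode_timeout=21):
    """List (elapsed, remaining) seconds at each hello_timer tick.

    Assumption: the deadnode counter starts at t=0, when a heartbeat
    first goes unanswered, and the node is declared dead once
    deadnode_timeout seconds have elapsed.
    """
    ticks = []
    elapsed = hello_timer
    while elapsed < deadnode_timeout:
        ticks.append((elapsed, deadnode_timeout - elapsed))
        elapsed += hello_timer
    return ticks


timeline = deadnode_timeline()
for elapsed, remaining in timeline:
    print(f"t={elapsed:2d}s  hello_timer fires, {remaining:2d}s left on deadnode timer")
print(f"{len(timeline)} hello intervals elapse before the deadnode timer expires")
```

With the defaults, 21 - 5 leaves 16 seconds after the first interval, and the
4th tick at 20 seconds leaves 1 second before the deadnode timer runs out.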
Thanks for any help.
Michael