[Linux-cluster] Workings of Tiebreaker IP (RHCS)

Rick Rodgers rodgersr at yahoo.com
Sun Sep 24 00:20:57 UTC 2006


  I pulled a message from 2005 about tiebreakers. I have some questions, and it does not seem to agree with what I see clumanager do.
   
  
 

>> Hello,
>>
>> To completely understand what the role of a tiebreaker IP within a two
>> or four node RHCS cluster is, I've searched Red Hat and Google. I can't,
>> however, find anything describing the precise workings of the
>> tiebreaker IP. I would really like to know exactly what happens when
>> the tiebreaker is used and how (maybe even some kind of flow diagram).
>>
>> Can anyone here maybe explain that to me, or point me in the direction
>> of more specific information regarding the tiebreaker?

 

>The tiebreaker IP address is used as an additional vote in the event
>that half the nodes become unreachable or dead in a 2 or 4 node cluster
>on RHCS.
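
A rough sketch of the vote arithmetic this implies, assuming a simple
strict-majority rule (the function and numbers are illustrative, not
clumanager's actual code):

    # Hypothetical majority-vote quorum with an IP tiebreaker.
    # Each node contributes one vote; the pingable tiebreaker IP adds one more.
    def has_quorum(nodes_online, total_nodes, tiebreaker_alive):
        votes = nodes_online + (1 if tiebreaker_alive else 0)
        total_votes = total_nodes + 1          # nodes + tiebreaker
        return votes * 2 > total_votes         # strict majority

    # 2-node cluster, one node dead, tiebreaker reachable:
    # 2 votes out of 3 -> still quorate, the survivor carries on.
    print(has_quorum(nodes_online=1, total_nodes=2, tiebreaker_alive=True))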

 

>The IP address must reside on the same network as is used for cluster
>communication.  To be a little more specific, if your cluster is using
>eth0 for communication, your IP address used for a tiebreaker must be
>reachable only via eth0 (otherwise, you will end up with a split brain).

 

>When enabled, the nodes ping the given IP address at regular intervals.
>When the IP address is not reachable, the tiebreaker is considered
>"dead".  When it is reachable, it is considered "alive".

 

>It acts as an additional vote (like an extra cluster member), except for
>one key difference: Unless the default configuration is overridden, the
 

How does this work? Does the node trying to become the active node access the tiebreaker and put a lock on it? How does it reserve it?
Just pinging it would not prevent the other node from doing the same.

 

>IP tiebreaker may not be used to *form* a quorum where one did not exist
>previously.
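
That asymmetry can be written down as a small piece of state: the
tiebreaker vote only *sustains* a quorum that real nodes already formed;
it never creates one. A sketch of the rule as described (illustrative
only, not clumanager's implementation):

    # Hypothetical: the tiebreaker sustains quorum, it never forms it.
    def quorate(nodes_online, total_nodes, tiebreaker_alive, was_quorate):
        if nodes_online * 2 > total_nodes:
            return True                        # real nodes form quorum on their own
        if was_quorate and tiebreaker_alive:
            return nodes_online * 2 == total_nodes   # exactly half + tiebreaker
        return False                           # tiebreaker cannot create a quorum

    # One node of a two-node cluster booting alone: never quorate.
    print(quorate(1, 2, tiebreaker_alive=True, was_quorate=False))   # False
    # Same node after its peer dies while the cluster was quorate: stays quorate.
    print(quorate(1, 2, tiebreaker_alive=True, was_quorate=True))    # True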

 

>So, if one node of a two node cluster is online, it will never become

>quorate unless the other node comes online (or administrator override,

>see man pages for "cluforce" and "cludb").

 

>So, in a 2 node cluster, if one node fails and the other node is online
>(and the tiebreaker is still "alive" according to that node), the
>remaining node considers itself quorate and "shoots" (aka STONITHs, aka
>fences) the dead node and takes over services.
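
The ordering matters here: the survivor fences first and only then
adopts the services. A sketch of that sequence, where fence_node() and
take_over_services() are placeholder stand-ins for clumanager's fencing
and service manager (not real APIs):

    def fence_node(peer):
        print("fencing", peer)                  # stand-in for the real fencing agent

    def take_over_services(peer):
        print("adopting services from", peer)   # stand-in for the service manager

    def handle_peer_failure(peer, i_am_quorate):
        # Only a quorate node may act; an inquorate node must sit still.
        if not i_am_quorate:
            return
        fence_node(peer)           # "shoot" the presumed-dead node first...
        take_over_services(peer)   # ...then start the services it was running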

 

>If a network partition occurs such that both nodes see the tiebreaker
>but not each other, the first one to fence the other will naturally win.

 

 

>Ok, moving on...

 

>The disk tiebreaker works in a similar way, except that it lets the
>cluster limp along in a safe, semi-split-brain state during a network
>outage.  What I mean is that because there's state information
>written to/read from the shared raw partitions, the nodes can actually
>tell via other means whether or not the other node is "alive", as
>opposed to relying solely on the network traffic.
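
To make "state information on the shared raw partitions" concrete,
here is a very rough picture of the mechanism: each node periodically
stamps its own slot on the shared device and reads the other node's
slot. The device path, slot layout and field packing are invented for
illustration; clumanager's real on-disk format differs.

    import os, struct, time

    SHARED_DEV = "/dev/raw/raw1"     # hypothetical shared raw partition
    SLOT_SIZE = 512                  # one sector per node (illustrative)

    def write_heartbeat(node_id):
        # Stamp our own slot with a node id and the current time.
        fd = os.open(SHARED_DEV, os.O_WRONLY)
        try:
            os.lseek(fd, node_id * SLOT_SIZE, os.SEEK_SET)
            record = struct.pack("!Id", node_id, time.time())
            os.write(fd, record.ljust(SLOT_SIZE, b"\0"))
        finally:
            os.close(fd)

    def read_heartbeat(node_id):
        # Return the timestamp the given node last wrote to its slot.
        fd = os.open(SHARED_DEV, os.O_RDONLY)
        try:
            os.lseek(fd, node_id * SLOT_SIZE, os.SEEK_SET)
            data = os.read(fd, SLOT_SIZE)
        finally:
            os.close(fd)
        _, stamp = struct.unpack("!Id", data[:12])
        return stamp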

 

>Both nodes update state information on the shared partitions.  When one
>node detects that the other node has not updated its information for a
>period of time, that node is "down" according to the disk subsystem.  If
>this coincides with a "down" status from the membership daemon, the node
>is fenced and services are failed over.  If the node never goes down
>(and keeps updating its information on the shared partitions), then the

I do not use an IP tiebreaker. I have a two-node system. When the active node shows it is down via membership but up via disk, then
Clumanager determines it is in an "uncertain state" and shoots it.

 

 

>node is never fenced and services never fail over.
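
Taken at face value, the rule in the quoted text combines the two
inputs like this (a sketch of the described behaviour only; the timeout
value is made up):

    import time

    DISK_TIMEOUT = 10   # seconds without a disk heartbeat before "down" (illustrative)

    def peer_is_down(membership_down, last_disk_stamp):
        # "Down" only when BOTH the membership daemon and the disk
        # subsystem agree; fresh disk writes with a dead network mean
        # the peer is still alive, so no fencing and no failover.
        disk_down = (time.time() - last_disk_stamp) > DISK_TIMEOUT
        return membership_down and disk_down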

 

-- Lon
   
  
 		