<div class="MsoNormal">I pulled a message from 2005 about tiebreakers. I have some questions about it, and it does not seem to agree with what I see clumanager do.</div> <div class="MsoNormal"><o:p> </o:p></div> <pre><o:p> </o:p></pre><pre>>> Hello,<o:p></o:p></pre><pre>>> <o:p></o:p></pre><pre>>> To completely understand what the role of a tiebreaker IP within a two<o:p></o:p></pre><pre>>> or four node RHCS cluster is, I've searched redhat and Google. I can't<o:p></o:p></pre><pre>>> however find anything describing the precise workings of the<o:p></o:p></pre><pre>>> tiebreaker-IP. I would really like to know what happens exactly when<o:p></o:p></pre><pre>>> the tiebreaker is used and how (maybe even some kind of flow diagram). <o:p></o:p></pre><pre>>> <o:p></o:p></pre><pre>>> Can anyone here maybe explain that to me, or point me in the direction<o:p></o:p></pre><pre>>> of more specific information regarding tiebreaker?<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>The tiebreaker IP address is used as an additional vote in the event<o:p></o:p></pre><pre>>that half the nodes become unreachable or dead in a 2 or 4 node cluster<o:p></o:p></pre><pre>>on RHCS.<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>The IP address must reside on the same network as is used for cluster<o:p></o:p></pre><pre>>communication. To be a little more specific, if your cluster is using<o:p></o:p></pre><pre>>eth0 for communication, your IP address used for a tiebreaker must be<o:p></o:p></pre><pre>>reachable only via eth0 (otherwise, you will end up with a split brain).<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>When enabled, the nodes ping the given IP address at regular intervals.<o:p></o:p></pre><pre>>When the IP address is not reachable, the tiebreaker is considered<o:p></o:p></pre><pre>>"dead". 
When it is reachable, it is considered "alive".<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>It acts as an additional vote (like an extra cluster member), except for<o:p></o:p></pre><pre>>one key difference: Unless the default configuration is overridden, the</pre><pre><o:p> </o:p></pre><pre>How does this work? Does the node trying to become the active node access the tiebreaker and put a lock on it? How does it reserve it? </pre><pre>Just pinging it would not prevent the other node from doing the same.</pre><pre><o:p> </o:p></pre><pre>>IP tiebreaker may not be used to *form* a quorum where one did not exist<o:p></o:p></pre><pre>>previously.<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>So, if one node of a two node cluster is online, it will never become<o:p></o:p></pre><pre>>quorate unless the other node comes online (or administrator override,<o:p></o:p></pre><pre>>see man pages for "cluforce" and "cludb").<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>So, in a 2 node cluster, if one node fails and the other node is online<o:p></o:p></pre><pre>>(and the tiebreaker is still "alive" according to that node), the<o:p></o:p></pre><pre>>remaining node considers itself quorate and "shoots" (aka STONITHs, aka<o:p></o:p></pre><pre>>fences) the dead node and takes over services.</pre><pre><o:p> </o:p></pre><pre>>If a network partition occurs such that both nodes see the tiebreaker<o:p></o:p></pre><pre>>but not each other, the first one to fence the other will naturally win.<o:p></o:p></pre><pre><o:p> </o:p></pre><pre><o:p> </o:p></pre><pre>>Ok, moving on...<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>The disk tiebreaker works in a similar way, except that it lets the<o:p></o:p></pre><pre>>cluster limp along in a safe, semi-split-brain (split brain) state in a<o:p></o:p></pre><pre>>network outage. 
What I mean is that because there's state information<o:p></o:p></pre><pre>>written to/read from the shared raw partitions, the nodes can actually<o:p></o:p></pre><pre>>tell via other means whether or not the other node is "alive", as<o:p></o:p></pre><pre>>opposed to relying solely on the network traffic.<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>>Both nodes update state information on the shared partitions. When one<o:p></o:p></pre><pre>>node detects that the other node has not updated its information for a<o:p></o:p></pre><pre>>period of time, that node is "down" according to the disk subsystem. If<o:p></o:p></pre><pre>>this coincides with a "down" status from the membership daemon, the node<o:p></o:p></pre><pre>>is fenced and services are failed over. If the node never goes down<o:p></o:p></pre><pre>>(and keeps updating its information on the shared partitions), then the</pre><pre>I do not use an IP tiebreaker. I have a two-node system. When the active node shows as down via membership but up via disk,</pre><pre>clumanager determines it is in an “uncertain state” and shoots it.<o:p></o:p></pre><pre><o:p> </o:p></pre><pre><o:p> </o:p></pre><pre>>node is never fenced and services never fail over.<o:p></o:p></pre><pre><o:p> </o:p></pre><pre>-- Lon<o:p></o:p></pre> <div class="MsoNormal"><o:p> </o:p></div>
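<div class="MsoNormal">To check that I am reading the quoted rules correctly, here is a rough Python sketch of them. It is only my interpretation of the message above, not clumanager's actual code. The names NodeView, is_quorate and should_fence are hypothetical, and the sketch ignores the "default configuration is overridden" case and the cluforce/cludb administrator override.</div>

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """What one node currently observes (all field names are hypothetical)."""
    peers_online: int       # other cluster members this node can still reach
    total_nodes: int        # configured cluster size, e.g. 2 or 4
    tiebreaker_alive: bool  # the last ping of the tiebreaker IP succeeded
    was_quorate: bool       # this node was part of a quorum before the change


def is_quorate(view: NodeView) -> bool:
    """IP tiebreaker rule as described in the quoted message (assumed reading)."""
    votes = 1 + view.peers_online            # this node plus reachable peers
    majority = view.total_nodes // 2 + 1     # strict majority of member votes
    if votes >= majority:
        return True
    # Exactly half the nodes remain: the tiebreaker counts as one extra vote,
    # but only to keep a quorum that already existed, never to form a new one.
    if votes * 2 == view.total_nodes and view.tiebreaker_alive:
        return view.was_quorate
    return False


def should_fence(membership_down: bool, disk_down: bool) -> bool:
    """Disk tiebreaker rule as quoted: fence only when both views say 'down'."""
    return membership_down and disk_down
```

<div class="MsoNormal">Under this reading, should_fence(True, False) is False: a node that is down via membership but still updating the shared partitions would never be fenced. That is exactly the case where I see clumanager report an "uncertain state" and shoot the node anyway.</div>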