Never mind. I was running the entire cluster in VMware on my laptop.
The machines managed to drift 25 virtual seconds in just 30 real seconds...

People running into this should look at:
<a href="http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.virtualised_realservers.html">http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.virtualised_realservers.html</a>

Piranha/Pulse simply does not play nice with VMware.

Regards,
Kit

<div class="gmail_quote">On Thu, Dec 18, 2008 at 2:46 PM, Kit Gerrits <<a href="mailto:kitgerrits@gmail.com">kitgerrits@gmail.com</a>> wrote:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I am seeing the strangest thing:
Every 3 minutes or so, the secondary cluster node thinks it can't see the primary cluster node and decides to fail over.
When the primary node sees this, it sends out an ARP, the secondary node backs off, and the primary takes over again.

Due to local policy, I am building this LVS cluster on RHEL5.2 with piranha and ipvsadm from CentOS5.2.

Aside from a stock RHEL5.2 from DVD, I am running the following software:
gmp-4.1.4-10.el5.i386.rpm
ipvsadm-1.24-8.1.i386.rpm
libgomp-4.1.2-42.el5.i386.rpm
php-5.1.6-20.el5.i386.rpm
php-cli-5.1.6-20.el5.i386.rpm
php-common-5.1.6-20.el5.i386.rpm
<div>piranha-0.8.4-9.3.el5.i386.rpm

LVS Config:
[@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
serial_no = 57
primary = 10.100.77.4
primary_private = 192.168.201.11
service = lvs
backup_active = 1
backup = 10.100.76.87
backup_private = 192.168.201.12
heartbeat = 1
heartbeat_port = 539
keepalive = 2
deadtime = 10
network = nat
nat_router = 192.168.201.15 eth0:1
nat_nmask = 255.255.255.0
debug_level = NONE
monitor_links = 1
virtual Trac {
    active = 1
    address = 10.100.77.250 eth1:1
    vip_nmask = 255.255.240.0
    port = 80
    persistent = 300
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server Trac-test1 {
        address = 192.168.201.21
        active = 1
        weight = 500
    }
    server Trac-Test2 {
        address = 192.168.201.22
        active = 1
        weight = 500
    }
}

hosts:
# Public IPs
10.100.77.4     lvs-test1-pub.rdc.local lvs-test1-pub
10.100.76.87    lvs-test2-pub.rdc.local lvs-test2-pub
10.100.77.250   trac-test-pub.rdc.local trac-test-pub.rdc
# Private IPs
192.168.201.11  lvs-test1.rdc.local lvs-test1
192.168.201.12  lvs-test2.rdc.local lvs-test2
192.168.201.15  lvs-test-gw.rdc.local lvs-test-gw
192.168.201.21  trac-test1.rdc.local trac-test1
192.168.201.22  trac-test2.rdc.local trac-test2

Interfaces:
lvs-test1:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
          inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
          inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
          inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
          inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0

lvs-test2:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
          inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
eth1      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
          inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
          inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0

Routing table:
[@lvs-test1 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1

[@lvs-test2 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1

ARP ping to the floating GW IP and the APP IP works from the non-active node:

Cluster on lvs-test1:
[@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.12 eth0
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  3.739ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.324ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.332ms
[@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.12 eth0
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.895ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.441ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.395ms

Cluster on lvs-test2:
[@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.11 eth0
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  3.861ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  1.499ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  0.934ms
[@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.11 eth0
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  4.436ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.327ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.495ms

I have attached 2 files with more detailed information.

# Why does pulse think the secondary is dead, even with 2 interfaces?
# Why are both internal and external IPs returning the same MAC address?</div>
</blockquote></div>
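For anyone wondering how clock drift could turn into spurious failovers, here is a rough sketch of the idea. It is a toy model, not pulse's actual code: it assumes the backup timestamps each heartbeat with its own local clock and declares the peer dead once more than `deadtime` local seconds pass without one. The class `HeartbeatMonitor`, the function `false_failovers`, and the single forward clock jump are all hypothetical; only `keepalive = 2` and `deadtime = 10` come from the lvs.cf in the quoted mail.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeartbeatMonitor:
    """Toy backup-node monitor (assumed behaviour, not pulse's real logic)."""
    deadtime: int        # local seconds without a heartbeat before failover
    last_seen: int = 0   # local clock reading at the last heartbeat

    def heartbeat(self, local_now: int) -> None:
        # Record when (by the local clock) the peer was last heard from.
        self.last_seen = local_now

    def peer_is_dead(self, local_now: int) -> bool:
        # The peer counts as dead once the local gap exceeds deadtime.
        return (local_now - self.last_seen) > self.deadtime


def false_failovers(keepalive: int, deadtime: int, clock_jump: int,
                    jump_at: int, duration: int) -> List[int]:
    """Return the real-time seconds at which the backup wrongly declares
    the primary dead, although heartbeats arrive every `keepalive` seconds.

    The backup's local clock equals real time until `jump_at`, then runs
    `clock_jump` seconds ahead (a hypothetical guest-clock catch-up).
    """
    monitor = HeartbeatMonitor(deadtime=deadtime)
    alarms = []
    for real_now in range(duration + 1):
        local_now = real_now + (clock_jump if real_now >= jump_at else 0)
        if monitor.peer_is_dead(local_now):
            alarms.append(real_now)
        if real_now % keepalive == 0:  # the primary never misses a beat
            monitor.heartbeat(local_now)
    return alarms


# keepalive and deadtime as in the quoted lvs.cf; a 25 s jump at t = 30 s
print(false_failovers(keepalive=2, deadtime=10, clock_jump=25,
                      jump_at=30, duration=60))   # → [30]
# Same setup with a stable clock: no failover is ever declared
print(false_failovers(keepalive=2, deadtime=10, clock_jump=0,
                      jump_at=30, duration=60))   # → []
```

In this model the primary never misses a heartbeat, yet a single 25-second jump in the backup's clock exceeds the 10-second deadtime and triggers a failover, after which the next heartbeat makes everything look healthy again.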