From kitgerrits at gmail.com  Fri Dec 19 19:59:34 2008
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Fri, 19 Dec 2008 20:59:34 +0100
Subject: Strange cluster switches with Pulse
In-Reply-To: <352eb1a50812180546q62652d1qcbccbb1d95810f3b@mail.gmail.com>
References: <352eb1a50812180546q62652d1qcbccbb1d95810f3b@mail.gmail.com>
Message-ID: <352eb1a50812191159p4e8139fey43efd80c56ef2664@mail.gmail.com>

Never mind.

I was running the entire cluster in VMware on my laptop.
The machines managed to drift 25 virtual seconds in just 30 real seconds...

People running into this should look at:
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.virtualised_realservers.html

Piranha/Pulse simply does not play nice with VMware.

Regards,

Kit
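This kind of drift is easy to see from inside a guest by querying (without setting) an NTP reference. A minimal sketch, assuming ntpdate is installed; pool.ntp.org is only an illustrative reference, substitute any reachable NTP server:

    # Print the guest's own clock and its offset from the NTP reference
    # every 30 real seconds; an offset that grows by whole seconds per
    # iteration confirms the kind of drift described above.
    while true; do
        date '+%H:%M:%S'
        /usr/sbin/ntpdate -q pool.ntp.org | tail -n 1
        sleep 30
    done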
On Thu, Dec 18, 2008 at 2:46 PM, Kit Gerrits wrote:

> I am seeing the strangest thing:
> Every 3 minutes or so, the secondary cluster node thinks it can't see the
> primary cluster node and decides to fail over.
> When the primary node sees this, it sends out an ARP, the secondary node
> backs off, and the primary takes over again.
>
> Due to local policy, I am building this LVS cluster on RHEL5.2 with
> piranha and ipvsadm from CentOS5.2.
>
> Aside from a stock RHEL5.2 from DVD, I am running the following software:
> gmp-4.1.4-10.el5.i386.rpm
> ipvsadm-1.24-8.1.i386.rpm
> libgomp-4.1.2-42.el5.i386.rpm
> php-5.1.6-20.el5.i386.rpm
> php-cli-5.1.6-20.el5.i386.rpm
> php-common-5.1.6-20.el5.i386.rpm
> piranha-0.8.4-9.3.el5.i386.rpm
>
> LVS Config:
> [@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
> serial_no = 57
> primary = 10.100.77.4
> primary_private = 192.168.201.11
> service = lvs
> backup_active = 1
> backup = 10.100.76.87
> backup_private = 192.168.201.12
> heartbeat = 1
> heartbeat_port = 539
> keepalive = 2
> deadtime = 10
> network = nat
> nat_router = 192.168.201.15 eth0:1
> nat_nmask = 255.255.255.0
> debug_level = NONE
> monitor_links = 1
> virtual Trac {
>     active = 1
>     address = 10.100.77.250 eth1:1
>     vip_nmask = 255.255.240.0
>     port = 80
>     persistent = 300
>     send = "GET / HTTP/1.0\r\n\r\n"
>     expect = "HTTP"
>     use_regex = 0
>     load_monitor = none
>     scheduler = wlc
>     protocol = tcp
>     timeout = 6
>     reentry = 15
>     quiesce_server = 1
>     server Trac-test1 {
>         address = 192.168.201.21
>         active = 1
>         weight = 500
>     }
>     server Trac-Test2 {
>         address = 192.168.201.22
>         active = 1
>         weight = 500
>     }
> }
>
> hosts:
> # Public IPs
> 10.100.77.4     lvs-test1-pub.rdc.local   lvs-test1-pub
> 10.100.76.87    lvs-test2-pub.rdc.local   lvs-test2-pub
> 10.100.77.250   trac-test-pub.rdc.local   trac-test-pub.rdc
>
> # Private IPs
> 192.168.201.11  lvs-test1.rdc.local       lvs-test1
> 192.168.201.12  lvs-test2.rdc.local       lvs-test2
> 192.168.201.15  lvs-test-gw.rdc.local     lvs-test-gw
> 192.168.201.21  trac-test1.rdc.local      trac-test1
> 192.168.201.22  trac-test2.rdc.local      trac-test2
>
> Interfaces:
> lvs-test1:
> [@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
> eth0    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
>         inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
> eth0:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
>         inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
> eth1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
>         inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
> eth1:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
>         inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
> lo      Link encap:Local Loopback
>         inet addr:127.0.0.1  Mask:255.0.0.0
>
> lvs-test2:
> [@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
> eth0    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
>         inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
>         inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
> eth1    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
>         inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
>         inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
> lo      Link encap:Local Loopback
>         inet addr:127.0.0.1  Mask:255.0.0.0
>
> Routing table:
> [@lvs-test1 ~]$ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1
>
> [@lvs-test2 ~]$ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1
>
> ARP ping to floating GW IP and APP IP works from the non-active node:
> Cluster on lvs-test1:
> [@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
> ARPING 192.168.201.15 from 192.168.201.12 eth0
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  3.739ms
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.324ms
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.332ms
>
> [@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
> ARPING 10.100.77.250 from 192.168.201.12 eth0
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.895ms
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.441ms
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.395ms
>
> Cluster on lvs-test2:
> [@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
> ARPING 192.168.201.15 from 192.168.201.11 eth0
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  3.861ms
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  1.499ms
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  0.934ms
>
> [@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
> ARPING 10.100.77.250 from 192.168.201.11 eth0
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  4.436ms
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.327ms
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.495ms
>
> I have attached 2 files with more detailed information.
>
> # Why does pulse think the secondary is dead, even with 2 interfaces?
> # Why are both internal and external IPs returning the same MAC address?
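One way to approach the first question: the lvs.cf above sets heartbeat_port = 539 and deadtime = 10, so the heartbeats themselves can be watched on the backup node to see whether they really stop arriving. A sketch, assuming tcpdump is available and the heartbeat travels over the private eth0 network:

    # On the backup node: print a timestamped line for each heartbeat
    # packet; gaps longer than the 10-second deadtime would explain
    # why the backup declares the primary dead.
    tcpdump -ni eth0 port 539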
From kitgerrits at gmail.com  Mon Dec 22 12:09:11 2008
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Mon, 22 Dec 2008 13:09:11 +0100
Subject: Piranha clusters and VMware clock synch -- SOLUTION
Message-ID: <352eb1a50812220409q4933cc7fr9dd4f83db6c512b6@mail.gmail.com>

Hello all,

A few notes w.r.t. section 48.4 of the LVS HOWTO (VMware problems with NTP/time):

It -IS- possible to keep several hosts in sync time-wise in VMware, even on a simple laptop.
I am currently running an LVS cluster and 2 webservers on my laptop with
VMware Server 1.0.7 and 1GB RAM.
There have been no spontaneous cluster failovers (yet) today.

For GSX / VMware Server:
Pass the following options to the kernel via grub.conf:
    clock=pit nosmp noapic nolapic
For RHEL5/CentOS5, you should use the following instead:
    clocksource=pit nosmp noapic nolapic
Because this will disable all of Linux's intelligent clock tricks, you'll
need to run NTPd to keep your clock in sync.
More info at:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1420

For ESX servers, the solution is 'simpler':
You need to lower the minimum time between clock requests:
    Configuration --> Software --> Advanced Settings --> Misc --> Misc.TimerMinHardPeriod
Lower this value (it is in microseconds).
More info at:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2219
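For reference, a grub.conf entry with the RHEL5/CentOS5 options applied might look like the sketch below. The kernel version and root device are illustrative (a stock RHEL5.2 layout); only the four trailing options come from the note above:

    title Red Hat Enterprise Linux Server (2.6.18-92.el5)
            root (hd0,0)
            kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/VolGroup00/LogVol00 clocksource=pit nosmp noapic nolapic
            initrd /initrd-2.6.18-92.el5.img

NTPd can then be enabled in the usual way (chkconfig ntpd on; service ntpd start) so the now-simplified clock stays in sync.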
From kitgerrits at gmail.com  Thu Dec 18 13:46:44 2008
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Thu, 18 Dec 2008 13:46:44 -0000
Subject: Strange cluster switches with Pulse
Message-ID: <352eb1a50812180546q62652d1qcbccbb1d95810f3b@mail.gmail.com>

I am seeing the strangest thing:
Every 3 minutes or so, the secondary cluster node thinks it can't see the
primary cluster node and decides to fail over.
When the primary node sees this, it sends out an ARP, the secondary node
backs off, and the primary takes over again.

Due to local policy, I am building this LVS cluster on RHEL5.2 with
piranha and ipvsadm from CentOS5.2.

Aside from a stock RHEL5.2 from DVD, I am running the following software:
gmp-4.1.4-10.el5.i386.rpm
ipvsadm-1.24-8.1.i386.rpm
libgomp-4.1.2-42.el5.i386.rpm
php-5.1.6-20.el5.i386.rpm
php-cli-5.1.6-20.el5.i386.rpm
php-common-5.1.6-20.el5.i386.rpm
piranha-0.8.4-9.3.el5.i386.rpm

LVS Config:
[@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
serial_no = 57
primary = 10.100.77.4
primary_private = 192.168.201.11
service = lvs
backup_active = 1
backup = 10.100.76.87
backup_private = 192.168.201.12
heartbeat = 1
heartbeat_port = 539
keepalive = 2
deadtime = 10
network = nat
nat_router = 192.168.201.15 eth0:1
nat_nmask = 255.255.255.0
debug_level = NONE
monitor_links = 1
virtual Trac {
    active = 1
    address = 10.100.77.250 eth1:1
    vip_nmask = 255.255.240.0
    port = 80
    persistent = 300
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server Trac-test1 {
        address = 192.168.201.21
        active = 1
        weight = 500
    }
    server Trac-Test2 {
        address = 192.168.201.22
        active = 1
        weight = 500
    }
}

hosts:
# Public IPs
10.100.77.4     lvs-test1-pub.rdc.local   lvs-test1-pub
10.100.76.87    lvs-test2-pub.rdc.local   lvs-test2-pub
10.100.77.250   trac-test-pub.rdc.local   trac-test-pub.rdc

# Private IPs
192.168.201.11  lvs-test1.rdc.local       lvs-test1
192.168.201.12  lvs-test2.rdc.local       lvs-test2
192.168.201.15  lvs-test-gw.rdc.local     lvs-test-gw
192.168.201.21  trac-test1.rdc.local      trac-test1
192.168.201.22  trac-test2.rdc.local      trac-test2

Interfaces:
lvs-test1:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
        inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
eth0:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
        inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
eth1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
        inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
eth1:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
        inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
lo      Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0

lvs-test2:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
        inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
        inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
eth1    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
        inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
        inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
lo      Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0

Routing table:
[@lvs-test1 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1

[@lvs-test2 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1

ARP ping to floating GW IP and APP IP works from the non-active node:
Cluster on lvs-test1:
[@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.12 eth0
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  3.739ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.324ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.332ms

[@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.12 eth0
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.895ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.441ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.395ms

Cluster on lvs-test2:
[@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.11 eth0
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  3.861ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  1.499ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  0.934ms

[@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.11 eth0
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  4.436ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.327ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.495ms

I have attached 2 files with more detailed information.

# Why does pulse think the secondary is dead, even with 2 interfaces?
# Why are both internal and external IPs returning the same MAC address?

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test1.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test2.txt
URL: 
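On the second question: by default, Linux replies to ARP requests for any address configured on the host, on whichever interface the request arrives (sometimes called 'ARP flux'), which is consistent with both the internal and external IPs answering with the same MAC. A sketch of the usual mitigation, not specific to this setup and offered only as a starting point:

    # Reply to ARP only for addresses configured on the receiving
    # interface, and use that interface's own address when ARPing.
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2

To persist across reboots, the same keys go into /etc/sysctl.conf.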