Strange cluster switches with Pulse

Kit Gerrits kitgerrits at gmail.com
Fri Dec 19 19:59:34 UTC 2008


Never mind.

I was running the entire cluster in VMware on my laptop.
The machines managed to drift 25 virtual seconds apart in just 30 real seconds...
With the clocks jumping around like that and a deadtime of only 10 seconds, the
backup keeps concluding that the primary's heartbeats have stopped.

People running into this should look at:
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.virtualised_realservers.html

Piranha/Pulse simply does not play nice with VMware.
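
For anyone double-checking the same thing, a quick way to see how far a guest
clock is off (assuming ntpdate is installed and some NTP server is reachable;
pool.ntp.org below is just a placeholder):

  # Query the offset without stepping the clock
  /usr/sbin/ntpdate -q pool.ntp.org
  # Run it again a minute later; a growing offset means the guest clock is drifting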


Regards,

Kit


On Thu, Dec 18, 2008 at 2:46 PM, Kit Gerrits <kitgerrits at gmail.com> wrote:

> I am seeing the strangest thing:
> Every 3 minutes or so, the secondary cluster node thinks it can't see the
> primary cluster node and decides to fail over.
> When the primary node sees this, it sends out an ARP, the secondary node
> backs off, and the primary takes over again.
>
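> In case it helps, the heartbeat traffic itself can be watched on the wire
> with something like this (assuming tcpdump is installed; port 539 matches
> heartbeat_port in the lvs.cf below):
>
>   # Watch pulse heartbeats between the directors; pick whichever interface
>   # actually carries the director-to-director traffic (eth0 or eth1 here)
>   sudo /usr/sbin/tcpdump -ni eth0 port 539
>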
> Due to local policy, I am building this LVS cluster on RHEL5.2 with
> piranha and ipvsadm from CentOS5.2
>
> Aside from a stock RHEL5.2 from DVD, I am running the following software:
> gmp-4.1.4-10.el5.i386.rpm
> ipvsadm-1.24-8.1.i386.rpm
> libgomp-4.1.2-42.el5.i386.rpm
> php-5.1.6-20.el5.i386.rpm
> php-cli-5.1.6-20.el5.i386.rpm
> php-common-5.1.6-20.el5.i386.rpm
> piranha-0.8.4-9.3.el5.i386.rpm
>
> LVS Config:
> [@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
> serial_no = 57
> primary = 10.100.77.4
> primary_private = 192.168.201.11
> service = lvs
> backup_active = 1
> backup = 10.100.76.87
> backup_private = 192.168.201.12
> heartbeat = 1
> heartbeat_port = 539
> keepalive = 2
> deadtime = 10
> network = nat
> nat_router = 192.168.201.15 eth0:1
> nat_nmask = 255.255.255.0
> debug_level = NONE
> monitor_links = 1
> virtual Trac {
>      active = 1
>      address = 10.100.77.250 eth1:1
>      vip_nmask = 255.255.240.0
>      port = 80
>      persistent = 300
>      send = "GET / HTTP/1.0\r\n\r\n"
>      expect = "HTTP"
>      use_regex = 0
>      load_monitor = none
>      scheduler = wlc
>      protocol = tcp
>      timeout = 6
>      reentry = 15
>      quiesce_server = 1
>      server Trac-test1 {
>          address = 192.168.201.21
>          active = 1
>          weight = 500
>      }
>      server Trac-Test2 {
>          address = 192.168.201.22
>          active = 1
>          weight = 500
>      }
> }
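>
> For reference, the send/expect check above can be reproduced by hand against a
> real server (assuming nc is installed), to rule out the HTTP health check:
>
>   # Send the same request nanny sends and check that the reply starts with "HTTP"
>   printf 'GET / HTTP/1.0\r\n\r\n' | nc 192.168.201.21 80 | head -1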
>
>
> hosts:
> # Public IPs
> 10.100.77.4     lvs-test1-pub.rdc.local  lvs-test1-pub
> 10.100.76.87    lvs-test2-pub.rdc.local  lvs-test2-pub
> 10.100.77.250  trac-test-pub.rdc.local trac-test-pub.rdc
>
> # Private IPs
> 192.168.201.11  lvs-test1.rdc.local     lvs-test1
> 192.168.201.12  lvs-test2.rdc.local     lvs-test2
> 192.168.201.15  lvs-test-gw.rdc.local   lvs-test-gw
> 192.168.201.21  trac-test1.rdc.local    trac-test1
> 192.168.201.22  trac-test2.rdc.local    trac-test2
>
> Interfaces:
> lvs-test1:
> [@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
> eth0      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
>           inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
> eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
>           inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
> eth1      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
>           inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
> eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
>           inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>
> lvs-test2:
> [@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
> eth0      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
>           inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
>           inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
> eth1      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
>           inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
>           inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>
>
> Routing table:
> [@lvs-test1 ~]$ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1
>
> [@lvs-test2 ~]$ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1
>
>
>
> ARP ping to the floating GW IP and application VIP works from the non-active node:
> Cluster on lvs-test1:
> [@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
> ARPING 192.168.201.15 from 192.168.201.12 eth0
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  3.739ms
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.324ms
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.332ms
>
> [@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
> ARPING 10.100.77.250 from 192.168.201.12 eth0
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.895ms
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.441ms
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.395ms
>
>
> Cluster on lvs-test2:
> [@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
> ARPING 192.168.201.15 from 192.168.201.11 eth0
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  3.861ms
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  1.499ms
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  0.934ms
>
> [@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
> ARPING 10.100.77.250 from 192.168.201.11 eth0
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  4.436ms
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.327ms
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.495ms
>
>
> I have attached 2 files with more detailed information.
>
> # Why does pulse think the secondary is dead, even with 2 interfaces?
> # Why are both internal and external IPs returning the same MAC address?
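>
> My best guess on the second question: by default, Linux will answer ARP for
> any locally-configured address on any interface, so both replies carry eth0's
> MAC. If that ever needs restricting, the usual knobs are the arp_ignore and
> arp_announce sysctls (not applied here, just noted for reference):
>
>   # Only answer ARP for addresses configured on the receiving interface
>   sysctl -w net.ipv4.conf.all.arp_ignore=1
>   # Use the best-matching local source address in outgoing ARP requests
>   sysctl -w net.ipv4.conf.all.arp_announce=2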
>
>

