Strange cluster switches with Pulse
Kit Gerrits
kitgerrits at gmail.com
Thu Dec 18 13:46:44 UTC 2008
I am seeing the strangest thing:
Every 3 minutes or so, the secondary cluster node decides it can't see the
primary cluster node and fails over.
When the primary node sees this, it sends out an ARP, the secondary node
backs off, and the primary takes over again.
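To find out whether it is the heartbeats themselves that go missing, my next step is to watch the pulse traffic on the private link while the flap happens (assuming pulse sends its heartbeats as UDP datagrams on the heartbeat_port from lvs.cf, 539 here):

```shell
# Run on both nodes at once; a working heartbeat should show a steady
# stream of datagrams every keepalive (2) seconds in both directions.
# Assumption: the heartbeats travel over the private interface eth0.
tcpdump -n -i eth0 udp port 539
```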
Due to local policy, I am building this LVS cluster on RHEL 5.2, with
piranha and ipvsadm from CentOS 5.2.
Aside from a stock RHEL 5.2 install from DVD, I am running the following packages:
gmp-4.1.4-10.el5.i386.rpm
ipvsadm-1.24-8.1.i386.rpm
libgomp-4.1.2-42.el5.i386.rpm
php-5.1.6-20.el5.i386.rpm
php-cli-5.1.6-20.el5.i386.rpm
php-common-5.1.6-20.el5.i386.rpm
piranha-0.8.4-9.3.el5.i386.rpm
LVS Config:
[@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
serial_no = 57
primary = 10.100.77.4
primary_private = 192.168.201.11
service = lvs
backup_active = 1
backup = 10.100.76.87
backup_private = 192.168.201.12
heartbeat = 1
heartbeat_port = 539
keepalive = 2
deadtime = 10
network = nat
nat_router = 192.168.201.15 eth0:1
nat_nmask = 255.255.255.0
debug_level = NONE
monitor_links = 1
virtual Trac {
     active = 1
     address = 10.100.77.250 eth1:1
     vip_nmask = 255.255.240.0
     port = 80
     persistent = 300
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     use_regex = 0
     load_monitor = none
     scheduler = wlc
     protocol = tcp
     timeout = 6
     reentry = 15
     quiesce_server = 1
     server Trac-test1 {
         address = 192.168.201.21
         active = 1
         weight = 500
     }
     server Trac-Test2 {
         address = 192.168.201.22
         active = 1
         weight = 500
     }
}
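For reference, with keepalive = 2 and deadtime = 10 the backup should declare the primary dead after roughly deadtime / keepalive consecutive missed heartbeats (my reading of the settings, not the documented pulse algorithm):

```shell
keepalive=2   # seconds between heartbeats (from lvs.cf)
deadtime=10   # seconds of silence before the peer is declared dead
echo $(( deadtime / keepalive ))  # missed heartbeats before a failover: 5
```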
hosts:
# Public IPs
10.100.77.4 lvs-test1-pub.rdc.local lvs-test1-pub
10.100.76.87 lvs-test2-pub.rdc.local lvs-test2-pub
10.100.77.250 trac-test-pub.rdc.local trac-test-pub.rdc
# Private IPs
192.168.201.11 lvs-test1.rdc.local lvs-test1
192.168.201.12 lvs-test2.rdc.local lvs-test2
192.168.201.15 lvs-test-gw.rdc.local lvs-test-gw
192.168.201.21 trac-test1.rdc.local trac-test1
192.168.201.22 trac-test2.rdc.local trac-test2
Interfaces:
lvs-test1:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
          inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
          inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
          inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
          inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
lvs-test2:
[@lvs-test2 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
          inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
eth1      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
          inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
          inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
Routing table:
[@lvs-test1 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U       0 0         0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U       0 0         0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U       0 0         0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG      0 0         0 eth1
[@lvs-test2 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U       0 0         0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U       0 0         0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U       0 0         0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG      0 0         0 eth1
arping to the floating gateway IP and the application VIP works from the non-active node:
Cluster on lvs-test1:
[@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.12 eth0
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05] 3.739ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05] 1.324ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05] 1.332ms
[@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.12 eth0
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05] 1.895ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05] 1.441ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05] 1.395ms
Cluster on lvs-test2:
[@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.11 eth0
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A] 3.861ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A] 1.499ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A] 0.934ms
[@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.11 eth0
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A] 4.436ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A] 1.327ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A] 1.495ms
I have attached two files with more detailed information.
# Why does pulse think the secondary is dead, even with 2 interfaces?
# Why are both internal and external IPs returning the same MAC address?
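On the second question, I suspect this is just Linux's default ARP behaviour: the kernel answers ARP requests for any local IP address on any interface, so an arping sent out eth0 for the 10.100.77.250 VIP gets answered with eth0's MAC. If that is the case, the standard sysctls should restrict it (untested on this setup):

```shell
# arp_ignore=1: only reply if the target IP is configured on the
# interface the ARP request arrived on (default 0 = any local IP).
sysctl -w net.ipv4.conf.all.arp_ignore=1
# arp_announce=2: always use the best local address as the source of
# outgoing ARP requests on that interface.
sysctl -w net.ipv4.conf.all.arp_announce=2
```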
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test1.txt
URL: <http://listman.redhat.com/archives/piranha-list/attachments/20081218/3a501a6c/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test2.txt
URL: <http://listman.redhat.com/archives/piranha-list/attachments/20081218/3a501a6c/attachment-0001.txt>