Strange cluster switches with Pulse
Kit Gerrits
kitgerrits at gmail.com
Thu Dec 18 13:46:44 UTC 2008
I am seeing the strangest thing:
Every 3 minutes or so, the secondary cluster node decides it can't see the
primary cluster node and fails over.
When the primary node sees this, it sends out an ARP, the secondary node
backs off, and the primary takes over again.
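To find out whether it is the heartbeats themselves that go missing, my next step is to watch the pulse traffic on the private link while the flap happens (assuming pulse sends its heartbeats as UDP datagrams on the heartbeat_port from lvs.cf, 539 here):

```shell
# Run on both nodes at once; a working heartbeat should show a steady
# stream of datagrams every keepalive (2) seconds in both directions.
# Assumption: the heartbeats travel over the private interface eth0.
tcpdump -n -i eth0 udp port 539
```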
Due to local policy, I am building this LVS cluster on RHEL 5.2, with
piranha and ipvsadm from CentOS 5.2.
Aside from a stock RHEL 5.2 install from DVD, I am running the following packages:
gmp-4.1.4-10.el5.i386.rpm
ipvsadm-1.24-8.1.i386.rpm
libgomp-4.1.2-42.el5.i386.rpm
php-5.1.6-20.el5.i386.rpm
php-cli-5.1.6-20.el5.i386.rpm
php-common-5.1.6-20.el5.i386.rpm
piranha-0.8.4-9.3.el5.i386.rpm
LVS Config:
[@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
serial_no = 57
primary = 10.100.77.4
primary_private = 192.168.201.11
service = lvs
backup_active = 1
backup = 10.100.76.87
backup_private = 192.168.201.12
heartbeat = 1
heartbeat_port = 539
keepalive = 2
deadtime = 10
network = nat
nat_router = 192.168.201.15 eth0:1
nat_nmask = 255.255.255.0
debug_level = NONE
monitor_links = 1
virtual Trac {
     active = 1
     address = 10.100.77.250 eth1:1
     vip_nmask = 255.255.240.0
     port = 80
     persistent = 300
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     use_regex = 0
     load_monitor = none
     scheduler = wlc
     protocol = tcp
     timeout = 6
     reentry = 15
     quiesce_server = 1
     server Trac-test1 {
         address = 192.168.201.21
         active = 1
         weight = 500
     }
     server Trac-Test2 {
         address = 192.168.201.22
         active = 1
         weight = 500
     }
}
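For reference, with keepalive = 2 and deadtime = 10 the backup should declare the primary dead after roughly deadtime / keepalive consecutive missed heartbeats (my reading of the settings, not the documented pulse algorithm):

```shell
keepalive=2   # seconds between heartbeats (from lvs.cf)
deadtime=10   # seconds of silence before the peer is declared dead
echo $(( deadtime / keepalive ))  # missed heartbeats before a failover: 5
```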
hosts:
# Public IPs
10.100.77.4 lvs-test1-pub.rdc.local lvs-test1-pub
10.100.76.87 lvs-test2-pub.rdc.local lvs-test2-pub
10.100.77.250 trac-test-pub.rdc.local trac-test-pub.rdc
# Private IPs
192.168.201.11 lvs-test1.rdc.local lvs-test1
192.168.201.12 lvs-test2.rdc.local lvs-test2
192.168.201.15 lvs-test-gw.rdc.local lvs-test-gw
192.168.201.21 trac-test1.rdc.local trac-test1
192.168.201.22 trac-test2.rdc.local trac-test2
Interfaces:
lvs-test1:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
          inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
          inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
          inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
eth1:1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
          inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
lvs-test2:
[@lvs-test2 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
          inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
eth1      Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
          inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
          inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
Routing table:
[@lvs-test1 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U       0 0         0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U       0 0         0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U       0 0         0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG      0 0         0 eth1
[@lvs-test2 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U       0 0         0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U       0 0         0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U       0 0         0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG      0 0         0 eth1
arping to the floating gateway IP and the application VIP works from the non-active node:
Cluster on lvs-test1:
[@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.12 eth0
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05] 3.739ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05] 1.324ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05] 1.332ms
[@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.12 eth0
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05] 1.895ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05] 1.441ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05] 1.395ms
Cluster on lvs-test2:
[@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.11 eth0
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A] 3.861ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A] 1.499ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A] 0.934ms
[@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.11 eth0
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A] 4.436ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A] 1.327ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A] 1.495ms
I have attached two files with more detailed information.
# Why does pulse think the secondary is dead, even with 2 interfaces?
# Why are both internal and external IPs returning the same MAC address?
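On the second question, I suspect this is just Linux's default ARP behaviour: the kernel answers ARP requests for any local IP address on any interface, so an arping sent out eth0 for the 10.100.77.250 VIP gets answered with eth0's MAC. If that is the case, the standard sysctls should restrict it (untested on this setup):

```shell
# arp_ignore=1: only reply if the target IP is configured on the
# interface the ARP request arrived on (default 0 = any local IP).
sysctl -w net.ipv4.conf.all.arp_ignore=1
# arp_announce=2: always use the best local address as the source of
# outgoing ARP requests on that interface.
sysctl -w net.ipv4.conf.all.arp_announce=2
```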
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test1.txt
URL: <http://listman.redhat.com/archives/piranha-list/attachments/20081218/3a501a6c/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test2.txt
URL: <http://listman.redhat.com/archives/piranha-list/attachments/20081218/3a501a6c/attachment-0001.txt>