From kitgerrits at gmail.com  Fri Dec 19 19:59:34 2008
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Fri, 19 Dec 2008 20:59:34 +0100
Subject: Strange cluster switches with Pulse
In-Reply-To: <352eb1a50812180546q62652d1qcbccbb1d95810f3b@mail.gmail.com>
References: <352eb1a50812180546q62652d1qcbccbb1d95810f3b@mail.gmail.com>
Message-ID: <352eb1a50812191159p4e8139fey43efd80c56ef2664@mail.gmail.com>

Never mind.

I was running the entire cluster in VMware on my laptop.
The machines managed to drift 25 virtual seconds in just 30 real seconds...

People running into this should look at:
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.virtualised_realservers.html

Piranha/Pulse simply does not play nice with VMware.

Regards,

Kit
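This kind of drift is easy to see from inside a guest by querying (without setting) an NTP reference. A minimal sketch, assuming ntpdate is installed; pool.ntp.org is only an illustrative reference, substitute any reachable NTP server:

    # Print the guest's own clock and its offset from the NTP reference
    # every 30 real seconds; an offset that grows by whole seconds per
    # iteration confirms the kind of drift described above.
    while true; do
        date '+%H:%M:%S'
        /usr/sbin/ntpdate -q pool.ntp.org | tail -n 1
        sleep 30
    done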
On Thu, Dec 18, 2008 at 2:46 PM, Kit Gerrits wrote:

> I am seeing the strangest thing:
> Every 3 minutes or so, the secondary cluster node thinks it can't see the
> primary cluster node and decides to fail over.
> When the primary node sees this, it sends out an ARP, the secondary node
> backs off, and the primary takes over again.
>
> Due to local policy, I am building this LVS cluster on RHEL5.2 with
> piranha and ipvsadm from CentOS5.2.
>
> Aside from a stock RHEL5.2 from DVD, I am running the following software:
> gmp-4.1.4-10.el5.i386.rpm
> ipvsadm-1.24-8.1.i386.rpm
> libgomp-4.1.2-42.el5.i386.rpm
> php-5.1.6-20.el5.i386.rpm
> php-cli-5.1.6-20.el5.i386.rpm
> php-common-5.1.6-20.el5.i386.rpm
> piranha-0.8.4-9.3.el5.i386.rpm
>
> LVS Config:
> [@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
> serial_no = 57
> primary = 10.100.77.4
> primary_private = 192.168.201.11
> service = lvs
> backup_active = 1
> backup = 10.100.76.87
> backup_private = 192.168.201.12
> heartbeat = 1
> heartbeat_port = 539
> keepalive = 2
> deadtime = 10
> network = nat
> nat_router = 192.168.201.15 eth0:1
> nat_nmask = 255.255.255.0
> debug_level = NONE
> monitor_links = 1
> virtual Trac {
>     active = 1
>     address = 10.100.77.250 eth1:1
>     vip_nmask = 255.255.240.0
>     port = 80
>     persistent = 300
>     send = "GET / HTTP/1.0\r\n\r\n"
>     expect = "HTTP"
>     use_regex = 0
>     load_monitor = none
>     scheduler = wlc
>     protocol = tcp
>     timeout = 6
>     reentry = 15
>     quiesce_server = 1
>     server Trac-test1 {
>         address = 192.168.201.21
>         active = 1
>         weight = 500
>     }
>     server Trac-Test2 {
>         address = 192.168.201.22
>         active = 1
>         weight = 500
>     }
> }
>
> hosts:
> # Public IPs
> 10.100.77.4     lvs-test1-pub.rdc.local   lvs-test1-pub
> 10.100.76.87    lvs-test2-pub.rdc.local   lvs-test2-pub
> 10.100.77.250   trac-test-pub.rdc.local   trac-test-pub.rdc
>
> # Private IPs
> 192.168.201.11  lvs-test1.rdc.local       lvs-test1
> 192.168.201.12  lvs-test2.rdc.local       lvs-test2
> 192.168.201.15  lvs-test-gw.rdc.local     lvs-test-gw
> 192.168.201.21  trac-test1.rdc.local      trac-test1
> 192.168.201.22  trac-test2.rdc.local      trac-test2
>
> Interfaces:
> lvs-test1:
> [@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
> eth0    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
>         inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
> eth0:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
>         inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
> eth1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
>         inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
> eth1:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
>         inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
> lo      Link encap:Local Loopback
>         inet addr:127.0.0.1  Mask:255.0.0.0
>
> lvs-test2:
> [@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
> eth0    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
>         inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
>         inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
> eth1    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
>         inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
>         inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
> lo      Link encap:Local Loopback
>         inet addr:127.0.0.1  Mask:255.0.0.0
>
> Routing table:
> [@lvs-test1 ~]$ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1
>
> [@lvs-test2 ~]$ netstat -rn
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
> 10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1
>
> ARP ping to floating GW IP and APP IP works from the non-active node:
> Cluster on lvs-test1:
> [@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
> ARPING 192.168.201.15 from 192.168.201.12 eth0
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  3.739ms
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.324ms
> Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.332ms
>
> [@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
> ARPING 10.100.77.250 from 192.168.201.12 eth0
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.895ms
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.441ms
> Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.395ms
>
> Cluster on lvs-test2:
> [@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
> ARPING 192.168.201.15 from 192.168.201.11 eth0
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  3.861ms
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  1.499ms
> Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  0.934ms
>
> [@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
> ARPING 10.100.77.250 from 192.168.201.11 eth0
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  4.436ms
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.327ms
> Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.495ms
>
> I have attached 2 files with more detailed information.
>
> # Why does pulse think the secondary is dead, even with 2 interfaces?
> # Why are both internal and external IPs returning the same MAC address?
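One way to approach the first question: the lvs.cf above sets heartbeat_port = 539 and deadtime = 10, so the heartbeats themselves can be watched on the backup node to see whether they really stop arriving. A sketch, assuming tcpdump is available and the heartbeat travels over the private eth0 network:

    # On the backup node: print a timestamped line for each heartbeat
    # packet; gaps longer than the 10-second deadtime would explain
    # why the backup declares the primary dead.
    tcpdump -ni eth0 port 539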
From kitgerrits at gmail.com  Mon Dec 22 12:09:11 2008
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Mon, 22 Dec 2008 13:09:11 +0100
Subject: Piranha clusters and VMware clock synch -- SOLUTION
Message-ID: <352eb1a50812220409q4933cc7fr9dd4f83db6c512b6@mail.gmail.com>

Hello all,

A few notes w.r.t. section 48.4 of the LVS HOWTO (VMware problems with NTP/time):

It -IS- possible to keep several hosts in sync time-wise in VMware, even on a simple laptop.
I am currently running an LVS cluster and 2 webservers on my laptop with
VMware Server 1.0.7 and 1GB RAM.
There have been no spontaneous cluster failovers (yet) today.

For GSX / VMware Server:
Pass the following options to the kernel via grub.conf:
    clock=pit nosmp noapic nolapic
For RHEL5/CentOS5, you should use the following instead:
    clocksource=pit nosmp noapic nolapic
Because this will disable all of Linux's intelligent clock tricks, you'll
need to run NTPd to keep your clock in sync.
More info at:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1420

For ESX servers, the solution is 'simpler':
You need to lower the minimum time between clock requests:
    Configuration --> Software --> Advanced Settings --> Misc --> Misc.TimerMinHardPeriod
Lower this value (it is in microseconds).
More info at:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2219
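For reference, a grub.conf entry with the RHEL5/CentOS5 options applied might look like the sketch below. The kernel version and root device are illustrative (a stock RHEL5.2 layout); only the four trailing options come from the note above:

    title Red Hat Enterprise Linux Server (2.6.18-92.el5)
            root (hd0,0)
            kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/VolGroup00/LogVol00 clocksource=pit nosmp noapic nolapic
            initrd /initrd-2.6.18-92.el5.img

NTPd can then be enabled in the usual way (chkconfig ntpd on; service ntpd start) so the now-simplified clock stays in sync.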
From kitgerrits at gmail.com  Thu Dec 18 13:46:44 2008
From: kitgerrits at gmail.com (Kit Gerrits)
Date: Thu, 18 Dec 2008 13:46:44 -0000
Subject: Strange cluster switches with Pulse
Message-ID: <352eb1a50812180546q62652d1qcbccbb1d95810f3b@mail.gmail.com>

I am seeing the strangest thing:
Every 3 minutes or so, the secondary cluster node thinks it can't see the
primary cluster node and decides to fail over.
When the primary node sees this, it sends out an ARP, the secondary node
backs off, and the primary takes over again.

Due to local policy, I am building this LVS cluster on RHEL5.2 with
piranha and ipvsadm from CentOS5.2.

Aside from a stock RHEL5.2 from DVD, I am running the following software:
gmp-4.1.4-10.el5.i386.rpm
ipvsadm-1.24-8.1.i386.rpm
libgomp-4.1.2-42.el5.i386.rpm
php-5.1.6-20.el5.i386.rpm
php-cli-5.1.6-20.el5.i386.rpm
php-common-5.1.6-20.el5.i386.rpm
piranha-0.8.4-9.3.el5.i386.rpm

LVS Config:
[@lvs-test1 ~]$ cat /etc/sysconfig/ha/lvs.cf
serial_no = 57
primary = 10.100.77.4
primary_private = 192.168.201.11
service = lvs
backup_active = 1
backup = 10.100.76.87
backup_private = 192.168.201.12
heartbeat = 1
heartbeat_port = 539
keepalive = 2
deadtime = 10
network = nat
nat_router = 192.168.201.15 eth0:1
nat_nmask = 255.255.255.0
debug_level = NONE
monitor_links = 1
virtual Trac {
    active = 1
    address = 10.100.77.250 eth1:1
    vip_nmask = 255.255.240.0
    port = 80
    persistent = 300
    send = "GET / HTTP/1.0\r\n\r\n"
    expect = "HTTP"
    use_regex = 0
    load_monitor = none
    scheduler = wlc
    protocol = tcp
    timeout = 6
    reentry = 15
    quiesce_server = 1
    server Trac-test1 {
        address = 192.168.201.21
        active = 1
        weight = 500
    }
    server Trac-Test2 {
        address = 192.168.201.22
        active = 1
        weight = 500
    }
}

hosts:
# Public IPs
10.100.77.4     lvs-test1-pub.rdc.local   lvs-test1-pub
10.100.76.87    lvs-test2-pub.rdc.local   lvs-test2-pub
10.100.77.250   trac-test-pub.rdc.local   trac-test-pub.rdc

# Private IPs
192.168.201.11  lvs-test1.rdc.local       lvs-test1
192.168.201.12  lvs-test2.rdc.local       lvs-test2
192.168.201.15  lvs-test-gw.rdc.local     lvs-test-gw
192.168.201.21  trac-test1.rdc.local      trac-test1
192.168.201.22  trac-test2.rdc.local      trac-test2

Interfaces:
lvs-test1:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
        inet addr:192.168.201.11  Bcast:192.168.201.255  Mask:255.255.255.0
eth0:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:05
        inet addr:192.168.201.15  Bcast:192.168.201.255  Mask:255.255.255.0
eth1    Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
        inet addr:10.100.77.4  Bcast:10.100.79.255  Mask:255.255.240.0
eth1:1  Link encap:Ethernet  HWaddr 00:0C:29:C7:5D:0F
        inet addr:10.100.77.250  Bcast:10.100.79.255  Mask:255.255.240.0
lo      Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0

lvs-test2:
[@lvs-test1 ~]$ /sbin/ifconfig |grep -e Link -e 'inet' |grep -v inet6
eth0    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:6A
        inet addr:192.168.201.12  Bcast:192.168.201.255  Mask:255.255.255.0
        inet6 addr: fe80::20c:29ff:fed7:f96a/64 Scope:Link
eth1    Link encap:Ethernet  HWaddr 00:0C:29:D7:F9:74
        inet addr:10.100.76.87  Bcast:10.100.79.255  Mask:255.255.240.0
        inet6 addr: fe80::20c:29ff:fed7:f974/64 Scope:Link
lo      Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0

Routing table:
[@lvs-test1 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1

[@lvs-test2 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.201.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
10.100.64.0     0.0.0.0         255.255.240.0   U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         10.100.64.254   0.0.0.0         UG        0 0          0 eth1

ARP ping to floating GW IP and APP IP works from the non-active node:
Cluster on lvs-test1:
[@lvs-test2 ~]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.12 eth0
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  3.739ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.324ms
Unicast reply from 192.168.201.15 [00:0C:29:C7:5D:05]  1.332ms

[@lvs-test2 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.12 eth0
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.895ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.441ms
Unicast reply from 10.100.77.250 [00:0C:29:C7:5D:05]  1.395ms

Cluster on lvs-test2:
[@lvs-test1 etc]$ sudo /sbin/arping 192.168.201.15
ARPING 192.168.201.15 from 192.168.201.11 eth0
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  3.861ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  1.499ms
Unicast reply from 192.168.201.15 [00:0C:29:D7:F9:6A]  0.934ms

[@lvs-test1 ~]$ sudo /sbin/arping 10.100.77.250
ARPING 10.100.77.250 from 192.168.201.11 eth0
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  4.436ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.327ms
Unicast reply from 10.100.77.250 [00:0C:29:D7:F9:6A]  1.495ms

I have attached 2 files with more detailed information.

# Why does pulse think the secondary is dead, even with 2 interfaces?
# Why are both internal and external IPs returning the same MAC address?

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test1.txt
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lvs-test2.txt
URL: 
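On the second question: by default, Linux replies to ARP requests for any address configured on the host, on whichever interface the request arrives (sometimes called 'ARP flux'), which is consistent with both the internal and external IPs answering with the same MAC. A sketch of the usual mitigation, not specific to this setup and offered only as a starting point:

    # Reply to ARP only for addresses configured on the receiving
    # interface, and use that interface's own address when ARPing.
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2

To persist across reboots, the same keys go into /etc/sysctl.conf.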