nanny is broken?

Robert Hurst rhurst at bidmc.harvard.edu
Tue Apr 18 14:55:39 UTC 2006


I'm running RHEL AS 4 Update 3 with piranha-0.8.2-1 on three servers, and
it appears that nanny is not holding up its end of the bargain. The servers are:
1) Piranha
2) Active Caché database server
3) Standby Caché database server

The three servers have a private network link between them: 192.168.0.1,
.2 & .3, respectively.  I use .254 as the Piranha NAT gateway.  The
problem is that my client connection attempts pass through Piranha and
alternately try to connect to the active and then the standby server.
Eh?!?!?

I use GET to monitor port 1972 on each database server, for example:
[root@jerrylab1 ~]# GET -ds http://node2:1972
200 OK
[root@jerrylab1 ~]# GET -ds http://node3:1972
500 Can't connect to node3:1972 (connect: No route to host)
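
To make the expectation concrete, here is a minimal Python sketch of the pass/fail decision I would expect for each probe. It assumes a plain prefix comparison against the expect string, with no regex. The helper name probe_ok is hypothetical, and this is not nanny's actual code:

```python
def probe_ok(result, expect="200 OK"):
    """Return True when a probe's output matches the expect string.

    Hypothetical helper: assumes a plain prefix comparison (no regex),
    which may differ from what nanny really does internally.
    """
    if not result:  # None or empty output: the probe returned nothing
        return False
    return result.strip().startswith(expect)


# The two GET results shown above, plus the "nothing came back" case.
for output in ("200 OK",
               "500 Can't connect to node3:1972 (connect: No route to host)",
               None):
    print(repr(output), "->", "up" if probe_ok(output) else "down")
```

Under that assumption, only node2 would count as up.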

Shouldn't nanny at this point declare that server #3 is dead and
instruct ipvsadm to take it out of the active pool?  An excerpt
from /var/log/messages:

Apr 18 09:06:52 jerrylab1 nanny[30557]: starting LVS client monitor for 10.25.65.51:1972
Apr 18 09:06:52 jerrylab1 nanny[30557]: Send_program len=40, text="/usr/bin/GET -ds http://192.168.0.2:1972"
Apr 18 09:06:52 jerrylab1 nanny[30557]: Send_string is NULL
Apr 18 09:06:52 jerrylab1 nanny[30557]: Expect_string len=6, text="200 OK"
Apr 18 09:06:52 jerrylab1 nanny[30557]: Service_type value=2
Apr 18 09:06:52 jerrylab1 nanny[30557]: Found 2 argument(s)
Apr 18 09:06:52 jerrylab1 nanny[30557]: Invoking:  /usr/bin/GET
Apr 18 09:06:53 jerrylab1 nanny[30557]: Got result (200 OK) from command sent to (192.168.0.2)
Apr 18 09:06:53 jerrylab1 nanny[30557]: avail: 1 active: 0: count: 3
Apr 18 09:06:53 jerrylab1 nanny[30557]: making 192.168.0.2:1972 available
Apr 18 09:06:53 jerrylab1 nanny[30557]: /sbin/ipvsadm command failed!
Apr 18 09:07:10 jerrylab1 nanny[30583]: starting LVS client monitor for 10.25.65.51:1972
Apr 18 09:07:10 jerrylab1 nanny[30583]: Send_program len=40, text="/usr/bin/GET -ds http://192.168.0.3:1972"
Apr 18 09:07:10 jerrylab1 nanny[30583]: Send_string is NULL
Apr 18 09:07:10 jerrylab1 nanny[30583]: Expect_string len=6, text="200 OK"
Apr 18 09:07:10 jerrylab1 nanny[30583]: Service_type value=2
Apr 18 09:07:10 jerrylab1 nanny[30583]: Found 2 argument(s)
Apr 18 09:07:10 jerrylab1 nanny[30583]: Invoking:  /usr/bin/GET
Apr 18 09:07:11 jerrylab1 nanny[30583]: The following exited abnormally:
Apr 18 09:07:11 jerrylab1 nanny[30583]: Got result ((null)) from command sent to (192.168.0.3)
Apr 18 09:07:11 jerrylab1 nanny[30583]: Ran the external sending program to (192.168.0.3) but didn't get anything back
Apr 18 09:07:11 jerrylab1 nanny[30583]: avail: 1 active: 0: count: 3
Apr 18 09:07:11 jerrylab1 nanny[30583]: making 192.168.0.3:1972 available
Apr 18 09:07:11 jerrylab1 nanny[30583]: /sbin/ipvsadm command failed!


[root@jerrylab1 ha]# cat lvs.cf
serial_no = 32
primary = 10.25.65.92
primary_private = 192.168.0.1
service = lvs
backup_active = 0
backup = 10.25.65.53
backup_private = 192.168.0.2
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = nat
nat_router = 192.168.0.254 eth1:1
nat_nmask = 255.255.0.0
debug_level = NONE
monitor_links = 1
virtual ECP_DB {
     active = 1
     address = 10.25.65.51 eth0:1
     vip_nmask = 255.255.255.128
     fwmark = 1972
     port = 1972
     expect = "200 OK"
     use_regex = 0
     send_program = "/usr/bin/GET -ds http://%h:1972"
     load_monitor = none
     scheduler = wlc
     protocol = tcp
     timeout = 6
     reentry = 15
     quiesce_server = 0
     server node2 {
         address = 192.168.0.2
         active = 1
         weight = 1
     }
     server node3 {
         address = 192.168.0.3
         active = 1
         weight = 1
     }
}
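
The %h in send_program appears to be replaced with each real server's address, judging by the Send_program line in the log (len=40 for node2). A small Python sketch of that substitution follows. The function name expand_send_program is hypothetical, and a simple textual replace is my assumption about how the expansion works:

```python
def expand_send_program(template, host):
    """Replace every %h placeholder with the real server's address.

    Hypothetical helper: a plain textual substitution is assumed here.
    """
    return template.replace("%h", host)


cmd = expand_send_program("/usr/bin/GET -ds http://%h:1972", "192.168.0.2")
# Print the expanded command and its length for comparison with the log.
print(cmd, len(cmd))
```

The expanded string and its length can be compared against the text and len values nanny logs for 192.168.0.2.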


The Piranha Control/Monitoring window shows:

CURRENT LVS ROUTING TABLE
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  000007B4 wlc
  -> 192.168.0.2:0                Masq    1      0          0
  -> 192.168.0.3:0                Masq    1      0          0
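
For anyone reading the table: the FWM value is the fwmark from lvs.cf shown in hexadecimal, so 000007B4 is simply 1972. A quick Python check of that conversion (the eight-digit zero padding is only chosen to match the display above):

```python
fwmark = 1972                    # fwmark value from lvs.cf
label = format(fwmark, "08X")    # zero-padded, upper-case hexadecimal
print(label)                     # 000007B4
print(int(label, 16))            # 1972
```

As far as I can tell, the real servers are listed with port 0 because the virtual service is keyed on the firewall mark rather than on an address and port.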

CURRENT LVS PROCESSES
root 4106 0.0 0.0 1660 564 ? Ss 10:10 0:00 pulse
root 4122 0.0 0.0 1644 548 ? Ss 10:10 0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
root 4127 0.0 0.0 1636 528 ? Ss 10:10 0:00 /usr/sbin/nanny -c -h 192.168.0.2 -p 1972 -f 1972 -e /usr/bin/GET -ds http://%h:1972 -x OK -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 10.25.65.51 -M m -U none --lvs
root 4130 0.0 0.0 1640 532 ? Ss 10:10 0:00 /usr/sbin/nanny -c -h 192.168.0.3 -p 1972 -f 1972 -e /usr/bin/GET -ds http://%h:1972 -x OK -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 10.25.65.51 -M m -U none --lvs


Robert Hurst, Sr. Systems Engineer
Beth Israel Deaconess Medical Center
1135 Tremont Street, REN-7
Boston, Massachusetts   02120-2140
617-754-8754 ∙ Fax: 617-754-8730 ∙ Cell: 401-787-3154
Any technology distinguishable from magic is insufficiently advanced.



