LVS starts to malfunction after a few hours
Tapan Thapa
tapan.thapa2000 at gmail.com
Tue Jan 19 08:23:02 UTC 2010
Hello,
I am not an expert, but I have implemented piranha in my setup and it is
working fine, so I will try to help.

From your configuration it seems that you have not enabled your third
server, bweb3:
server bweb3.my-domain.com {
    address = 82.81.215.140
    active = 0    <-- this server is disabled
    weight = 1
}
Please check this.
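If bweb3 is supposed to take traffic, the fix is to set active = 1 in its
server block and restart pulse so lvsd re-reads the file. A minimal sketch,
assuming the block layout shown above (editing the file by hand or through
the piranha web GUI works just as well):

# sed -i '/bweb3.my-domain.com/,/}/ s/active = 0/active = 1/' /etc/sysconfig/ha/lvs.cf
# service pulse restart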
Regards
Tapan Thapa
On Tue, Jan 19, 2010 at 1:40 PM, Michael Ben-Nes <michael at epoch.co.il> wrote:
> Hi,
>
> My fresh LVS installation starts to malfunction after a few hours.
>
> I use piranha / pulse on CentOS 5.4 to round-robin between two Nginx
> servers (static files only, not persistent).
>
> When I start the service, everything works as expected. After a few hours
> one of the real servers becomes unavailable (randomly), even though
> ipvsadm shows it as OK.
> The real server that is not answering is still being probed by the nanny
> process every 6 seconds and is accessible through its real IP.
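>
> For reference, nanny's probe (per the send/expect strings in lvs.cf) can
> be approximated by hand with curl: -0 forces the same HTTP/1.0 request,
> and nanny only looks for the string "HTTP" in the reply (82.81.215.138
> here as an example; same for .139):
>
> # curl -s -0 -o /dev/null -w '%{http_code}\n' http://82.81.215.138/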
>
> The only clue I found is that instead of 2 nanny processes there are 4
> (2 for each server).
> The logs show nothing of interest while pulse is running.
>
> When I stop pulse, all the related processes are terminated except the 4
> nannies, which I have to kill by hand.
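>
> A rough way to find and kill the leftovers (assuming nothing else on the
> box has "nanny" in its command line):
>
> # pgrep -fl nanny
> # pkill -f '^/usr/sbin/nanny'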
>
> Here is a snip from the logs:
> Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15
> Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15
> Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx
> Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15
> Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15
> Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15
> Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
> Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!
>
>
> Here is the data I gathered while the setup was malfunctioning:
>
> ####### LVS server (no backup server)
>
> piranha-0.8.4-13.el5 - ipvsadm-1.24-10
>
> # sysctl net.ipv4.ip_forward
> net.ipv4.ip_forward = 1
>
> # cat /etc/sysconfig/ha/lvs.cf
> serial_no = 37
> primary = 82.81.215.137
> service = lvs
> backup = 0.0.0.0
> heartbeat = 1
> heartbeat_port = 539
> keepalive = 6
> deadtime = 18
> network = direct
> nat_nmask = 255.255.255.255
> debug_level = NONE
> virtual Nginx {
>     active = 1
>     address = 82.81.215.141 eth0:1
>     vip_nmask = 255.255.255.224
>     port = 80
>     send = "GET / HTTP/1.0\r\n\r\n"
>     expect = "HTTP"
>     use_regex = 0
>     load_monitor = none
>     scheduler = wlc # supposed to be rr; changed only to test whether the
>                     # scheduler is the problem - same effect
>     protocol = tcp
>     timeout = 6
>     reentry = 15
>     quiesce_server = 1
>     server bweb1.my-domain.com {
>         address = 82.81.215.138
>         active = 1
>         weight = 1
>     }
>     server bweb2.my-domain.com {
>         address = 82.81.215.139
>         active = 1
>         weight = 1
>     }
>     server bweb3.my-domain.com {
>         address = 82.81.215.140
>         active = 0
>         weight = 1
>     }
> }
>
> # ipvsadm -L -n
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> TCP 82.81.215.141:80 wlc
> -> 82.81.215.139:80 Route 1 0 0
> -> 82.81.215.138:80 Route 1 0 0
>
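> A quick way to see whether a real server has stopped receiving traffic is
> to watch the per-destination packet counters; the InPkts column of a dead
> server stays flat while the other one keeps climbing:
>
> # watch -n 1 'ipvsadm -L -n --stats'
>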
>
> # ps auxw | egrep "nanny|ipv|lvs|pulse"
> root 2668 0.0 0.0 8456 692 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
> root 2670 0.0 0.0 8456 688 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
> root 5795 0.0 0.0 8488 372 ? Ss Jan17 0:00 pulse
> root 5798 0.0 0.0 8476 656 ? Ss Jan17 0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
> root 5812 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
> root 5813 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
>
>
>
> ####### One of the real servers (the one that does not answer, though it
> is identical to the other)
>
> # arptables -L -n
> Chain IN (policy ACCEPT)
> target  source-ip  destination-ip  source-hw  destination-hw  hlen  op  hrd  pro
> DROP    0.0.0.0/0  82.81.215.141   00/00      00/00           any   0000/0000  0000/0000  0000/0000
>
> Chain OUT (policy ACCEPT)
> target  source-ip  destination-ip  source-hw  destination-hw  hlen  op  hrd  pro
> mangle  0.0.0.0/0  82.81.215.141   00/00      00/00           any   0000/0000  0000/0000  0000/0000  --mangle-ip-s 82.81.215.139
>
> Chain FORWARD (policy ACCEPT)
> target  source-ip  destination-ip  source-hw  destination-hw  hlen  op  hrd  pro
>
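> For reference, these arptables rules are the usual LVS-DR ARP suppression:
> incoming ARP requests for the VIP are dropped and the source IP of outgoing
> ARP is rewritten to the real address, so only the director answers ARP for
> 82.81.215.141. A common alternative, sketched here under the assumption
> that the VIP is moved from eth0:1 to a lo alias, uses the kernel's ARP
> sysctls instead:
>
> # ip addr add 82.81.215.141/32 dev lo
> # sysctl -w net.ipv4.conf.all.arp_ignore=1
> # sysctl -w net.ipv4.conf.all.arp_announce=2
>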
>
> # ifconfig
> eth0    Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
>         inet addr:82.81.215.139  Bcast:82.81.215.159  Mask:255.255.255.224
>         inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link
>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>         RX packets:602454 errors:0 dropped:0 overruns:0 frame:0
>         TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0
>         collisions:0 txqueuelen:1000
>         RX bytes:51144864 (48.7 MiB)  TX bytes:251901147 (240.2 MiB)
>         Interrupt:169 Memory:dcff0000-dd000000
>
> eth0:1  Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
>         inet addr:82.81.215.141  Bcast:82.81.215.159  Mask:255.255.255.224
>         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>         Interrupt:169 Memory:dcff0000-dd000000
>
>
> Thanks for any idea that might shed some light on this topic :)
>
> Best,
> Miki
>
> --------------------------------------------------
> Michael Ben-Nes - Internet Consultant and Director.
> http://www.epoch.co.il - weaving the Net.
> Cellular: 054-4848113
> --------------------------------------------------
>
>