From rcamphor at gmail.com  Tue Jan  5 05:13:43 2010
From: rcamphor at gmail.com (Anil Pillai)
Date: Tue, 5 Jan 2010 10:43:43 +0530
Subject: LVS + Piranha + Direct Routing + iptables Problem
In-Reply-To: <88b0b1691001042103x52f5e489tce7f5f15d005f369@mail.gmail.com>
References: <88b0b1690912220527w5d009f99u631396a0bd972108@mail.gmail.com>
	<88b0b1691001042103x52f5e489tce7f5f15d005f369@mail.gmail.com>
Message-ID: <88b0b1691001042113q362f8f47hfa76b0236eefb889@mail.gmail.com>

Hi,

I was able to implement LVS with Direct Routing (iptables). I am facing a
problem with the iptables rule that this setup requires on the real servers,
of the form (iptables -t nat -A PREROUTING -p tcp -d <VIP> --dport <port> -j
REDIRECT).

Below is a brief description of the setup. I have 3 servers, with Apache
installed on all three (port 80):

Server 1 (10.50.57.22) -> 10.50.57.55 (VIP) -> running "pulse"
Server 2 (10.50.57.40)
Server 3 (10.50.57.48)

I have configured LVS on port 80 and added the iptables entry below on
10.50.57.40 and 10.50.57.48:

iptables -t nat -A PREROUTING -p tcp -d 10.50.57.55 --dport 80 -j REDIRECT

With the above setup everything works fine. Even the Apache on Server 1
(which holds the VIP) gets requests as part of the load sharing. But if I add
the same iptables entry on Server 1 (10.50.57.22), requests are received only
by the Apache installed on that host.

The reason for adding it there is to implement redundancy. I have set up a
redundant director on Server 2 (10.50.57.40): once "pulse" is stopped on
Server 1 (10.50.57.22), "pulse" starts automatically on Server 2
(10.50.57.40), which acquires the VIP (10.50.57.55). But since iptables is
already active with the above entry, all requests then go to the Apache on
that same host (10.50.57.40).

Has anyone faced a similar issue?
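
One way around this, sketched here only as an illustration and not taken from
the thread, is to keep the REDIRECT rule on a node only while that node is not
holding the VIP, i.e. while it is acting purely as a real server. The VIP,
port and interface below come from the setup described above; the script
itself is a hypothetical helper, not something Piranha provides:

#!/bin/sh
# Hypothetical helper: keep the REDIRECT rule only while this node is NOT
# holding the VIP, i.e. while it is acting purely as a real server.
VIP=10.50.57.55
PORT=80

# Remove any existing copy of the rule so it never ends up duplicated.
iptables -t nat -D PREROUTING -p tcp -d "$VIP" --dport "$PORT" -j REDIRECT 2>/dev/null

# Re-add it only if the VIP is not configured locally (pulse brings the VIP
# up as an eth0 alias on the node that is currently the active director).
if ! /sbin/ifconfig | grep -q "inet addr:$VIP "; then
    iptables -t nat -A PREROUTING -p tcp -d "$VIP" --dport "$PORT" -j REDIRECT
fi

Run from cron or from whatever mechanism starts pulse on the backup node, so
the rule disappears shortly after a node takes over the VIP.
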
From michael at epoch.co.il  Tue Jan 19 08:10:56 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Tue, 19 Jan 2010 10:10:56 +0200
Subject: LVS start to malfunction after few hours
Message-ID:

Hi,

My fresh LVS installation starts to malfunction after a few hours.

I use piranha / pulse on CentOS 5.4 to round-robin between two Nginx servers
(only static files, not persistent).

When I start the service everything works as expected. After a few hours one
of the real servers becomes unavailable (randomly), even though ipvsadm shows
it is OK. The real server which is not answering is still being checked by
the nanny process every 6 seconds and is reachable through its real IP.

The only clue I found is that instead of 2 nanny processes there are 4 nanny
processes (2 for each server). The logs show nothing of interest while pulse
is running. When I stop pulse, all the related processes are terminated
except the 4 nannies, which I need to kill by hand.

Here is a snip from the logs:

Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx
Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!

Here is the data I gathered while the setup was malfunctioning:

####### LVS server ( no backup server )

piranha-0.8.4-13.el5 - ipvsadm-1.24-10

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

# cat /etc/sysconfig/ha/lvs.cf
serial_no = 37
primary = 82.81.215.137
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
nat_nmask = 255.255.255.255
debug_level = NONE
virtual Nginx {
     active = 1
     address = 82.81.215.141 eth0:1
     vip_nmask = 255.255.255.224
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     use_regex = 0
     load_monitor = none
     scheduler = wlc   # Supposed to be rr - changed only to test whether the scheduler is the problem - same effect
     protocol = tcp
     timeout = 6
     reentry = 15
     quiesce_server = 1
     server bweb1.my-domain.com {
         address = 82.81.215.138
         active = 1
         weight = 1
     }
     server bweb2.my-domain.com {
         address = 82.81.215.139
         active = 1
         weight = 1
     }
     server bweb3.my-domain.com {
         address = 82.81.215.140
         active = 0
         weight = 1
     }
}

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  82.81.215.141:80 wlc
  -> 82.81.215.139:80             Route   1      0          0
  -> 82.81.215.138:80             Route   1      0          0

# ps auxw | egrep "nanny|ipv|lvs|pulse"
root  2668  0.0  0.0  8456  692 ?  Ss  Jan16  0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root  2670  0.0  0.0  8456  688 ?  Ss  Jan16  0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root  5795  0.0  0.0  8488  372 ?  Ss  Jan17  0:00 pulse
root  5798  0.0  0.0  8476  656 ?  Ss  Jan17  0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
root  5812  0.0  0.0  8456  692 ?  Ss  Jan17  0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root  5813  0.0  0.0  8456  692 ?  Ss  Jan17  0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
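
As an aside on the listing above: the two nanny pairs were started on
different days (PIDs 2668/2670 from Jan 16, 5812/5813 from Jan 17), which
suggests monitors left behind by an earlier pulse instance. A minimal cleanup
sketch, assuming the stock CentOS init script and process names; this only
clears the stale monitors, it does not address the underlying fault:

# There should be one nanny per active real server per virtual service
# (two in this configuration). List them with their start times:
ps -C nanny -o pid,lstart,args

# If stale monitors remain after pulse is stopped, clear them by hand
# before starting pulse again:
service pulse stop
pkill -f /usr/sbin/nanny
service pulse start
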

####### One of the real servers ( the one that does not answer, though it is
identical to the other )

# arptables -L -n
Chain IN (policy ACCEPT)
target   source-ip    destination-ip   source-hw  destination-hw  hlen  op         hrd        pro
DROP     0.0.0.0/0    82.81.215.141    00/00      00/00           any   0000/0000  0000/0000  0000/0000

Chain OUT (policy ACCEPT)
target   source-ip    destination-ip   source-hw  destination-hw  hlen  op         hrd        pro
mangle   0.0.0.0/0    82.81.215.141    00/00      00/00           any   0000/0000  0000/0000  0000/0000  --mangle-ip-s 82.81.215.139

Chain FORWARD (policy ACCEPT)
target   source-ip    destination-ip   source-hw  destination-hw  hlen  op         hrd        pro

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
          inet addr:82.81.215.139  Bcast:82.81.215.159  Mask:255.255.255.224
          inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:602454 errors:0 dropped:0 overruns:0 frame:0
          TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51144864 (48.7 MiB)  TX bytes:251901147 (240.2 MiB)
          Interrupt:169 Memory:dcff0000-dd000000

eth0:1    Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
          inet addr:82.81.215.141  Bcast:82.81.215.159  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:dcff0000-dd000000

Thanks for any idea that might shed some light on this topic :)

Best,
Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------
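
For reference, the arptables ruleset shown in the post above is the Red Hat
documented way of keeping a direct-routing real server from answering ARP for
the VIP. A sketch of the commands that would produce it, using the addresses
from this real server; the persistence step assumes the arptables_jf package
shipped with CentOS/RHEL:

VIP=82.81.215.141
RIP=82.81.215.139    # this real server's own address

# Never answer ARP requests for the VIP on the real server ...
arptables -A IN -d "$VIP" -j DROP
# ... and rewrite the VIP to the real IP in outgoing ARP traffic.
arptables -A OUT -d "$VIP" -j mangle --mangle-ip-s "$RIP"

# Persist the rules across reboots:
service arptables_jf save
chkconfig arptables_jf on
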
From tapan.thapa2000 at gmail.com  Tue Jan 19 08:23:02 2010
From: tapan.thapa2000 at gmail.com (Tapan Thapa)
Date: Tue, 19 Jan 2010 13:53:02 +0530
Subject: LVS start to malfunction after few hours
In-Reply-To:
References:
Message-ID: <1dba33ef1001190023o65fe2f79s4b8c8dfa3dac8533@mail.gmail.com>

Hello,

I am not an expert, but I have implemented Piranha in my setup and it is
working fine, so I am trying to help.

From your configuration it seems that you have not enabled your third server:

server bweb3.my-domain.com {
     address = 82.81.215.140
     *active = 0*
     weight = 1

Please check this.

Regards
Tapan Thapa

On Tue, Jan 19, 2010 at 1:40 PM, Michael Ben-Nes wrote:
> My fresh LVS installation starts to malfunction after a few hours.
> [...]
From michael at epoch.co.il  Tue Jan 19 09:02:13 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Tue, 19 Jan 2010 11:02:13 +0200
Subject: LVS start to malfunction after few hours
In-Reply-To: <1dba33ef1001190023o65fe2f79s4b8c8dfa3dac8533@mail.gmail.com>
References: <1dba33ef1001190023o65fe2f79s4b8c8dfa3dac8533@mail.gmail.com>
Message-ID:

Hi Tapan,

Actually there are 3 web servers in my configuration: bweb1 and bweb2, which
are enabled, and bweb3, which is disabled. So two are active. This is also
seen in the output of ipvsadm:

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  82.81.215.141:80 wlc
  -> 82.81.215.139:80             Route   1      0          0
  -> 82.81.215.138:80             Route   1      0          0

Thanks,
Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------

On Tue, Jan 19, 2010 at 10:23 AM, Tapan Thapa wrote:
> From your configuration it seems that you have not enabled your third server.
> [...]
From laszlo at beres.me  Tue Jan 19 09:03:28 2010
From: laszlo at beres.me (Laszlo Beres)
Date: Tue, 19 Jan 2010 10:03:28 +0100
Subject: LVS start to malfunction after few hours
In-Reply-To:
References:
Message-ID: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>

On Tue, Jan 19, 2010 at 9:10 AM, Michael Ben-Nes wrote:

> Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
> Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!

That reminds me of a bug:

http://bugs.centos.org/view.php?id=3924

Except that in the sample above, pulse couldn't even start up properly.

--
László Béres          Unix system engineer
http://www.google.com/profiles/beres.laszlo
From michael at epoch.co.il  Tue Jan 19 09:31:47 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Tue, 19 Jan 2010 11:31:47 +0200
Subject: LVS start to malfunction after few hours
In-Reply-To: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>
References: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>
Message-ID:

Thanks Laszlo for the reference.

As mentioned in the bug report, I downgraded to piranha-0.8.4-11 (took it
from CentOS 5.3). I will report back in a day, with the hope that it will be
stable :)

Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------

On Tue, Jan 19, 2010 at 11:03 AM, Laszlo Beres wrote:
> That reminds me of a bug:
> http://bugs.centos.org/view.php?id=3924
> [...]

From dbello at gmail.com  Tue Jan 19 12:51:48 2010
From: dbello at gmail.com (Diego Bello)
Date: Tue, 19 Jan 2010 09:51:48 -0300
Subject: Disabling a Real server
Message-ID:

Hi people,

I have an LVS cluster using CentOS 5.3 and I need, from time to time, to take
one of the real servers out for maintenance and then bring it back in again.
How can I "disable" a real server in the cluster so that it is no longer
monitored until it is back again?

I thought of shutting down the service, PostgreSQL in this case, but the
monitoring scripts I have send an email alert whenever a real server is not
available, and I don't want to receive tons of emails whenever a real server
is being updated.

The only way I can think of is to modify the router configuration file
(lvs.cf) and reload pulse, but I'm not sure that's the most elegant choice. I
need it to be done automatically using scripts, but I can't find any hint in
the man pages of lvsd, nanny or ipvsadm.

Thanks.

--
Diego Bello Carreño

From rhurst at bidmc.harvard.edu  Tue Jan 19 17:41:07 2010
From: rhurst at bidmc.harvard.edu (rhurst at bidmc.harvard.edu)
Date: Tue, 19 Jan 2010 12:41:07 -0500
Subject: Disabling a Real server
In-Reply-To:
References:
Message-ID: <50168EC934B8D64AA8D8DD37F840F3DE055B9019CC@EVS2CCR.its.caregroup.org>

Hopefully, I am understanding your use-case and question ...

If the service goes "offline", the corresponding pulse -> lvsd -> nanny
process is supposed to detect the event and then remove the real server from
the vip:port pool. See syslog for the event messages.

If you want to do that manually, i.e. a planned quiesce of the service, just
set the weight factor of the real server to zero. Monitor its current active
clients until it reaches zero, because no new connections will be allowed
into that server while it is set to zero. Then shut down the service.

To drain clients, issue a command like:

ipvsadm -e -t eben:8000 -r web02:8000 -g -w 0

... where eben is the virtual host name and web02 is the real server name.

Do the reverse when you are ready to bring it back into the pool: server up,
service up, and set its weight factor >0.
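
The quiesce described above can be scripted. A minimal sketch, assuming
direct routing; VIP, RIP and PORT below are illustrative placeholders, not
values from this thread, so substitute your own virtual and real addresses:

#!/bin/sh
# Drain a real server before maintenance, following the steps described above.
VIP=192.168.0.10
RIP=192.168.0.21
PORT=5432

# 1. Set the weight to zero so no new connections are scheduled to it.
ipvsadm -e -t "$VIP:$PORT" -r "$RIP:$PORT" -g -w 0

# 2. Wait until the server's ActiveConn count (column 5 of ipvsadm -L -n)
#    drops to zero. Assumes the real server entry is still listed, which it
#    will be while it is merely quiesced with weight 0.
while [ "$(ipvsadm -L -n | awk -v r="$RIP:$PORT" '$2 == r {print $5}')" != "0" ]; do
    sleep 5
done

# 3. Now it is safe to stop the service on the real server and do maintenance.
#    When done, restore the weight:
# ipvsadm -e -t "$VIP:$PORT" -r "$RIP:$PORT" -g -w 1

Because the entry stays in the table with weight 0, existing connections can
finish while no new ones are scheduled to the server.
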
From dbello at gmail.com  Tue Jan 19 17:56:20 2010
From: dbello at gmail.com (Diego Bello)
Date: Tue, 19 Jan 2010 14:56:20 -0300
Subject: Disabling a Real server
In-Reply-To: <50168EC934B8D64AA8D8DD37F840F3DE055B9019CC@EVS2CCR.its.caregroup.org>
References: <50168EC934B8D64AA8D8DD37F840F3DE055B9019CC@EVS2CCR.its.caregroup.org>
Message-ID:

On Tue, Jan 19, 2010 at 2:41 PM, <rhurst at bidmc.harvard.edu> wrote:
> If you want to do that manually, i.e. a planned quiesce of the service, just
> set the weight factor of the real server to zero.
> [...]
> Do the reverse when you are ready to bring it back into the pool: server up,
> service up, and set its weight factor >0.

This is what I do, but when one server has weight = 0 the monitoring script
keeps working on it; it is still part of the pool. This monitoring script is
used, in this case, to alert when a machine is down, so even when I set its
weight to 0 and shut down PostgreSQL, the alerts keep telling me that the
machine is down.
--
Diego Bello Carreño

From michael at epoch.co.il  Sun Jan 24 10:40:02 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Sun, 24 Jan 2010 12:40:02 +0200
Subject: LVS start to malfunction after few hours
In-Reply-To:
References: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>
Message-ID:

Ok, this is the current status.

The solution that uses arptables plus the VIP on eth0:1 is not working for me
with LVS-DR. After a few hours of service the LVS breaks and I receive
answers only from one server. I checked with /sbin/arping whether ARP packets
pass the arptables firewall, but couldn't find any.

After some debate I chose a different setup than the one described in the RH
docs:

Removed / stopped the arptables package.

Added to sysctl.conf:
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

Moved the VIP from eth0:1 to lo:0.

Now it is working as expected.

Note - the bug mentioned at the RH bugzilla is not relevant, as it addresses
a different problem.

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------

On Tue, Jan 19, 2010 at 11:31 AM, Michael Ben-Nes wrote:
> As mentioned in the bug report, I downgraded to piranha-0.8.4-11 (took it
> from CentOS 5.3). I will report back in a day, with the hope that it will
> be stable :)
> [...]
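
For completeness, this is roughly what the working setup described above
looks like on each real server. The sysctl lines are quoted from the post;
the ifconfig command for holding the VIP on lo:0 with a host netmask is the
usual companion step and is an assumption here, not taken from the thread:

# /etc/sysctl.conf on each real server: never answer or announce ARP for
# addresses that are only configured locally (the VIP on lo:0).
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

# Apply without a reboot:
sysctl -p

# Bring the VIP up on loopback with a host netmask so the real server accepts
# traffic for it without advertising it on the LAN:
ifconfig lo:0 82.81.215.141 netmask 255.255.255.255 up
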