LVS starts to malfunction after a few hours
Michael Ben-Nes
michael at epoch.co.il
Tue Jan 19 09:02:13 UTC 2010
Hi Tapan,
Actually there are 3 web servers in my configuration:
bweb1 & bweb2, which are enabled
bweb3, which is disabled

So two are active. This is also visible in the output of ipvsadm:
# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  82.81.215.141:80 wlc
  -> 82.81.215.139:80             Route   1      0          0
  -> 82.81.215.138:80             Route   1      0          0
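
To verify that both real servers are not just listed but actually receiving traffic, the per-server packet counters can be checked as well, e.g.:

# ipvsadm -L -n --stats
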
Thanks,
Miki
--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------
On Tue, Jan 19, 2010 at 10:23 AM, Tapan Thapa <tapan.thapa2000 at gmail.com> wrote:
> Hello,
>
> I am not an expert, but I have implemented piranha in my setup and it is
> working fine, so I am trying to help.
>
> From your configuration it seems that you have not enabled your third
> server, bweb3.
>
>
> server bweb3.my-domain.com {
>     address = 82.81.215.140
>     *active = 0*
>     weight = 1
> Please check this.
>
> Regards
> Tapan Thapa
>
> On Tue, Jan 19, 2010 at 1:40 PM, Michael Ben-Nes <michael at epoch.co.il> wrote:
>
>> Hi,
>>
>> My fresh LVS installation starts to malfunction after a few hours.
>>
>> I use piranha / pulse on CentOS 5.4 to round-robin between two Nginx servers
>> (only static files, not persistent).
>>
>> When I start the service, everything works as expected. After a few hours,
>> one of the real servers becomes unavailable (randomly), even though ipvsadm
>> shows it as ok.
>> The real server that stops answering is still probed by the nanny process
>> every 6 seconds and remains accessible through its real IP.
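>>
>> For example, the same send/expect check nanny performs can be reproduced by
>> hand against the real IP, and it answers:
>>
>> # printf 'GET / HTTP/1.0\r\n\r\n' | nc 82.81.215.139 80 | head -1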
>>
>> The only clue I found is that instead of 2 nanny processes there are 4
>> (2 for each real server).
>> The logs show nothing of interest while pulse is running.
>>
>> When I stop pulse, all the related processes are terminated except the 4
>> nannies, which I need to kill by hand.
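>>
>> To find and kill them by hand, e.g.:
>>
>> # ps -C nanny -o pid,args
>> # pkill nanny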
>>
>> Here is a snip from the logs:
>> Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15
>> Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15
>> Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx
>> Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15
>> Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15
>> Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15
>> Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
>> Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
>> Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
>> Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
>> Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!
>>
>>
>> Here is the data I gathered while the setup was malfunctioning:
>>
>> ####### LVS server (no backup server)
>>
>> piranha-0.8.4-13.el5 - ipvsadm-1.24-10
>>
>> # sysctl net.ipv4.ip_forward
>> net.ipv4.ip_forward = 1
>>
>> # cat /etc/sysconfig/ha/lvs.cf
>> serial_no = 37
>> primary = 82.81.215.137
>> service = lvs
>> backup = 0.0.0.0
>> heartbeat = 1
>> heartbeat_port = 539
>> keepalive = 6
>> deadtime = 18
>> network = direct
>> nat_nmask = 255.255.255.255
>> debug_level = NONE
>> virtual Nginx {
>>     active = 1
>>     address = 82.81.215.141 eth0:1
>>     vip_nmask = 255.255.255.224
>>     port = 80
>>     send = "GET / HTTP/1.0\r\n\r\n"
>>     expect = "HTTP"
>>     use_regex = 0
>>     load_monitor = none
>>     scheduler = wlc # Supposed to be rr; changed only to test whether the scheduler is the problem - same effect
>>     protocol = tcp
>>     timeout = 6
>>     reentry = 15
>>     quiesce_server = 1
>>     server bweb1.my-domain.com {
>>         address = 82.81.215.138
>>         active = 1
>>         weight = 1
>>     }
>>     server bweb2.my-domain.com {
>>         address = 82.81.215.139
>>         active = 1
>>         weight = 1
>>     }
>>     server bweb3.my-domain.com {
>>         address = 82.81.215.140
>>         active = 0
>>         weight = 1
>>     }
>> }
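>>
>> Note that with quiesce_server = 1 a failed real server is quiesced (its
>> weight is set to 0) rather than removed from the table, so it still shows
>> up in the ipvsadm listing. A quick check for quiesced entries, assuming the
>> default column layout:
>>
>> # ipvsadm -L -n | awk '$1 == "->" && $4 == 0'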
>>
>> # ipvsadm -L -n
>> IP Virtual Server version 1.2.1 (size=4096)
>> Prot LocalAddress:Port Scheduler Flags
>>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
>> TCP  82.81.215.141:80 wlc
>>   -> 82.81.215.139:80             Route   1      0          0
>>   -> 82.81.215.138:80             Route   1      0          0
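>>
>> The connection table can also be dumped to check whether new connections
>> are still being scheduled to both real servers, e.g.:
>>
>> # ipvsadm -L -n -c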
>>
>>
>> # ps auxw|egrep "nanny|ipv|lvs|pulse"
>> root 2668 0.0 0.0 8456 692 ? Ss Jan16 0:00
>> /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
>> HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
>> root 2670 0.0 0.0 8456 688 ? Ss Jan16 0:00
>> /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
>> HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
>> root 5795 0.0 0.0 8488 372 ? Ss Jan17 0:00 pulse
>> root 5798 0.0 0.0 8476 656 ? Ss Jan17 0:00
>> /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
>> root 5812 0.0 0.0 8456 692 ? Ss Jan17 0:00
>> /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
>> HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
>> root 5813 0.0 0.0 8456 692 ? Ss Jan17 0:00
>> /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x
>> HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
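>>
>> A quick way to count the nanny processes per real server (there should be
>> exactly one for each):
>>
>> # ps -C nanny -o args= | grep -o -e '-h [0-9.]*' | sort | uniq -c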
>>
>>
>>
>> ####### One of the real servers (the one that does not answer, though it is
>> identical to the other)
>>
>> # arptables -L -n
>> Chain IN (policy ACCEPT)
>> target   source-ip    destination-ip   source-hw   destination-hw   hlen   op          hrd         pro
>> DROP     0.0.0.0/0    82.81.215.141    00/00       00/00            any    0000/0000   0000/0000   0000/0000
>>
>> Chain OUT (policy ACCEPT)
>> target   source-ip    destination-ip   source-hw   destination-hw   hlen   op          hrd         pro
>> mangle   0.0.0.0/0    82.81.215.141    00/00       00/00            any    0000/0000   0000/0000   0000/0000   --mangle-ip-s 82.81.215.139
>>
>> Chain FORWARD (policy ACCEPT)
>> target   source-ip    destination-ip   source-hw   destination-hw   hlen   op          hrd         pro
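>>
>> These are the standard arptables rules for LVS direct routing, created with
>> the equivalent of:
>>
>> # arptables -A IN -d 82.81.215.141 -j DROP
>> # arptables -A OUT -d 82.81.215.141 -j mangle --mangle-ip-s 82.81.215.139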
>>
>>
>> # ifconfig
>> eth0      Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
>>           inet addr:82.81.215.139  Bcast:82.81.215.159  Mask:255.255.255.224
>>           inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:602454 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:51144864 (48.7 MiB)  TX bytes:251901147 (240.2 MiB)
>>           Interrupt:169 Memory:dcff0000-dd000000
>>
>> eth0:1    Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
>>           inet addr:82.81.215.141  Bcast:82.81.215.159  Mask:255.255.255.224
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           Interrupt:169 Memory:dcff0000-dd000000
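>>
>> A further check would be to run tcpdump on this server while it is
>> "unavailable", to see whether traffic for the VIP arrives at all, e.g.:
>>
>> # tcpdump -nn -i eth0 dst host 82.81.215.141 and tcp port 80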
>>
>>
>> Thanks for any idea that might shed some light on this topic :)
>>
>> Best,
>> Miki
>>
>> --------------------------------------------------
>> Michael Ben-Nes - Internet Consultant and Director.
>> http://www.epoch.co.il - weaving the Net.
>> Cellular: 054-4848113
>> --------------------------------------------------
>>