From rcamphor at gmail.com  Tue Jan  5 05:13:43 2010
From: rcamphor at gmail.com (Anil Pillai)
Date: Tue, 5 Jan 2010 10:43:43 +0530
Subject: LVS + Piranha + Direct Routing + iptables Problem
In-Reply-To: <88b0b1691001042103x52f5e489tce7f5f15d005f369@mail.gmail.com>
References: <88b0b1690912220527w5d009f99u631396a0bd972108@mail.gmail.com>
	<88b0b1691001042103x52f5e489tce7f5f15d005f369@mail.gmail.com>
Message-ID: <88b0b1691001042113q362f8f47hfa76b0236eefb889@mail.gmail.com>

Hi,

I was able to implement LVS with Direct Routing (iptables). I am facing a
problem with the iptables rule that this setup requires on the real servers,
of the form (iptables -t nat -A PREROUTING -p tcp -d <VIP> --dport <port> -j
REDIRECT).

Below is a brief description of the setup. I have 3 servers, with Apache
installed on all three (port 80):

Server 1 (10.50.57.22) -> 10.50.57.55 (VIP) -> running "pulse"
Server 2 (10.50.57.40)
Server 3 (10.50.57.48)

I have configured LVS on port 80 and added the iptables entry below on
10.50.57.40 and 10.50.57.48:

iptables -t nat -A PREROUTING -p tcp -d 10.50.57.55 --dport 80 -j REDIRECT

With the above setup everything works fine. Even the Apache on Server 1
(which holds the VIP) gets requests as part of the load sharing. But if I add
the same iptables entry on Server 1 (10.50.57.22), requests are received only
by the Apache installed on that host.

The reason for adding it there is to implement redundancy. I have set up a
redundant director on Server 2 (10.50.57.40): once "pulse" is stopped on
Server 1 (10.50.57.22), "pulse" starts automatically on Server 2
(10.50.57.40), which acquires the VIP (10.50.57.55). But since iptables is
already active with the above entry, all requests then go to the Apache on
that same host (10.50.57.40).

Has anyone faced a similar issue?
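
One way around this, sketched here only as an illustration and not taken from
the thread, is to keep the REDIRECT rule on a node only while that node is not
holding the VIP, i.e. while it is acting purely as a real server. The VIP,
port and interface below come from the setup described above; the script
itself is a hypothetical helper, not something Piranha provides:

#!/bin/sh
# Hypothetical helper: keep the REDIRECT rule only while this node is NOT
# holding the VIP, i.e. while it is acting purely as a real server.
VIP=10.50.57.55
PORT=80

# Remove any existing copy of the rule so it never ends up duplicated.
iptables -t nat -D PREROUTING -p tcp -d "$VIP" --dport "$PORT" -j REDIRECT 2>/dev/null

# Re-add it only if the VIP is not configured locally (pulse brings the VIP
# up as an eth0 alias on the node that is currently the active director).
if ! /sbin/ifconfig | grep -q "inet addr:$VIP "; then
    iptables -t nat -A PREROUTING -p tcp -d "$VIP" --dport "$PORT" -j REDIRECT
fi

Run from cron or from whatever mechanism starts pulse on the backup node, so
the rule disappears shortly after a node takes over the VIP.
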
From michael at epoch.co.il  Tue Jan 19 08:10:56 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Tue, 19 Jan 2010 10:10:56 +0200
Subject: LVS start to malfunction after few hours
Message-ID:

Hi,

My fresh LVS installation starts to malfunction after a few hours.

I use piranha / pulse on CentOS 5.4 to round-robin between two Nginx servers
(only static files, not persistent).

When I start the service everything works as expected. After a few hours one
of the real servers becomes unavailable (randomly), even though ipvsadm shows
it is OK. The real server which is not answering is still being checked by
the nanny process every 6 seconds and is reachable through its real IP.

The only clue I found is that instead of 2 nanny processes there are 4 nanny
processes (2 for each server). The logs show nothing of interest while pulse
is running. When I stop pulse, all the related processes are terminated
except the 4 nannies, which I need to kill by hand.

Here is a snip from the logs:

Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15
Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx
Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!

Here is the data I gathered while the setup was malfunctioning:

####### LVS server ( no backup server )

piranha-0.8.4-13.el5 - ipvsadm-1.24-10

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

# cat /etc/sysconfig/ha/lvs.cf
serial_no = 37
primary = 82.81.215.137
service = lvs
backup = 0.0.0.0
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
nat_nmask = 255.255.255.255
debug_level = NONE
virtual Nginx {
     active = 1
     address = 82.81.215.141 eth0:1
     vip_nmask = 255.255.255.224
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     use_regex = 0
     load_monitor = none
     scheduler = wlc   # Supposed to be rr - changed only to test whether the scheduler is the problem - same effect
     protocol = tcp
     timeout = 6
     reentry = 15
     quiesce_server = 1
     server bweb1.my-domain.com {
         address = 82.81.215.138
         active = 1
         weight = 1
     }
     server bweb2.my-domain.com {
         address = 82.81.215.139
         active = 1
         weight = 1
     }
     server bweb3.my-domain.com {
         address = 82.81.215.140
         active = 0
         weight = 1
     }
}

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  82.81.215.141:80 wlc
  -> 82.81.215.139:80             Route   1      0          0
  -> 82.81.215.138:80             Route   1      0          0

# ps auxw | egrep "nanny|ipv|lvs|pulse"
root  2668  0.0  0.0  8456  692 ?  Ss  Jan16  0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root  2670  0.0  0.0  8456  688 ?  Ss  Jan16  0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root  5795  0.0  0.0  8488  372 ?  Ss  Jan17  0:00 pulse
root  5798  0.0  0.0  8476  656 ?  Ss  Jan17  0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf
root  5812  0.0  0.0  8456  692 ?  Ss  Jan17  0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
root  5813  0.0  0.0  8456  692 ?  Ss  Jan17  0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs
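
As an aside on the listing above: the two nanny pairs were started on
different days (PIDs 2668/2670 from Jan 16, 5812/5813 from Jan 17), which
suggests monitors left behind by an earlier pulse instance. A minimal cleanup
sketch, assuming the stock CentOS init script and process names; this only
clears the stale monitors, it does not address the underlying fault:

# There should be one nanny per active real server per virtual service
# (two in this configuration). List them with their start times:
ps -C nanny -o pid,lstart,args

# If stale monitors remain after pulse is stopped, clear them by hand
# before starting pulse again:
service pulse stop
pkill -f /usr/sbin/nanny
service pulse start
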

####### One of the real servers ( the one that does not answer, though it is
identical to the other )

# arptables -L -n
Chain IN (policy ACCEPT)
target   source-ip    destination-ip   source-hw  destination-hw  hlen  op         hrd        pro
DROP     0.0.0.0/0    82.81.215.141    00/00      00/00           any   0000/0000  0000/0000  0000/0000

Chain OUT (policy ACCEPT)
target   source-ip    destination-ip   source-hw  destination-hw  hlen  op         hrd        pro
mangle   0.0.0.0/0    82.81.215.141    00/00      00/00           any   0000/0000  0000/0000  0000/0000  --mangle-ip-s 82.81.215.139

Chain FORWARD (policy ACCEPT)
target   source-ip    destination-ip   source-hw  destination-hw  hlen  op         hrd        pro

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
          inet addr:82.81.215.139  Bcast:82.81.215.159  Mask:255.255.255.224
          inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:602454 errors:0 dropped:0 overruns:0 frame:0
          TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51144864 (48.7 MiB)  TX bytes:251901147 (240.2 MiB)
          Interrupt:169 Memory:dcff0000-dd000000

eth0:1    Link encap:Ethernet  HWaddr 00:11:25:41:69:A4
          inet addr:82.81.215.141  Bcast:82.81.215.159  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:169 Memory:dcff0000-dd000000

Thanks for any idea that might shed some light on this topic :)

Best,
Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------
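
For reference, the arptables ruleset shown in the post above is the Red Hat
documented way of keeping a direct-routing real server from answering ARP for
the VIP. A sketch of the commands that would produce it, using the addresses
from this real server; the persistence step assumes the arptables_jf package
shipped with CentOS/RHEL:

VIP=82.81.215.141
RIP=82.81.215.139    # this real server's own address

# Never answer ARP requests for the VIP on the real server ...
arptables -A IN -d "$VIP" -j DROP
# ... and rewrite the VIP to the real IP in outgoing ARP traffic.
arptables -A OUT -d "$VIP" -j mangle --mangle-ip-s "$RIP"

# Persist the rules across reboots:
service arptables_jf save
chkconfig arptables_jf on
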
From tapan.thapa2000 at gmail.com  Tue Jan 19 08:23:02 2010
From: tapan.thapa2000 at gmail.com (Tapan Thapa)
Date: Tue, 19 Jan 2010 13:53:02 +0530
Subject: LVS start to malfunction after few hours
In-Reply-To:
References:
Message-ID: <1dba33ef1001190023o65fe2f79s4b8c8dfa3dac8533@mail.gmail.com>

Hello,

I am not an expert, but I have implemented Piranha in my setup and it is
working fine, so I am trying to help.

From your configuration it seems that you have not enabled your third server:

server bweb3.my-domain.com {
     address = 82.81.215.140
     *active = 0*
     weight = 1

Please check this.

Regards
Tapan Thapa

On Tue, Jan 19, 2010 at 1:40 PM, Michael Ben-Nes wrote:
> My fresh LVS installation starts to malfunction after a few hours.
> [...]
From michael at epoch.co.il  Tue Jan 19 09:02:13 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Tue, 19 Jan 2010 11:02:13 +0200
Subject: LVS start to malfunction after few hours
In-Reply-To: <1dba33ef1001190023o65fe2f79s4b8c8dfa3dac8533@mail.gmail.com>
References: <1dba33ef1001190023o65fe2f79s4b8c8dfa3dac8533@mail.gmail.com>
Message-ID:

Hi Tapan,

Actually there are 3 web servers in my configuration: bweb1 and bweb2, which
are enabled, and bweb3, which is disabled. So two are active. This is also
seen in the output of ipvsadm:

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  82.81.215.141:80 wlc
  -> 82.81.215.139:80             Route   1      0          0
  -> 82.81.215.138:80             Route   1      0          0

Thanks,
Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------

On Tue, Jan 19, 2010 at 10:23 AM, Tapan Thapa wrote:
> From your configuration it seems that you have not enabled your third server.
> [...]
From laszlo at beres.me  Tue Jan 19 09:03:28 2010
From: laszlo at beres.me (Laszlo Beres)
Date: Tue, 19 Jan 2010 10:03:28 +0100
Subject: LVS start to malfunction after few hours
In-Reply-To:
References:
Message-ID: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>

On Tue, Jan 19, 2010 at 9:10 AM, Michael Ben-Nes wrote:

> Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!
> Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15
> Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!

That reminds me of a bug:

http://bugs.centos.org/view.php?id=3924

Except that in the sample above, pulse couldn't even start up properly.

--
László Béres          Unix system engineer
http://www.google.com/profiles/beres.laszlo
From michael at epoch.co.il  Tue Jan 19 09:31:47 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Tue, 19 Jan 2010 11:31:47 +0200
Subject: LVS start to malfunction after few hours
In-Reply-To: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>
References: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>
Message-ID:

Thanks Laszlo for the reference.

As mentioned in the bug report, I downgraded to piranha-0.8.4-11 (took it
from CentOS 5.3). I will report back in a day, with the hope that it will be
stable :)

Miki

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------

On Tue, Jan 19, 2010 at 11:03 AM, Laszlo Beres wrote:
> That reminds me of a bug:
> http://bugs.centos.org/view.php?id=3924
> [...]

From dbello at gmail.com  Tue Jan 19 12:51:48 2010
From: dbello at gmail.com (Diego Bello)
Date: Tue, 19 Jan 2010 09:51:48 -0300
Subject: Disabling a Real server
Message-ID:

Hi people,

I have an LVS cluster using CentOS 5.3 and I need, from time to time, to take
one of the real servers out for maintenance and then bring it back in again.
How can I "disable" a real server in the cluster so that it is no longer
monitored until it is back again?

I thought of shutting down the service, PostgreSQL in this case, but the
monitoring scripts I have send an email alert whenever a real server is not
available, and I don't want to receive tons of emails whenever a real server
is being updated.

The only way I can think of is to modify the router configuration file
(lvs.cf) and reload pulse, but I'm not sure that's the most elegant choice. I
need it to be done automatically using scripts, but I can't find any hint in
the man pages of lvsd, nanny or ipvsadm.

Thanks.

--
Diego Bello Carreño

From rhurst at bidmc.harvard.edu  Tue Jan 19 17:41:07 2010
From: rhurst at bidmc.harvard.edu (rhurst at bidmc.harvard.edu)
Date: Tue, 19 Jan 2010 12:41:07 -0500
Subject: Disabling a Real server
In-Reply-To:
References:
Message-ID: <50168EC934B8D64AA8D8DD37F840F3DE055B9019CC@EVS2CCR.its.caregroup.org>

Hopefully, I am understanding your use-case and question ...

If the service goes "offline", the corresponding pulse -> lvsd -> nanny
process is supposed to detect the event and then remove the real server from
the vip:port pool. See syslog for the event messages.

If you want to do that manually, i.e. a planned quiesce of the service, just
set the weight factor of the real server to zero. Monitor its current active
clients until it reaches zero, because no new connections will be allowed
into that server while it is set to zero. Then shut down the service.

To drain clients, issue a command like:

ipvsadm -e -t eben:8000 -r web02:8000 -g -w 0

... where eben is the virtual host name and web02 is the real server name.

Do the reverse when you are ready to bring it back into the pool: server up,
service up, and set its weight factor >0.
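
The quiesce described above can be scripted. A minimal sketch, assuming
direct routing; VIP, RIP and PORT below are illustrative placeholders, not
values from this thread, so substitute your own virtual and real addresses:

#!/bin/sh
# Drain a real server before maintenance, following the steps described above.
VIP=192.168.0.10
RIP=192.168.0.21
PORT=5432

# 1. Set the weight to zero so no new connections are scheduled to it.
ipvsadm -e -t "$VIP:$PORT" -r "$RIP:$PORT" -g -w 0

# 2. Wait until the server's ActiveConn count (column 5 of ipvsadm -L -n)
#    drops to zero. Assumes the real server entry is still listed, which it
#    will be while it is merely quiesced with weight 0.
while [ "$(ipvsadm -L -n | awk -v r="$RIP:$PORT" '$2 == r {print $5}')" != "0" ]; do
    sleep 5
done

# 3. Now it is safe to stop the service on the real server and do maintenance.
#    When done, restore the weight:
# ipvsadm -e -t "$VIP:$PORT" -r "$RIP:$PORT" -g -w 1

Because the entry stays in the table with weight 0, existing connections can
finish while no new ones are scheduled to the server.
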
From dbello at gmail.com  Tue Jan 19 17:56:20 2010
From: dbello at gmail.com (Diego Bello)
Date: Tue, 19 Jan 2010 14:56:20 -0300
Subject: Disabling a Real server
In-Reply-To: <50168EC934B8D64AA8D8DD37F840F3DE055B9019CC@EVS2CCR.its.caregroup.org>
References: <50168EC934B8D64AA8D8DD37F840F3DE055B9019CC@EVS2CCR.its.caregroup.org>
Message-ID:

On Tue, Jan 19, 2010 at 2:41 PM, <rhurst at bidmc.harvard.edu> wrote:
> If you want to do that manually, i.e. a planned quiesce of the service, just
> set the weight factor of the real server to zero.
> [...]
> Do the reverse when you are ready to bring it back into the pool: server up,
> service up, and set its weight factor >0.

This is what I do, but when one server has weight = 0 the monitoring script
keeps working on it; it is still part of the pool. This monitoring script is
used, in this case, to alert when a machine is down, so even when I set its
weight to 0 and shut down PostgreSQL, the alerts keep telling me that the
machine is down.
--
Diego Bello Carreño

From michael at epoch.co.il  Sun Jan 24 10:40:02 2010
From: michael at epoch.co.il (Michael Ben-Nes)
Date: Sun, 24 Jan 2010 12:40:02 +0200
Subject: LVS start to malfunction after few hours
In-Reply-To:
References: <85e375aa1001190103s50b970date9230b4116e68986@mail.gmail.com>
Message-ID:

Ok, this is the current status.

The solution that uses arptables plus the VIP on eth0:1 is not working for me
with LVS-DR. After a few hours of service the LVS breaks and I receive
answers only from one server. I checked with /sbin/arping whether ARP packets
pass the arptables firewall, but couldn't find any.

After some debate I chose a different setup than the one described in the RH
docs:

Removed / stopped the arptables package.

Added to sysctl.conf:
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

Moved the VIP from eth0:1 to lo:0.

Now it is working as expected.

Note - the bug mentioned at the RH bugzilla is not relevant, as it addresses
a different problem.

--------------------------------------------------
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--------------------------------------------------

On Tue, Jan 19, 2010 at 11:31 AM, Michael Ben-Nes wrote:
> As mentioned in the bug report, I downgraded to piranha-0.8.4-11 (took it
> from CentOS 5.3). I will report back in a day, with the hope that it will
> be stable :)
> [...]
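
For completeness, this is roughly what the working setup described above
looks like on each real server. The sysctl lines are quoted from the post;
the ifconfig command for holding the VIP on lo:0 with a host netmask is the
usual companion step and is an assumption here, not taken from the thread:

# /etc/sysctl.conf on each real server: never answer or announce ARP for
# addresses that are only configured locally (the VIP on lo:0).
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

# Apply without a reboot:
sysctl -p

# Bring the VIP up on loopback with a host netmask so the real server accepts
# traffic for it without advertising it on the LAN:
ifconfig lo:0 82.81.215.141 netmask 255.255.255.255 up
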