<div dir="ltr"><span style="font-family:arial, sans-serif;font-size:13px;border-collapse:collapse"><div>Hi,</div><div><br></div><div>My fresh LVS installation starts to malfunction after a few hours.</div>
<div><br></div><div>I use piranha / pulse on CentOS 5.4 to round-robin (RR) between two Nginx servers (static files only, no persistence).</div><div><br></div><div>When I start the service everything works as expected. After a few hours one of the real servers becomes unavailable (at random), even though ipvsadm shows it as OK.</div>
<div>The real server that is not answering is still being probed by the nanny process every 6 seconds and is reachable through its real IP.</div><div><br></div><div>The only clue I found is that instead of 2 nanny processes there are 4 (2 for each server).</div>
<div>The logs show nothing of interest while pulse is running.<br></div><div><br></div><div>When I stop pulse, all the related processes are terminated except the 4 nannies, which I need to kill by hand.</div><div><br></div><div>Here is a snippet from the logs:</div>
<div><div>Jan 18 09:57:21 blb1 pulse[5795]: Terminating due to signal 15</div><div>Jan 18 09:57:21 blb1 lvs[5798]: shutting down due to signal 15</div><div>Jan 18 09:57:21 blb1 lvs[5798]: shutting down virtual service Nginx</div>
<div>Jan 18 09:59:15 blb1 nanny[2668]: Terminating due to signal 15</div><div>Jan 18 09:59:15 blb1 nanny[2670]: Terminating due to signal 15</div><div>Jan 18 09:59:15 blb1 nanny[5812]: Terminating due to signal 15</div><div>
Jan 18 09:59:15 blb1 nanny[2668]: /sbin/ipvsadm command failed!</div><div>Jan 18 09:59:15 blb1 nanny[5812]: /sbin/ipvsadm command failed!</div><div>Jan 18 09:59:15 blb1 nanny[2670]: /sbin/ipvsadm command failed!</div><div>
Jan 18 09:59:15 blb1 nanny[5813]: Terminating due to signal 15</div><div>Jan 18 09:59:15 blb1 nanny[5813]: /sbin/ipvsadm command failed!</div><div><br></div></div><div><br></div><div>Here is the data I gathered while the setup was malfunctioning:</div>
<div><br></div><div>####### LVS server ( no backup server )</div><div><br></div><div>piranha-0.8.4-13.el5 - ipvsadm-1.24-10</div><div><br></div><div><div># sysctl net.ipv4.ip_forward</div><div>net.ipv4.ip_forward = 1</div>
<div><br></div></div><div># cat /etc/sysconfig/ha/lvs.cf</div><div>serial_no = 37</div><div>primary = 82.81.215.137</div><div>service = lvs</div>
<div>backup = 0.0.0.0</div><div>heartbeat = 1</div><div>heartbeat_port = 539</div><div>keepalive = 6</div><div>deadtime = 18</div><div>network = direct</div><div>nat_nmask = 255.255.255.255</div><div>debug_level = NONE</div>
<div>virtual Nginx {</div><div> active = 1</div><div> address = 82.81.215.141 eth0:1</div><div> vip_nmask = 255.255.255.224</div><div> port = 80</div><div> send = "GET / HTTP/1.0\r\n\r\n"</div>
<div> expect = "HTTP"</div><div> use_regex = 0</div><div> load_monitor = none</div><div> scheduler = wlc # Supposed to be rr; changed only to test whether the scheduler is the problem, same effect</div>
<div> protocol = tcp</div><div> timeout = 6</div><div> reentry = 15</div><div> quiesce_server = 1</div><div> server <a href="http://bweb1.my-domain.com" style="color:rgb(42, 93, 176)" target="_blank">bweb1.my-domain.com</a> {</div>
<div> address = 82.81.215.138</div><div> active = 1</div><div> weight = 1</div><div> }</div><div> server <a href="http://bweb2.my-domain.com" style="color:rgb(42, 93, 176)" target="_blank">bweb2.my-domain.com</a> {</div>
<div> address = 82.81.215.139</div><div> active = 1</div><div> weight = 1</div><div> }</div><div> server <a href="http://bweb3.my-domain.com" style="color:rgb(42, 93, 176)" target="_blank">bweb3.my-domain.com</a> {</div>
<div> address = 82.81.215.140</div><div> active = 0</div><div> weight = 1</div><div> }</div><div>}</div><div><br></div><div># ipvsadm -L -n</div><div>IP Virtual Server version 1.2.1 (size=4096)</div>
<div>Prot LocalAddress:Port Scheduler Flags</div><div> -> RemoteAddress:Port Forward Weight ActiveConn InActConn</div><div>TCP <a href="http://82.81.215.141:80" style="color:rgb(42, 93, 176)" target="_blank">82.81.215.141:80</a> wlc</div>
<div> -> <a href="http://82.81.215.139:80" style="color:rgb(42, 93, 176)" target="_blank">82.81.215.139:80</a> Route 1 0 0 </div><div> -> <a href="http://82.81.215.138:80" style="color:rgb(42, 93, 176)" target="_blank">82.81.215.138:80</a> Route 1 0 0 </div>
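<div>For anyone trying to reproduce this: the send / expect pair in the lvs.cf above can be approximated by hand. The following is only a rough sketch of such a probe, not how nanny itself is implemented. The function names are made up for illustration, and it assumes an nc that supports -w (timeout in seconds).</div>

```shell
#!/bin/sh
# Succeeds if the reply read from stdin contains the expected string.
# grep -F treats the pattern as a fixed string, in the spirit of use_regex = 0.
reply_matches() {
    grep -F -q -- "$1"
}

# Send the request from lvs.cf to host $1, port $2, and check the reply.
# Assumption: `nc -w 6` gives up after 6 seconds, mirroring timeout = 6.
probe() {
    printf 'GET / HTTP/1.0\r\n\r\n' | nc -w 6 "$1" "$2" | reply_matches "HTTP"
}

# Example (not run here):
#   probe 82.81.215.139 80 && echo "real IP answers" || echo "real IP is silent"
#   probe 82.81.215.141 80 && echo "VIP answers"     || echo "VIP is silent"
```

<div>Running the second probe from a client machine rather than from the director should show whether the VIP path fails while the real IP still answers.</div>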
<div><br></div><div><br></div><div># ps auxw|egrep "nanny|ipv|lvs|pulse"</div><div>root 2668 0.0 0.0 8456 692 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs</div>
<div>root 2670 0.0 0.0 8456 688 ? Ss Jan16 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs</div>
<div>root 5795 0.0 0.0 8488 372 ? Ss Jan17 0:00 pulse</div><div>root 5798 0.0 0.0 8476 656 ? Ss Jan17 0:00 /usr/sbin/lvsd --nofork -c /etc/sysconfig/ha/lvs.cf</div>
<div>root 5812 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.138 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs</div>
<div>root 5813 0.0 0.0 8456 692 ? Ss Jan17 0:00 /usr/sbin/nanny -c -h 82.81.215.139 -p 80 -r 80 -s GET / HTTP/1.0\r\n\r\n -x HTTP -q -a 15 -I /sbin/ipvsadm -t 6 -w 1 -V 82.81.215.141 -M g -U none --lvs</div>
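<div>The listing above shows two nannies per real server (PIDs 2668 and 5812 for .138, 2670 and 5813 for .139). A minimal sketch for spotting this condition from a ps listing follows. The function name is hypothetical, and it assumes the real server address is the argument right after -h, as in the listing.</div>

```shell
#!/bin/sh
# Print "<real-server-ip> <count>" for every real server that is
# monitored by more than one nanny process.
# Reads a ps listing on stdin so it can also be run on saved output.
duplicate_nannies() {
    awk '
        /\/usr\/sbin\/nanny/ {
            # The real server address is the argument that follows -h.
            for (i = 1; i < NF; i++)
                if ($i == "-h") { count[$(i + 1)]++; break }
        }
        END {
            for (host in count)
                if (count[host] > 1)
                    printf "%s %d\n", host, count[host]
        }
    ' | sort
}

# Example on a live director (not run here):
#   ps auxww | duplicate_nannies
```

<div>An empty result would mean each real server has at most one nanny. Any line printed points at leftover nannies, such as the ones started on Jan16 that survived the later pulse restart.</div>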
<div><br></div><div><br></div><div><br></div><div>####### One of the real servers (the one that does not answer, though it is identical to the other)</div><div><br></div><div># arptables -L -n</div><div>Chain IN (policy ACCEPT)</div>
<div>target source-ip destination-ip source-hw destination-hw hlen op hrd pro </div><div>DROP <a href="http://0.0.0.0/0" style="color:rgb(42, 93, 176)" target="_blank">0.0.0.0/0</a> 82.81.215.141 00/00 00/00 any 0000/0000 0000/0000 0000/0000 </div>
<div><br></div><div>Chain OUT (policy ACCEPT)</div><div>target source-ip destination-ip source-hw destination-hw hlen op hrd pro </div><div>mangle <a href="http://0.0.0.0/0" style="color:rgb(42, 93, 176)" target="_blank">0.0.0.0/0</a> 82.81.215.141 00/00 00/00 any 0000/0000 0000/0000 0000/0000 --mangle-ip-s 82.81.215.139 </div>
<div><br></div><div>Chain FORWARD (policy ACCEPT)</div><div>target source-ip destination-ip source-hw destination-hw hlen op hrd pro </div><div><br></div><div><br>
</div>
<div># ifconfig </div><div>eth0 Link encap:Ethernet HWaddr 00:11:25:41:69:A4 </div><div> inet addr:82.81.215.139 Bcast:82.81.215.159 Mask:255.255.255.224</div><div> inet6 addr: fe80::211:25ff:fe41:69a4/64 Scope:Link</div>
<div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> RX packets:602454 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:514536 errors:0 dropped:0 overruns:0 carrier:0</div>
<div> collisions:0 txqueuelen:1000 </div><div> RX bytes:51144864 (48.7 MiB) TX bytes:251901147 (240.2 MiB)</div><div> Interrupt:169 Memory:dcff0000-dd000000 </div><div><br></div><div>eth0:1 Link encap:Ethernet HWaddr 00:11:25:41:69:A4 </div>
<div> inet addr:82.81.215.141 Bcast:82.81.215.159 Mask:255.255.255.224</div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> Interrupt:169 Memory:dcff0000-dd000000 </div><div>
<br></div><div><br></div><div>Thanks for any ideas that might shed some light on this :)</div><div><br></div><div>Best,</div><div>Miki</div><div><br></div></span>--------------------------------------------------<br>
Michael Ben-Nes - Internet Consultant and Director.<br><a href="http://www.epoch.co.il" target="_blank">http://www.epoch.co.il</a> - weaving the Net.<br>Cellular: 054-4848113<br>--------------------------------------------------<br>
</div>