lvsd kills off all nannies!

Dan Yocum yocum at fnal.gov
Wed Jun 10 20:58:09 UTC 2009


I just had the same experience again when attempting to add another 
service to our LVS director and reloading pulse, so upgrading to 
piranha-0.8.4-11.el5 did not help.

One thing I noticed was that the monitor process for one of the real 
servers failed right away (the service on that system was actually 
down).  I think this caused the nanny to die, which brought everything 
else down, too.  Not good.
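
Since the trigger seems to be a real server whose service is down at 
reload time, one possible workaround is to check the service port before 
setting active = 1 and reloading pulse.  Below is a minimal sketch in 
Python, not anything that ships with piranha.  It only does a plain TCP 
connect, which is not necessarily the same test nanny performs, and the 
function name and timeout are my own assumptions; the host fg5x3 and 
port 8443 are taken from the log below.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This is a hypothetical pre-reload check, not part of piranha, and a
    TCP connect may be weaker than the check nanny itself runs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # Real server and port as they appear in the log excerpt.
    host, port = "fg5x3", 8443
    if port_is_open(host, port):
        print("%s:%d is reachable" % (host, port))
    else:
        print("%s:%d is NOT reachable, hold off on reloading pulse" % (host, port))
```

If the check fails, leaving the real server at active = 0 until the 
service is back avoids starting a monitor that is likely to fail 
immediately.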

Here's what I saw in /var/log/messages:

lvs[19604]: rereading configuration file
lvs[19604]: create_monitor for saz-admin:8443/fg5x3 running as pid 31729
lvs[19604]: create_monitor for saz-admin:8443/fg6x3 running as pid 31730
lvs[19604]: nanny for child saz-admin:8443/fg5x3 died! shutting down lvs
lvs[19604]: shutting down virtual service MYSQL:3306
lvs[19604]: shutting down virtual service SAZ:8888
lvs[19604]: shutting down virtual service SAZ:8881
lvs[19604]: shutting down virtual service SAZ:8882
lvs[19604]: shutting down virtual service voms:8443
lvs[19604]: shutting down virtual service voms-osg:8443
lvs[19604]: shutting down virtual service gums:8443
nanny[19614]: Terminating due to signal 15
nanny[19617]: Terminating due to signal 15
nanny[19622]: Terminating due to signal 15
nanny[19644]: Terminating due to signal 15
nanny[19645]: Terminating due to signal 15
nanny[19647]: Terminating due to signal 15
etc.
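
To spot this pattern quickly after a reload, the log can be scanned for 
the "nanny ... died" line and the virtual services that were shut down 
afterwards.  This is a rough Python sketch; the regular expressions are 
based only on the message formats shown above, and the function name is 
made up for illustration.

```python
import re
import sys

# Patterns taken from the /var/log/messages excerpt above.
DIED = re.compile(r"lvs\[\d+\]: nanny for child (\S+) died! shutting down lvs")
SHUTDOWN = re.compile(r"lvs\[\d+\]: shutting down virtual service (\S+)")


def summarize(lines):
    """Return (died, stopped): nannies that died and services shut down."""
    died, stopped = [], []
    for line in lines:
        m = DIED.search(line)
        if m:
            died.append(m.group(1))
            continue
        m = SHUTDOWN.search(line)
        if m:
            stopped.append(m.group(1))
    return died, stopped


if __name__ == "__main__":
    # Usage: python this_script.py < /var/log/messages
    died, stopped = summarize(sys.stdin)
    for name in died:
        print("nanny died: " + name)
    for name in stopped:
        print("virtual service shut down: " + name)
```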

Thanks,
Dan



Dan Yocum wrote:
> Hi Barry,
> 
> We're on piranha-0.8.4-9.3.el5.  I will upgrade to release 11 and see if 
> that helps.
> 
> Thanks,
> Dan
> 
> 
> Barry Brimer wrote:
>> Quoting Dan Yocum <yocum at fnal.gov>:
>>
>>> Hi all,
>>>
>>> Here's the situation we're running into: after setting a real server to
>>> active = 0 and weight = 0 and reloading pulse, performing some work on
>>> the RS, then setting active = 1 and weight = 3 and reloading pulse
>>> again, lvsd first creates the monitor for the process, which dies for
>>> some strange reason, and then proceeds to shut down *all* virtual
>>> services!!
>>>
>>> Here's what I see in /var/log/messages:
>>
>> <snip>
>>> Performing a 'service pulse restart' brings everything back online just
>>> fine.
>>>
>>> What's going on here?
>>>
>>> The OS is Scientific Linux 5.2 (i.e., RHELv5.2) on a Xen VM, kernel
>>> 2.6.18-128.1.6.el5xen.
>>
>> What version of piranha do you have installed?  The latest version seems
>> to correct some nanny/pulse related issues
>> <https://rhn.redhat.com/errata/RHBA-2009-0095.html>
>>
>> Barry
>>
>> _______________________________________________
>> Piranha-list mailing list
>> Piranha-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/piranha-list
> 

-- 
Dan Yocum
Fermilab  630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
Fermilab.  Just zeros and ones.



