From yocum at fnal.gov Wed May 6 16:56:22 2009
From: yocum at fnal.gov (Dan Yocum)
Date: Wed, 06 May 2009 11:56:22 -0500
Subject: lvsd kills off all nannies!
Message-ID: <4A01C136.3040404@fnal.gov>

Hi all,

Here's the situation we're running into - after setting a real server to
active = 0 and weight = 0 and reloading pulse, , set active = 1 and
weight = 3 and reloading pulse, lvsd first creates the monitor for the
process, which dies for some strange reason, then proceeds to shutdown
*all* virtual services!!

Here's what I see in /var/log/messages:

lvs[2821]: rereading configuration file
lvs[2821]: create_monitor for squid:3128/fg3x3.fnal.gov running as pid 13633
lvs[2821]: nanny for child squid:3128/fg3x3.fnal.gov died! shutting down lvs
lvs[2821]: shutting down virtual service MYSQL:3306
lvs[2821]: shutting down virtual service SAZ:8888
lvs[2821]: shutting down virtual service SAZ:8881
lvs[2821]: shutting down virtual service SAZ:8882
lvs[2821]: shutting down virtual service voms:8443
lvs[2821]: shutting down virtual service voms-osg:8443
lvs[2821]: shutting down virtual service gums:8443
lvs[2821]: shutting down virtual service voms-auger:15007
nanny[2854]: Terminating due to signal 15
nanny[2858]: Terminating due to signal 15
nanny[2865]: Terminating due to signal 15
nanny[2867]: Terminating due to signal 15
nanny[2868]: Terminating due to signal 15
nanny[2878]: Terminating due to signal 15
nanny[2888]: Terminating due to signal 15
nanny[2906]: Terminating due to signal 15
nanny[2910]: Terminating due to signal 15
nanny[2921]: Terminating due to signal 15
nanny[16558]: Terminating due to signal 15
nanny[16561]: Terminating due to signal 15
nanny[16562]: Terminating due to signal 15
nanny[16563]: Terminating due to signal 15
nanny[16566]: Terminating due to signal 15
nanny[16588]: Terminating due to signal 15
nanny[16592]: Terminating due to signal 15
nanny[16593]: Terminating due to signal 15
lvs[2821]: shutting down virtual service voms-cdf:15020
lvs[2821]: shutting down virtual service voms-cms:15015
lvs[2821]: shutting down virtual service voms-des:15017
lvs[2821]: shutting down virtual service voms-dzero:15002
lvs[2821]: shutting down virtual service voms-fermilab:15001
lvs[2821]: shutting down virtual service voms-i2u2:15026
lvs[2821]: shutting down virtual service voms-ilc:15023
lvs[2821]: shutting down virtual service voms-lqcd:15024
lvs[2821]: shutting down virtual service voms-nanohub:15022
lvs[2821]: shutting down virtual service voms-jdem:15028
nanny[2924]: Terminating due to signal 15
nanny[2938]: Terminating due to signal 15
nanny[2941]: Terminating due to signal 15
nanny[2955]: Terminating due to signal 15
nanny[2958]: Terminating due to signal 15
nanny[2971]: Terminating due to signal 15
nanny[2974]: Terminating due to signal 15
nanny[2989]: Terminating due to signal 15
nanny[2992]: Terminating due to signal 15
nanny[3003]: Terminating due to signal 15
nanny[3005]: Terminating due to signal 15
nanny[16594]: Terminating due to signal 15
nanny[16595]: Terminating due to signal 15
nanny[16604]: Terminating due to signal 15
nanny[16605]: Terminating due to signal 15
nanny[16606]: Terminating due to signal 15
nanny[16607]: Terminating due to signal 15
nanny[16608]: Terminating due to signal 15
nanny[16618]: Terminating due to signal 15
nanny[16619]: Terminating due to signal 15
nanny[16620]: Terminating due to signal 15
nanny[13633]: starting LVS client monitor for 131.225.107.161:3128
nanny[13633]: making 131.225.107.144:3128 available
nanny[13633]: /sbin/ipvsadm command failed!
lvs[2821]: shutting down virtual service voms-osg:15027
lvs[2821]: shutting down virtual service squid:3128

Performing a 'service pulse restart' brings everything back online just
fine.

What's going on here?

The OS is Scientific Linux 5.2 (i.e., RHELv5.2) on a Xen VM, kernel
2.6.18-128.1.6.el5xen.

Thanks,
Dan

-- 
Dan Yocum
Fermilab 630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.

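[Editor's sketch, not part of the original thread.] In the log excerpt above, the interesting lines are easy to lose among the "Terminating due to signal 15" entries: the nanny that lvs reports as dead (pid 13633) is the one that logged "/sbin/ipvsadm command failed!". The following is a minimal Python sketch of how one might pull that chain out of a log automatically. The function name summarize, the regular expressions, and the shortened SAMPLE_LOG are illustrative assumptions, not part of piranha; the message wording is taken only from the excerpt above and may differ in other piranha versions.

```python
import re

# A shortened sample copied from the log excerpt in the message above.
SAMPLE_LOG = """\
lvs[2821]: rereading configuration file
lvs[2821]: create_monitor for squid:3128/fg3x3.fnal.gov running as pid 13633
lvs[2821]: nanny for child squid:3128/fg3x3.fnal.gov died! shutting down lvs
lvs[2821]: shutting down virtual service MYSQL:3306
lvs[2821]: shutting down virtual service SAZ:8888
nanny[2854]: Terminating due to signal 15
nanny[13633]: starting LVS client monitor for 131.225.107.161:3128
nanny[13633]: making 131.225.107.144:3128 available
nanny[13633]: /sbin/ipvsadm command failed!
"""

# Hypothetical patterns, written against the wording seen in the excerpt.
# search() is used so that a leading syslog timestamp/host prefix is tolerated.
LINE_RE = re.compile(r"(?P<prog>\w+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$")
MONITOR_RE = re.compile(r"create_monitor for (?P<child>\S+) running as pid (?P<pid>\d+)")
DIED_RE = re.compile(r"nanny for child (?P<child>\S+) died!")
SHUTDOWN_RE = re.compile(r"shutting down virtual service (?P<service>\S+)")


def summarize(log_text):
    """Group log lines into: monitors started, children reported dead,
    services shut down, and the messages logged by each dead nanny."""
    monitors = {}    # child name -> nanny pid (as a string)
    died = []        # child names that lvs reported as dead
    shutdown = []    # virtual services that lvs shut down, in log order
    nanny_msgs = {}  # nanny pid -> list of messages it logged

    for raw in log_text.splitlines():
        line = LINE_RE.search(raw.strip())
        if line is None:
            continue
        prog, pid, msg = line.group("prog"), line.group("pid"), line.group("msg")
        if prog == "nanny":
            nanny_msgs.setdefault(pid, []).append(msg)
            continue
        if prog != "lvs":
            continue
        monitor = MONITOR_RE.search(msg)
        dead = DIED_RE.search(msg)
        stopped = SHUTDOWN_RE.search(msg)
        if monitor:
            monitors[monitor.group("child")] = monitor.group("pid")
        elif dead:
            died.append(dead.group("child"))
        elif stopped:
            shutdown.append(stopped.group("service"))

    # For each dead child, look up what its nanny process logged.
    culprit_messages = {
        child: nanny_msgs.get(monitors.get(child), []) for child in died
    }
    return {
        "monitors": monitors,
        "died": died,
        "shutdown": shutdown,
        "culprit_messages": culprit_messages,
    }


if __name__ == "__main__":
    result = summarize(SAMPLE_LOG)
    for child, messages in result["culprit_messages"].items():
        print("dead nanny for", child)
        for message in messages:
            print("   ", message)
    print("services shut down:", ", ".join(result["shutdown"]))
```

To run this against a real file, one could pass the contents of /var/log/messages to summarize() instead of SAMPLE_LOG. The sketch only correlates log lines; it does not explain why the ipvsadm command failed.
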
From lists at brimer.org Wed May 6 17:09:21 2009
From: lists at brimer.org (Barry Brimer)
Date: Wed, 06 May 2009 12:09:21 -0500
Subject: lvsd kills off all nannies!
In-Reply-To: <4A01C136.3040404@fnal.gov>
References: <4A01C136.3040404@fnal.gov>
Message-ID: <1241629761.4a01c44153901@mail.toucanhost.com>

Quoting Dan Yocum :

> Hi all,
>
> Here's the situation we're running into - after setting a real server to
> active = 0 and weight = 0 and reloading pulse,
> RS>, set active = 1 and weight = 3 and reloading pulse, lvsd first
> creates the monitor for the process, which dies for some strange reason,
> then proceeds to shutdown *all* virtual services!!
>
> Here's what I see in /var/log/messages:
>
> Performing a 'service pulse restart' brings everything back online just
> fine.
>
> What's going on here?
>
> The OS is Scientific Linux 5.2 (i.e., RHELv5.2) on a Xen VM, kernel
> 2.6.18-128.1.6.el5xen.

What version of piranha do you have installed? The latest version seems to
correct some nanny/pulse related issues

Barry

From yocum at fnal.gov Wed May 6 17:42:34 2009
From: yocum at fnal.gov (Dan Yocum)
Date: Wed, 06 May 2009 12:42:34 -0500
Subject: lvsd kills off all nannies!
In-Reply-To: <1241629761.4a01c44153901@mail.toucanhost.com>
References: <4A01C136.3040404@fnal.gov> <1241629761.4a01c44153901@mail.toucanhost.com>
Message-ID: <4A01CC0A.20903@fnal.gov>

Hi Barry,

We're on piranha-0.8.4-9.3.el5. I will upgrade to release 11 and see if
that helps.

Thanks,
Dan

Barry Brimer wrote:
> Quoting Dan Yocum :
>
>> Hi all,
>>
>> Here's the situation we're running into - after setting a real server to
>> active = 0 and weight = 0 and reloading pulse,
> RS>, set active = 1 and weight = 3 and reloading pulse, lvsd first
>> creates the monitor for the process, which dies for some strange reason,
>> then proceeds to shutdown *all* virtual services!!
>>
>> Here's what I see in /var/log/messages:
>
>
>> Performing a 'service pulse restart' brings everything back online just
>> fine.
>>
>> What's going on here?
>>
>> The OS is Scientific Linux 5.2 (i.e., RHELv5.2) on a Xen VM, kernel
>> 2.6.18-128.1.6.el5xen.
>
> What version of piranha do you have installed? The latest version seems to
> correct some nanny/pulse related issues
>
>
> Barry
>
> _______________________________________________
> Piranha-list mailing list
> Piranha-list at redhat.com
> https://www.redhat.com/mailman/listinfo/piranha-list

-- 
Dan Yocum
Fermilab 630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.