From za.it.services at gmail.com Mon Jun 8 10:24:45 2009
From: za.it.services at gmail.com (R R)
Date: Mon, 8 Jun 2009 12:24:45 +0200
Subject: Piranha server crash
Message-ID:

Hi Folks,

I set up an LVS server using piranha. After a week the server became
completely unresponsive; I couldn't even ping the server's IP address.

After I started using piranha, my graphs showed high RAM usage (the server
has 1 GB); however, no other error or warning messages were found in the
log files.

I'm using RHEL 4.3, 64-bit (kernel 2.6.9-34.EL), ipvsadm-1.24-6,
piranha-0.8.3.1-3, and lvs-rrd for graphs.

There are 24 virtual servers (12 HTTP, 12 HTTPS) with 2 real servers each.

Has anyone experienced similar issues?

Please advise.

PS: Does anyone know of any other tool for graphing LVS connections?

Thanks

==

Bigo

From hirantha at securedpipe.net Mon Jun 8 18:00:51 2009
From: hirantha at securedpipe.net (hirantha)
Date: Mon, 08 Jun 2009 23:30:51 +0530
Subject: Piranha server crash
In-Reply-To:
References:
Message-ID: <4A2D51D3.9090607@securedpipe.net>

Hi,

I've been using piranha for almost 2 years, but I don't start the piranha
daemon once the initial configuration has been done. I just configure
lvs.cf manually and use the pulse daemon to apply the configuration to
ipvsadm and to run nanny.

The piranha daemon doesn't need to be running if you know how LVS works
and administer it through ipvsadm. You could shut down piranha and give it
a try, to check whether your LVS server's resource usage stays stable.

Hope this is helpful.

Regards
Hirantha

From yocum at fnal.gov Wed Jun 10 20:58:09 2009
From: yocum at fnal.gov (Dan Yocum)
Date: Wed, 10 Jun 2009 15:58:09 -0500
Subject: lvsd kills off all nannies!
In-Reply-To: <4A01CC0A.20903@fnal.gov>
References: <4A01C136.3040404@fnal.gov> <1241629761.4a01c44153901@mail.toucanhost.com> <4A01CC0A.20903@fnal.gov>
Message-ID: <4A301E61.9070307@fnal.gov>

I just had the same experience again when attempting to add another
service to our LVS director and reloading pulse, so upgrading to
piranha-0.8.4-11.el5 did not help.

One thing I noticed was that the monitor process for one of the real
servers failed right away (the service on that system was actually down).
I think this caused the nanny to falter, which brought everything down,
too. Not good.

Here's what I saw in /var/log/messages:

lvs[19604]: rereading configuration file
lvs[19604]: create_monitor for saz-admin:8443/fg5x3 running as pid 31729
lvs[19604]: create_monitor for saz-admin:8443/fg6x3 running as pid 31730
lvs[19604]: nanny for child saz-admin:8443/fg5x3 died! shutting down lvs
lvs[19604]: shutting down virtual service MYSQL:3306
lvs[19604]: shutting down virtual service SAZ:8888
lvs[19604]: shutting down virtual service SAZ:8881
lvs[19604]: shutting down virtual service SAZ:8882
lvs[19604]: shutting down virtual service voms:8443
lvs[19604]: shutting down virtual service voms-osg:8443
lvs[19604]: shutting down virtual service gums:8443
nanny[19614]: Terminating due to signal 15
nanny[19617]: Terminating due to signal 15
nanny[19622]: Terminating due to signal 15
nanny[19644]: Terminating due to signal 15
nanny[19645]: Terminating due to signal 15
nanny[19647]: Terminating due to signal 15
etc.

Thanks,
Dan

Dan Yocum wrote:
> Hi Barry,
>
> We're on piranha-0.8.4-9.3.el5. I will upgrade to release 11 and see if
> that helps.
>
> Thanks,
> Dan
>
>
> Barry Brimer wrote:
>> Quoting Dan Yocum :
>>
>>> Hi all,
>>>
>>> Here's the situation we're running into - after setting a real server to
>>> active = 0 and weight = 0 and reloading pulse, <some work on the
>>> RS>, set active = 1 and weight = 3 and reloading pulse, lvsd first
>>> creates the monitor for the process, which dies for some strange reason,
>>> then proceeds to shut down *all* virtual services!!
>>>
>>> Here's what I see in /var/log/messages:
>>
>>
>>> Performing a 'service pulse restart' brings everything back online just
>>> fine.
>>>
>>> What's going on here?
>>>
>>> The OS is Scientific Linux 5.2 (i.e., RHELv5.2) on a Xen VM, kernel
>>> 2.6.18-128.1.6.el5xen.
>>
>> What version of piranha do you have installed? The latest version seems
>> to correct some nanny/pulse related issues.
>>
>>
>> Barry
>>
>> _______________________________________________
>> Piranha-list mailing list
>> Piranha-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/piranha-list
>
--
Dan Yocum
Fermilab 630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.
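Dan's log shows the failure pattern: a single `nanny for child ... died!` message, followed by lvsd shutting down every virtual service. A minimal Python sketch that scans syslog lines for that pattern (for example, to feed a monitoring alert) might look like the following. It assumes the exact message wording in the log above, which other piranha releases may phrase differently, and `find_lvsd_shutdowns` is a hypothetical helper, not part of piranha.

```python
import re
from typing import Iterable

# Patterns copied from the lvsd log lines quoted in this thread (assumption:
# other piranha versions may word these messages differently).
NANNY_DIED = re.compile(r"lvs\[(\d+)\]: nanny for child (\S+) died! shutting down lvs")
VS_DOWN = re.compile(r"lvs\[(\d+)\]: shutting down virtual service (\S+)")


def find_lvsd_shutdowns(lines: Iterable[str]) -> list[dict]:
    """Group each 'nanny ... died' event with the virtual services lvsd then stopped."""
    events: list[dict] = []
    current = None
    for line in lines:
        m = NANNY_DIED.search(line)
        if m:
            current = {"lvs_pid": int(m.group(1)),
                       "dead_child": m.group(2),
                       "services_down": []}
            events.append(current)
            continue
        m = VS_DOWN.search(line)
        # Only attribute shutdowns logged by the same lvs process.
        if m and current is not None and int(m.group(1)) == current["lvs_pid"]:
            current["services_down"].append(m.group(2))
    return events


# A few of the log lines from the thread, used as sample input.
sample = """\
lvs[19604]: rereading configuration file
lvs[19604]: nanny for child saz-admin:8443/fg5x3 died! shutting down lvs
lvs[19604]: shutting down virtual service MYSQL:3306
lvs[19604]: shutting down virtual service SAZ:8888
nanny[19614]: Terminating due to signal 15
""".splitlines()

if __name__ == "__main__":
    for ev in find_lvsd_shutdowns(sample):
        print(ev["dead_child"], "->", ev["services_down"])
        # prints: saz-admin:8443/fg5x3 -> ['MYSQL:3306', 'SAZ:8888']
```

Because `re.search` is used rather than `re.match`, the usual syslog timestamp and hostname prefix in front of `lvs[...]` does not prevent a match, and lines from other processes such as `nanny[...]` are ignored.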
From za.it.services at gmail.com Thu Jun 11 21:21:08 2009
From: za.it.services at gmail.com (R R)
Date: Thu, 11 Jun 2009 23:21:08 +0200
Subject: Piranha-list Digest, Vol 38, Issue 3
In-Reply-To: <20090611160028.94C08618971@hormel.redhat.com>
References: <20090611160028.94C08618971@hormel.redhat.com>
Message-ID:

Hi,

Could you please send your lvs.cf file?

--

Bigo

From yocum at fnal.gov Fri Jun 12 13:57:05 2009
From: yocum at fnal.gov (Dan Yocum)
Date: Fri, 12 Jun 2009 08:57:05 -0500
Subject: Piranha-list Digest, Vol 38, Issue 3
In-Reply-To:
References: <20090611160028.94C08618971@hormel.redhat.com>
Message-ID: <4A325EB1.8080703@fnal.gov>

It's here, in all its gory detail:

http://home.fnal.gov/~yocum/lvs.cf

Thanks,
Dan

--
Dan Yocum
Fermilab 630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.

From joerg.kost at mivitec.de Fri Jun 19 12:30:16 2009
From: joerg.kost at mivitec.de (Jörg Kost)
Date: Fri, 19 Jun 2009 14:30:16 +0200
Subject: lvsd kills off all nannies!
Message-ID:

Hi Dan,

I was hitting the same problem and looked into it. The situation is: when
any of the nanny children dies for any reason, the lvs daemon shuts down
completely.

As a simple and fast workaround, I changed the source code so that lvsd
just alerts the regular syslog process instead of killing all of the
services. See the attached patch file. Be sure to have a good monitoring
service for this kind of error, because you will run into an inconsistent
state between nanny and IPVS. However, a restart function for killed
nannies would be a necessary feature.

Regards
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lvsd.patch
Type: application/octet-stream
Size: 999 bytes

Jörg
--
Mivitec GmbH
System Administration
Phone +49 (89) 420 797 87 - 0
http://www.mivitec.de

Mivitec GmbH [Managing Director: Alex Mirsky], registered at the
Amtsgericht Regensburg (local court) under HRB8965, registered office:
Eichenstr. 24, 93161 Sinzing near Regensburg. VAT ID: DE 228543512
(Regensburg tax office). Tel. 0941 - 4615854, Fax 0941 - 4612837,
email: info at mivitec.de

From yocum at fnal.gov Fri Jun 19 15:00:47 2009
From: yocum at fnal.gov (Dan Yocum)
Date: Fri, 19 Jun 2009 10:00:47 -0500
Subject: lvsd kills off all nannies!
In-Reply-To:
References:
Message-ID: <4A3BA81F.7060703@fnal.gov>

Hm. I am disinclined to acquiesce to your request.

If I have to patch code from an upstream vendor in order to get it to
work, I might as well go with another solution like ldirectord.

Don't get me wrong, I think the patch is necessary, but I also think that
RH should be the one to distribute it in an official release of piranha.

Jörg, can you please attach this patch to the bugzilla report I opened?

https://bugzilla.redhat.com/show_bug.cgi?id=505172

Thanks,
Dan

--
Dan Yocum
Fermilab 630.840.6509
yocum at fnal.gov, http://fermigrid.fnal.gov
Fermilab. Just zeros and ones.

From mgrac at redhat.com Wed Jun 24 16:09:43 2009
From: mgrac at redhat.com (Marek 'marx' Grác)
Date: Wed, 24 Jun 2009 18:09:43 +0200
Subject: lvsd kills off all nannies!
In-Reply-To: <4A3BA81F.7060703@fnal.gov>
References: <4A3BA81F.7060703@fnal.gov>
Message-ID: <4A424FC7.5080403@redhat.com>

Hi,

Dan Yocum wrote:
> Hm. I am disinclined to acquiesce to your request.
>
> If I have to patch code from an upstream vendor in order to get it to
> work, I might as well go with another solution like ldirectord.

You don't have to patch the code yourself. As a customer, you should
contact support, which is responsible for such issues. You will get
support for this patched version until the next update is released. If
the patch is not just a workaround, it will be included in the next
update/version.

> Don't get me wrong, I think the patch is necessary, but I also think
> that RH should be the one to distribute it in an official release of
> piranha.
> Jörg, can you please attach this patch to the bugzilla report I opened?

Jörg created an excellent workaround, but as he said, it is not an
optimal solution. I don't have a better solution right now, so there was
no need to respond to your mail, as I couldn't give you more information.
When I find a way to do it, you will receive updates on bugzilla.

We are aware that communication with the community is really sub-optimal
in the piranha project, but we have started working on it, and the first
results should be available soon (public git tree, packaging of devel
versions, ...).

m,
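Jörg's workaround and the restart feature he asks for both come down to what the director does when a single monitor child exits. The Python sketch below models three possible policies side by side. It is not piranha's code and not the contents of lvsd.patch, which the archive does not reproduce. The `Supervisor` class, its `spawn` callback, and the policy names are all hypothetical.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict

log = logging.getLogger("lvsd-sketch")


@dataclass
class Supervisor:
    """Hypothetical model of how a director could react when one monitor child dies.

    policy:
      "shutdown" - stop every virtual service (the behaviour reported in this thread)
      "log"      - only log the death (similar in spirit to Jörg's workaround)
      "restart"  - log and respawn that one monitor, at most max_restarts times
    """
    spawn: Callable[[str], int]          # starts a monitor for a real server, returns its pid
    policy: str = "restart"
    max_restarts: int = 3
    children: Dict[int, str] = field(default_factory=dict)  # pid -> real server name
    restarts: Dict[str, int] = field(default_factory=dict)  # real server -> restarts so far
    running: bool = True

    def start(self, server: str) -> None:
        self.children[self.spawn(server)] = server

    def on_child_exit(self, pid: int) -> None:
        server = self.children.pop(pid, None)
        if server is None:
            return                        # not one of our monitors
        log.warning("monitor for %s (pid %d) died", server, pid)
        if self.policy == "shutdown":
            self.running = False
            self.children.clear()         # stands in for stopping every virtual service
        elif self.policy == "restart":
            count = self.restarts.get(server, 0)
            if count < self.max_restarts:
                self.restarts[server] = count + 1
                self.start(server)
            else:
                log.error("giving up on %s after %d restarts", server, count)
        # policy == "log": nothing else to do; the monitor is simply gone
```

With `policy="log"`, the dead monitor is simply forgotten, which is the nanny/IPVS inconsistency Jörg warns about. The bounded `restart` policy avoids respawning a monitor forever when the real service itself stays down, as it did in Dan's case.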