Help: Runaway processes killing server...

Fri Sep 3 16:17:14 UTC 2004

I think you could use MON instead of Nagios as a better monitoring and alerting system; it is based on Perl scripts.
----- Original Message -----
From: Mauri Sahlberg <Mauri.Sahlberg at claymountain.com>
Date: Thu, 02 Sep 2004 11:16:32 +0300
To: For users of Fedora Core releases <fedora-list at redhat.com>
Subject: Re: Help: Runaway processes killing server...

> A lot of speculation follows; it is probably not very light reading and
> might be complete gobbledygook.
> 
> Tommy Reynolds wrote:
> | Try adding more swap space.  Check the web for how to use an ordinary
> | file for this if you don't have any free disk space.  Something like:
> |
> 
> Adding swap might help and I certainly hope it will. Under normal load
> the old swap, which was half a gigabyte in size, was in practice unused.
> Now the total amount of swap is four and a half gigabytes, which should
> be a lot more than is required.
> 
> Somehow this combination of events and programs caused a very rapid
> consumption of both CPU and memory, which resulted in a state that was
> unrecoverable despite the OOM killer. As the OOM killer killed httpd
> processes, the load coming in from them should have vanished and the
> memory should have been freed. This did not happen, and according to the
> Apache logs (if Apache was able to update its logs), the external
> pressure had also vanished; that is, the spammer had stopped loading
> pages when they became unresponsive.
> 
> The httpd process seems to peak its usage of CPU and memory upon
> startup, so the OOM killer probably kept killing the "same" innocent
> httpd process over and over again. Meanwhile nothing else got CPU time
> but the httpd process that spawned the new ones and the OOM killer that
> killed the httpds that were spawned.
> 
> Probably a better workaround for the problem would be limiting the
> resource usage of the apache user and the postgres user, as Alexander
> Dalloz proposes. I'll probably try this if the increased swap does not help.
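
To make the resource-limit idea concrete, here is a minimal sketch in Python. It is not necessarily what Alexander Dalloz proposed; it only shows the underlying mechanism, the same kernel rlimits that the shell's `ulimit` builtin sets, applied to one child process. The function name `run_limited` and the numbers in the demo are assumptions chosen for illustration.

```python
import resource
import subprocess
import sys


def run_limited(cmd, max_address_space, max_cpu_seconds):
    """Run cmd with caps on address space (bytes) and CPU time (seconds).

    Hypothetical helper for illustration only. The limits are inherited
    by anything the child spawns.
    """
    def apply_limits():
        # Runs in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS,
                           (max_address_space, max_address_space))
        resource.setrlimit(resource.RLIMIT_CPU,
                           (max_cpu_seconds, max_cpu_seconds))

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Demo: the child reports the address-space limit it inherited.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
    result = run_limited([sys.executable, "-c", probe], 1 << 30, 10)
    print(result.stdout.strip())
```

A process that hits the address-space cap gets allocation failures instead of pushing the whole machine into the OOM killer, which is the point of limiting the apache and postgres users.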
> 
> Thanks to both of you.
> 
> I could also work around this problem by implementing a script that
> monitors the resource usage of both the postgres and apache users and
> shuts the services down for a while when a preset limit is exceeded or,
> better yet, use Nagios to do this.
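
A rough sketch of such a monitoring script, assuming a Linux /proc filesystem. The function names and the idea of a per-user limit table are my own assumptions, not something from the original setup. It only reports which users are over their limit; what to do next (for example stopping the service) is left to the caller.

```python
import os
import pwd


def parse_status(text):
    """Return (real_uid, rss_kib) from the text of /proc/<pid>/status."""
    uid, rss = None, 0
    for line in text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])   # first field is the real UID
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])   # reported in kB; absent for kernel threads
    return uid, rss


def rss_by_user():
    """Sum resident memory (kB) per user name over all visible processes.

    Shared pages are counted once per process, so the totals overestimate.
    """
    totals = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/status" % entry) as f:
                uid, rss = parse_status(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if uid is None:
            continue
        try:
            name = pwd.getpwuid(uid).pw_name
        except KeyError:
            name = str(uid)
        totals[name] = totals.get(name, 0) + rss
    return totals


def users_over_limit(totals, limits_kib):
    """Return the sorted names of users whose total exceeds their limit."""
    return sorted(user for user, limit in limits_kib.items()
                  if totals.get(user, 0) > limit)


if __name__ == "__main__":
    # Hypothetical limits: 1 GiB for each of the two service users.
    limits = {"apache": 1024 * 1024, "postgres": 1024 * 1024}
    print(users_over_limit(rss_by_user(), limits))
```

Run from cron or a loop, the result of `users_over_limit` could drive whatever action you prefer. The same check could also be wrapped as a plugin for a monitoring system instead of a standalone script.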
> 
> What I am actually looking for is clues on how to find out what causes
> the rapid consumption of resources: where, by whom and how fast it
> actually happens. I'm looking for tools to do better post-mortem
> diagnosis, or tools that would gather better information for
> post-mortem diagnosis. The OOM lines in /var/log/messages did not help
> me a bit.
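
For the "how fast" part, one option is to record a small snapshot every few seconds so that there is a timeline to read after the crash. The following is only a sketch under assumptions: the log format, the function names and the five-second interval are all made up for illustration, and it reads the standard Linux files /proc/loadavg and /proc/meminfo.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def snapshot_line(timestamp, loadavg_text, meminfo_text):
    """Build one log line from the raw contents of the two /proc files."""
    load1 = loadavg_text.split()[0]  # 1-minute load average
    mem = parse_meminfo(meminfo_text)
    return "%s load1=%s MemFree=%dkB SwapFree=%dkB" % (
        timestamp, load1, mem.get("MemFree", 0), mem.get("SwapFree", 0))


def log_forever(path, interval_seconds=5):
    """Append a snapshot to path every interval_seconds until killed."""
    while True:
        with open("/proc/loadavg") as f:
            loadavg = f.read()
        with open("/proc/meminfo") as f:
            meminfo = f.read()
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        with open(path, "a") as log:
            log.write(snapshot_line(stamp, loadavg, meminfo) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push each line to disk before a possible hang
        time.sleep(interval_seconds)
```

The last lines written before the machine stops responding show how quickly free memory and swap dropped. The sar tool from the sysstat package collects similar history and may be a better fit than a home-grown script.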
> 
> I have a hunch that either PHP, httpd or postgres has a bug in it that
> causes it to consume everything it can get when certain conditions are
> met. There are switches that can be turned in all three programs that
> might help, but to identify which switches and in which program I need
> more info.
> 
> Regards,
> 
> 
> -- 
> fedora-list mailing list
> fedora-list at redhat.com
> To unsubscribe: http://www.redhat.com/mailman/listinfo/fedora-list
