kernel: Out of Memory: Killed process

Jonathan Billings jsbillin at Princeton.EDU
Tue Jun 19 13:20:03 UTC 2007


On Tue, Jun 19, 2007 at 05:36:01AM -0700, Shikha Alex wrote:
> Hi,
> 
> One of the servers in our cluster dies randomly,
> sometimes after a few hours and sometimes after 2-3 days,
> and has the following line in "/var/log/messages": 
> 
> kernel: Out of Memory: Killed process 7105
> (00-logwatch)
> 
> Application LOG file:
> --- mpimon --- Aborting run after sumonitor-5
> terminated abnormally ---
> 
> The processes running on these servers use up to 2GB RAM.
> Total RAM on the servers is 4GB, with 6GB of swap. OS is
> RHEL4 Update 4.
> 
> Please advise on any fixes for this problem. The
> process we are trying to run has to be restarted each
> time this error occurs.

It looks like your system was getting low on memory, probably because
of the cron job 00-logwatch or something that started around the same
time as 00-logwatch.  The oom-killer will kill off processes when
memory runs out, depending on a couple of parameters.
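
To see what else was killed, and how often, it can help to pull every
oom-killer line out of the syslog.  Here is a minimal Python sketch,
assuming the messages look like the one quoted above; the function
name find_oom_kills is just illustrative:

```python
import re

# Matches the kernel message quoted above, e.g.
#   kernel: Out of Memory: Killed process 7105 (00-logwatch)
OOM_RE = re.compile(r"Out of Memory: Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a list of (pid, process_name), one per oom-killer line."""
    kills = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

You could feed it an open file, for example
find_oom_kills(open("/var/log/messages")), and compare the victims
against what cron was running at the time.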

For computational clusters, I suggest tuning
/proc/sys/vm/overcommit_memory and /proc/sys/vm/overcommit_ratio.
Setting overcommit_memory to 1 makes the kernel always allow
allocations; overcommit_ratio is only consulted when overcommit_memory
is 2, where the commit limit is swap plus overcommit_ratio percent of
RAM, so raise the ratio in that mode if you want your nodes to be able
to allocate more.  Also, I suggest you turn off cron jobs like logwatch
and slocate, since they're of little use on individual HPC nodes.  If
you want to monitor the syslogs on them, I suggest pointing them all at
a central syslog server instead.
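
To get a feel for what overcommit_ratio does, here is a rough Python
sketch of the commit limit the kernel applies in strict mode
(overcommit_memory = 2): swap plus overcommit_ratio percent of RAM.
It is a simplification (the kernel also accounts for things like huge
pages), and the helper names are made up for illustration:

```python
KB_PER_GB = 1024 * 1024


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """Approximate commit limit in kB when overcommit_memory is 2."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def read_vm_setting(name, root="/proc/sys/vm"):
    """Read an integer tunable such as overcommit_memory or overcommit_ratio."""
    with open(f"{root}/{name}") as f:
        return int(f.read().strip())


def describe(ram_gb, swap_gb, overcommit_ratio):
    """Return the approximate commit limit in GB for the given machine."""
    limit = commit_limit_kb(ram_gb * KB_PER_GB, swap_gb * KB_PER_GB,
                            overcommit_ratio)
    return limit / KB_PER_GB
```

For a node with 4GB RAM and 6GB swap, describe(4, 6, 50) gives 8.0 and
describe(4, 6, 100) gives 10.0, so with the default ratio of 50 such a
node can commit roughly 8GB in strict mode.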

It helps to understand *when* the oom-killer is going to come in and
kill off processes, so I suggest reading about it.  I seem to recall
the kernel chooses processes based on their memory size, 'niceness',
whether they do I/O, and how recently they were started.
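
If your kernel exposes /proc/<pid>/oom_score (newer 2.6 kernels do;
the RHEL4 kernel may not, so treat this as an assumption), you can ask
the kernel directly which processes it currently rates as the likeliest
victims.  A small Python sketch, with a made-up function name:

```python
import os


def oom_scores(proc_root="/proc"):
    """Return [(score, pid, name)], likeliest oom-killer victim first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "stat")) as f:
                # The second field of stat is the command name in parentheses.
                name = f.read().split("(", 1)[1].rsplit(")", 1)[0]
        except (OSError, IndexError, ValueError):
            continue  # process exited, or oom_score is not available
        results.append((score, int(entry), name))
    results.sort(reverse=True)
    return results
```

A higher score means the process is more likely to be picked.  On a
kernel without oom_score the function simply returns an empty list.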

There's a nice article in the Red Hat Magazine about the VM, which
describes vm.overcommit_memory and vm.overcommit_ratio:
http://www.redhat.com/magazine/001nov04/features/vm/

You can also read the kernel documentation in the Linux source, in
.../Documentation/vm/overcommit-accounting.

-- 
Jonathan Billings <jsbillin at princeton.edu>
Computational Science and Engineering Support (CSES)
http://www.princeton.edu/~cses/




More information about the redhat-sysadmin-list mailing list