OOM killer "Out of Memory: Killed process" SOLUTIONS / SUMMARY

Fri Aug 10 20:38:49 UTC 2007

2007/8/10, Eric Sisler <esisler at westminster.lib.co.us>:
> Since this problem seems to pop up on different lists, this message has
> been cross-posted to the general Red Hat discussion list, the RHEL3
> (Taroon) list and the RHEL4 (Nahant) list.  My apologies for not having
> the time to post this summary sooner.
>
> I would still be banging my head against this problem were it not for
> the generous assistance of Tom Sightler <ttsig at tuxyturvy.com> and Brian
> Long <brilong at cisco.com>.
>
> In general, the out of memory killer (oom-killer) begins killing
> processes, even on servers with large amounts (6GB+) of RAM.  In many
> cases people report plenty of "free" RAM and are perplexed as to why the
> oom-killer is whacking processes.  Indications that this has happened
> appear in /var/log/messages:
>   Out of Memory: Killed process [PID] [process name].
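
A quick way to see how often this has happened is to count the matching
lines in the log. This is only a minimal sketch: the function name
count_oom_kills is made up for illustration, and it assumes the message
text is exactly the one quoted above.

```shell
# Count oom-killer events in a syslog-style log read from stdin.
# count_oom_kills is a hypothetical helper name, not an existing tool.
count_oom_kills() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so mask the exit status to keep callers using "set -e" happy.
    grep -c 'Out of Memory: Killed process' || true
}

# Typical use (reading the log usually requires root):
#   count_oom_kills < /var/log/messages
```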

Does the amount of memory matter? I mean, can this happen with 2GB just
as well as with 10GB?
I ask only out of curiosity; I have never faced this problem myself. I
find this topic really interesting, though.

>
> In my case I was upgrading various VMware servers from RHEL3 / VMware
> GSX to RHEL4 / VMware Server.  One of the virtual machines on a server
> with 16GB of RAM kept getting whacked by the oom-killer.  Needless to
> say, this was quite frustrating.
>
> As it turns out, the problem was low memory exhaustion.  Quoting Tom:
> "The kernel uses low memory to track allocations of all memory thus a
> system with 16GB of memory will use significantly more low memory than a
> system with 4GB, perhaps as much as 4 times.  This extra pressure
> happens from the moment you turn the system on before you do anything at
> all because the kernel structures have to be sized for the potential of
> tracking allocations in four times as much memory."
>
> You can check the status of low & high memory a couple of ways:
>
> # egrep 'High|Low' /proc/meminfo
> HighTotal:     5111780 kB
> HighFree:         1172 kB
> LowTotal:       795688 kB
> LowFree:         16788 kB
>
> # free -lm
>              total       used       free     shared    buffers     cached
> Mem:          5769       5751         17          0          8       5267
> Low:           777        760         16          0          0          0
> High:         4991       4990          1          0          0          0
> -/+ buffers/cache:        475       5293
> Swap:         4773          0       4773
>
> When low memory is exhausted, it doesn't matter how much high memory is
> available, the oom-killer will begin whacking processes to keep the
> server alive.
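
To keep an eye on this, the LowTotal / LowFree lines shown above can be
turned into a single percentage. A minimal sketch, assuming the field
names appear exactly as in the /proc/meminfo output quoted above (they
are absent on 64-bit kernels); low_free_pct is a made-up name.

```shell
# Print the percentage of low memory that is still free, as an integer.
# Reads /proc/meminfo-style text from stdin; low_free_pct is a
# hypothetical helper name used only for this example.
low_free_pct() {
    awk '
        $1 == "LowTotal:" { total = $2 }
        $1 == "LowFree:"  { free  = $2 }
        END {
            if (total > 0) {
                # %d truncates toward zero, e.g. 2.1 prints as 2
                printf "%d\n", (free * 100) / total
            } else {
                # No Low* fields, e.g. on a 64-bit kernel
                print "n/a"
                exit 1
            }
        }
    '
}

# Typical use:
#   low_free_pct < /proc/meminfo
```

With the figures quoted above (LowTotal 795688 kB, LowFree 16788 kB)
this prints 2, i.e. about 2% of low memory is left.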
>
> There are a couple of solutions to this problem:
>
> If possible, upgrade to 64-bit Linux.  This is the best solution because
> *all* memory becomes low memory.  If you run out of low memory in this
> case, then you're *really* out of memory. ;-)
>
> If limited to 32-bit Linux, the best solution is to run the hugemem
> kernel.  This kernel splits low/high memory differently, and in most
> cases should provide enough low memory to map high memory.  In most
> cases this is an easy fix - simply install the hugemem kernel RPM &
> reboot.

Is hugemem loaded as a module, or...? How can it expand low memory?

>
> If running the 32-bit hugemem kernel isn't an option either, you can try
> setting /proc/sys/vm/lower_zone_protection to a value of 250 or more.
> This will cause the kernel to try to be more aggressive in defending the
> low zone from allocating memory that could potentially be allocated in
> the high memory zone.  As far as I know, this option isn't available
> until the 2.6.x kernel. Some experimentation to find the best setting
> for your environment will probably be necessary.  You can check & set
> this value on the fly via:
>   # cat /proc/sys/vm/lower_zone_protection
>   # echo "250" > /proc/sys/vm/lower_zone_protection
>
> To set this option on boot, add the following to /etc/sysctl.conf:
>   vm.lower_zone_protection = 250

Your first solution was to upgrade to 64-bit and, as you wrote, if you
still run out of low memory there... pray.
What about setting vm.lower_zone_protection = 250? Does it just give
you some "extra time" before the disaster?
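
Since the option may not exist on every kernel, I would guard the write
before trying it. A minimal sketch, not a tested recipe: set_tunable is
a made-up helper name, and the path and the value 250 are simply the
ones from the message above.

```shell
# Write VALUE to a /proc/sys file only if that file exists on this
# kernel, and report the old and new values.
# set_tunable is a hypothetical helper name used only for this example.
set_tunable() {
    file=$1
    value=$2
    if [ ! -e "$file" ]; then
        echo "skip: $file not present on this kernel" >&2
        return 1
    fi
    old=$(cat "$file")
    echo "$value" > "$file" || return 1
    echo "$file: $old -> $(cat "$file")"
}

# Typical use (as root):
#   set_tunable /proc/sys/vm/lower_zone_protection 250
```

This only changes the running system; the /etc/sysctl.conf line quoted
above is still needed to keep the setting across reboots.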

>
> As a last-ditch effort, you can disable the oom-killer.  This option can
> cause the server to hang, so use it with extreme caution (and at your
> own risk)!
> Check status of oom-killer:
>   # cat /proc/sys/vm/oom-kill
>
> Turn oom-killer off/on:
>   # echo "0" > /proc/sys/vm/oom-kill
>   # echo "1" > /proc/sys/vm/oom-kill
>
> To make this change take effect at boot time, add the following
> to /etc/sysctl.conf:
>   vm.oom-kill = 0
>
> For processes that would have been killed, but weren't because the
> oom-killer is disabled, you'll see the following message
> in /var/log/messages:
>   "Would have oom-killed but /proc/sys/vm/oom-kill is disabled"
>
> Sorry for being so long-winded.  I hope this helps others who have
> struggled with this problem.
>

Really interesting post, Eric.
In the end, what did you do? Upgrade? Disable the oom-killer? Pray?
Delete VMware Server? :-)

All the best.
Manuel