[Linux-cluster] dealing with oom-killer....

Christine Caulfield ccaulfie at redhat.com
Wed Sep 2 10:47:39 UTC 2009


On 02/09/09 11:33, Corey Kovacs wrote:
> A colleague has a 5 node cluster with 4GB ram in each node. It's not
> enough for the cluster and more ram is on the way. The problem though is
> that until the ram arrives, there is a risk of oom-killer firing up (as he
> found out the other day) and putting the node into a state where it is
> utterly useless but still looks good to the cluster. We could of course
> disable oom-killer but that's a workaround, not a fix.
>
> I am wondering whether the cluster can respond to oom-killer firing up by
> fencing the offending node and, if so, how others might have done it.
> Seems like it should just be handled by the cluster though. Maybe
> have cman put a message across the openais "bus" like, "Hey, losing my
> brain here, someone whack me"...
>

I suppose you could give cman a large value in /proc/<pid>/oom_adj
(oom_score itself is read-only) so that it is the first thing to be
killed if the system runs out of memory. That should guarantee that the
node will be fenced by the other nodes ... provided they have enough
memory to remain quorate!
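
A minimal sketch of that idea in Python, in case it helps. It assumes the
kernel exposes a writable adjustment file for the process: oom_score_adj
(range -1000 to 1000) on newer kernels, or the older oom_adj (range -17 to
15). The function name and the proc_root parameter are made up for
illustration, and finding the pid of the cman daemon is left to the caller.

```python
import os


def make_preferred_oom_victim(pid, proc_root="/proc"):
    """Mark `pid` as the most attractive target for the OOM killer.

    Writes the maximum value to whichever adjustment file exists and
    returns the path that was written. `proc_root` is a hypothetical
    parameter, only there so the function can be tried on a scratch
    directory instead of the real /proc.
    """
    proc_dir = os.path.join(proc_root, str(pid))
    # (file name, maximum value) pairs, newer interface first.
    candidates = [("oom_score_adj", "1000"), ("oom_adj", "15")]
    for name, value in candidates:
        path = os.path.join(proc_dir, name)
        if os.path.exists(path):
            with open(path, "w") as f:
                f.write(value)
            return path
    raise FileNotFoundError("no OOM adjustment file under " + proc_dir)
```

Calling make_preferred_oom_victim(1234), with 1234 standing in for the real
daemon pid, would normally have to run as root or as the owner of that
process. The setting only biases which process is killed first; whether the
node then gets fenced still depends on the rest of the cluster being quorate.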

Chrissie



