[olpc-software] graceful handling of out-of-memory conditions

David Zeuthen davidz at redhat.com
Tue Mar 28 13:59:20 UTC 2006


Hi,

On Tue, 2006-03-28 at 15:46 +0800, Jaya Kumar wrote:
> - oom killer selects the pid to murder by picking the pid that has the
> highest mem size ( mm->total_mm + (all of it's children's mem size/2))
> - long running + if_niced. this sounds nice and intuitive to me.
> - the core oom killer function is badness() here
> http://sosdg.org/~coywolf/lxr/source/mm/oom_kill.c#L46
> 
> I think the kernel already gives you the capability you request, right?

Sounds good to me; so

 - init(1) sets /proc/pid/oom_adj to OOM_DISABLE

 - All child processes inherit this; HAL, NM, the X server, the panel,
   the WM etc.

 - Our desktop shell sets /proc/pid/oom_adj to something != OOM_DISABLE
   for the child processes it launches (e.g. Firefox, Abiword etc.)

 - For activation via IPC, teach activators about tweaking
   /proc/pid/oom_adj too. AFAIK this currently includes the D-BUS
   session bus and ORBit2 though we want to get rid of the latter. This
   is probably a small patch we can get upstream
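
The launcher side of the list above could look roughly like the
following Python sketch. It is an illustration only, not what any
existing shell or activator does: the helper names set_oom_adj and
launch_killable and the proc_root parameter (there so the write can be
tried against a scratch directory) are made up for this mail, and
OOM_DISABLE = -17 is the value as I understand it from the kernel
headers.

```python
import os

OOM_DISABLE = -17   # assumed value of the kernel's OOM_DISABLE constant
OOM_DEFAULT = 0     # neutral adjustment: a normal OOM candidate


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write an OOM adjustment for `pid` (hypothetical helper)."""
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write("%d\n" % value)


def launch_killable(argv, proc_root="/proc"):
    """Fork, make the child killable again, then exec the application."""
    pid = os.fork()
    if pid == 0:
        try:
            # The child inherited OOM_DISABLE from the shell; undo that
            # before the application itself starts running.
            set_oom_adj(os.getpid(), OOM_DEFAULT, proc_root)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)   # only reached if the write or the exec failed
    return pid
```

The shell would call something like launch_killable(["firefox"]) for
user-launched apps only, so everything else keeps the inherited
OOM_DISABLE.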

So on OOM only applications launched by the user will be killed. Of
course this only matters when we actually run out of memory, which can
happen in the following scenarios:

 1. Our desktop shell is leaking; we fix this via QA and testing;
    btw, said shell will grow and shrink a bit over time; for example
    if you attach e.g. a USB storage device, hal will launch a new
    hald-addon-storage process (it's tiny though) to poll for media
    changes...

 2. User is running only one "application" (e.g. Firefox) and this
    app alone uses up the memory; well, in this case the app needs
    fixing _anyway_ - we are just guaranteed the app is nuked instead
    of some vital system process

 3. User is running multiple "apps"; maybe we shouldn't allow it but
    this is probably too restrictive. Or allow it with the caveat
    that any of them may be killed - tough to tell the users (kids)
    about this in an easy way though. Consider this totally scary
    dialog:

        +----------------------------------------------------+
        | You are running more than one (legacy) application |
        | already. Your applications may exit unexpectedly   |
        | and you may lose all your work if running more     |
        | than one (legacy) application.                     |
        |                                                    |
        | [Go Ahead Already]              [Don't Launch App] |
        +----------------------------------------------------+

    It may sound silly but I think it's what we need to do. Of course
    we should be able to tag applications in their .desktop file
    with "OLPC_OOM_Handling_and_Cooperation_with_WM=True" if they save
    state on OOM and know how to cooperate with the WM. Then the WM
    and/or panel simply iconifies the app on exit

    Then we don't show said silly dialog if all apps launched have
    this key set to True. Btw, in the dialog, "legacy application"
    refers to one that doesn't have the key

      OLPC_OOM_Handling_and_Cooperation_with_WM=True 

    in the .desktop file. 

    Of course, wording etc. needs to be fixed, but this is what
    I think we should do.
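
To make the rule in item 3 concrete, here is a rough Python sketch of
how the shell could decide whether to show the dialog. The function
names is_oom_cooperative and should_warn are invented for this mail,
and I'm assuming the key lives in the usual [Desktop Entry] group and
that "True" is matched case-insensitively; none of that is settled.

```python
import configparser

OOM_KEY = "OLPC_OOM_Handling_and_Cooperation_with_WM"


def is_oom_cooperative(desktop_text):
    """True if the .desktop text sets the proposed key to True."""
    parser = configparser.ConfigParser(
        interpolation=None,     # Exec= lines may contain % field codes
        delimiters=("=",),
        strict=False,
    )
    parser.optionxform = str    # keep key case; the default lowercases keys
    parser.read_string(desktop_text)
    value = parser.get("Desktop Entry", OOM_KEY, fallback="false")
    return value.strip().lower() == "true"


def should_warn(running, launching):
    """Decide whether to show the warning dialog before a launch.

    `running` is a list of .desktop texts for apps already running,
    `launching` is the .desktop text of the app about to start.
    """
    if not running:
        return False            # a single app never triggers the dialog
    apps = list(running) + [launching]
    return not all(is_oom_cooperative(text) for text in apps)
```

With this, a legacy app launched next to a cooperating one triggers
the dialog, and two cooperating apps don't.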

What do you think?

Once we know how lean our "desktop shell" is, we know exactly how much
memory is left for apps before they hit OOM; testing this is easy, we
can just write a small app that allocates a bunch of memory. Then we
can also verify that the kernel indeed nukes the right process, i.e.
the app.
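
The test app could be as small as the following Python sketch. The
name hog and the 4096-byte page size are just assumptions for the
example; the point is only that each page gets written to, so the
memory is really used and not merely reserved.

```python
PAGE = 4096   # assumed page size; only used to pick which bytes to touch


def hog(chunk_bytes, max_chunks):
    """Allocate up to `max_chunks` buffers of `chunk_bytes`, touching each page."""
    held = []
    for _ in range(max_chunks):
        buf = bytearray(chunk_bytes)
        touched = len(range(0, chunk_bytes, PAGE))
        # Write one byte per page so the memory is actually backed.
        buf[::PAGE] = b"\x01" * touched
        held.append(buf)
    return held   # the caller must keep this list alive to hold the memory
```

Launched from the shell and called as hog(1024 * 1024, n) with a
growing n, this should be the process the kernel picks once memory
runs out.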

Cheers,
David




