[olpc-software] graceful handling of out-of-memory conditions

Jim Gettys jg at laptop.org
Tue Mar 28 02:02:50 UTC 2006


I think a combination of both approaches is needed.

We really do have *much* more knowledge at the desktop layer than the
operating system can have (unless we tell it).  Most of my desktop at the
moment consists of processes that are doing nothing (except consuming
memory) and are represented by icons; they could just as well be
killed and resurrected as needed.
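
To make the idea concrete, here is a minimal sketch of how desktop-level
knowledge could be turned into a kill order.  Everything in it is
hypothetical: the WindowState values, the App record and kill_order() are
illustrative names, not an existing window manager or kernel interface,
and the policy (least visible first, then largest memory user) is only
one plausible choice.

```python
from dataclasses import dataclass
from enum import IntEnum


class WindowState(IntEnum):
    # Hypothetical states; a higher value means a better kill candidate.
    FOCUSED = 0
    VISIBLE = 1
    COVERED = 2
    ICONIFIED = 3


@dataclass(frozen=True)
class App:
    pid: int
    name: str
    state: WindowState
    rss_kb: int  # resident memory in kB, as the desktop might sample it


def kill_order(apps):
    """Return apps sorted from best to worst kill candidate.

    Least visible windows come first; among equally visible apps,
    the one holding the most memory comes first.
    """
    return sorted(apps, key=lambda a: (-a.state, -a.rss_kb))


if __name__ == "__main__":
    desktop = [
        App(101, "editor", WindowState.FOCUSED, 50_000),
        App(102, "mail", WindowState.ICONIFIED, 20_000),
        App(103, "browser", WindowState.ICONIFIED, 80_000),
        App(104, "terminal", WindowState.COVERED, 5_000),
    ]
    for app in kill_order(desktop):
        print(app.pid, app.name, app.state.name, app.rss_kb)
```

A real policy would also have to account for helper processes behind a
window, which is exactly the information the window state alone does not
carry.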

And I agree that having a convention that applications are expected to
be able to checkpoint/restart is the other side of the coin.
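
A checkpoint/restart convention could be as small as "save your state
atomically, reload it on startup".  The sketch below assumes the
application state fits in a JSON-serialisable dict; checkpoint() and
restore() are made-up names for illustration, not part of any existing
session-management API.

```python
import json
import os
import tempfile


def checkpoint(state, path):
    """Atomically write state (a JSON-serialisable dict) to path.

    The data goes to a temporary file in the same directory first and is
    then renamed over the target, so a process killed mid-write leaves
    the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def restore(path, default=None):
    """Load the last checkpoint, or return default if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

An application following this convention would call restore() at startup
and checkpoint() whenever its state changes meaningfully, so that being
killed at any moment costs at most the work since the last checkpoint.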

One without the other will fail...

                        - Jim


On Tue, 2006-03-28 at 02:56 +0100, Daniel P. Berrange wrote:
> On Mon, Mar 27, 2006 at 08:03:51PM -0500, Jim Gettys wrote:
> > On Mon, 2006-03-27 at 19:30 -0500, David Zeuthen wrote:
> > > On Mon, 2006-03-27 at 11:19 -0500, Jim Gettys wrote:
> > > >  When I say "system components", I use a
> > > > wider net than just the Linux kernel, but include the X server, Window
> > > > manager, session manager, but not much else.  I'd like the base
> > > > environment to be rock solid.
> > > 
> > > Hmm, this sounds a bit like the 90's; clearly you're forgetting
> > > 
> > >  - D-BUS
> > >  - HAL
> > >  - NetworkManager
> > >  - Avahi
> > >  - CUPS
> > 
> > Yeah, you're right; except I'm an '80's dinosaur ;-).  Though CUPS can
> > probably survive being restarted pretty easily and I wouldn't put it in
> > that category.
> > 
> > > 
> > > just to name a few crucial "system" components of the Linux desktop du
> > > jour. I'm not sure why you think the Window or Session Manager are so
> > > important either; sure, we don't want to lose them, but, really, X
> > > applications can survive when these temporarily go away - it just
> > > doesn't look very pretty.
> > > 
> > 
> > Now it's your turn to be caught out on a limb and have someone saw it
> > off ;-).
> > 
> > The window manager is the item that knows what applications are most
> > likely to be used by the user, and will likely be key to decent OOM
> > behavior.  It knows what's on top, what's iconified, what is covered.
> > It is the process most likely to be telling the OS what processes have
> > to be killed in extremis.  Consider it an absolutely essential
> > component.
> 
> Sure the window manager knows what's on top, iconified, etc, but I'm far
> from convinced that this data can be used to do 'decent' OOM handling. 
> The principal barrier is that the info about state of an application's
> graphical windows tells the session manager *nothing* about the operation
> or architecture of the application. If the GUI is just a shim calling out
> to a DBus or Orbit service for all its work, then killing the GUI upon OOM
> is just fine, because the GUI can trivially restart & reconnect to the backend
> service where all the data is. In the modern desktop any non-trivial program
> makes significant use of IPC to any number of processes about which the
> session manager has no information. 
> 
> While you may be able to whitelist some subset of IPC related system and user
> daemons, there'd be enough not whitelisted that incorrect decisions would be
> made. Then what if one of the whitelisted daemons *was* the process
> consuming all memory? Alternatively, what if the currently focused app
> is the problem?
> 
> So I think while you could write an OOM handler based on the info available
> to the window/session manager, I rather doubt it would be any better at 
> picking which process to kill off under OOM situations than the kernel is. 
> Basically OOM handling is a fundamentally hard problem, and it's inevitable
> that no matter what algorithm you choose for picking processes, you'll eventually
> choose one that is 'important' to the user. 
> 
> So while we could put lots of research into figuring out an optimal OOM handling
> solution, I think we'd be better off picking a simple algorithm, and then focusing
> effort on modifying applications  such that in the event they are killed off, no 
> user data is lost. Such modifications would be useful beyond OOM handling, e.g.
> after a SEGV crash a user wouldn't lose data. Or it would enable a window manager
> to proactively shut down apps before an OOM situation is even encountered.
> 
> Basically we have to recognise that we have limited resources & need to choose
> the approach that gives the biggest benefit from the user's POV. Perfecting
> the specific case of OOM handling IMHO has far less benefit than perfecting
> session recovery. It's kinda like the difference between vertical & horizontal
> server scalability - you could engineer a single server to deal with absolutely
> every failure eventuality, but it'll still fail, or you can make a set of 'n'
> servers 'good enough' & ensure that when failure does occur, recovery is trivial.
> 
> Regards,
> Dan.
> -- 
> |=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
> |=-           Perl modules: http://search.cpan.org/~danberr/              -=|
> |=-               Projects: http://freshmeat.net/~danielpb/               -=|
> |=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
> 
-- 
Jim Gettys
One Laptop Per Child




