[Libvir] RFC: safer memory allocation APIs with compile time checking

Mon Apr 28 19:16:02 UTC 2008

On Mon, Apr 28, 2008 at 06:45:54PM +0000, David Lutterkort wrote:
> 
> On Mon, 2008-04-28 at 13:38 +0100, Richard W.M. Jones wrote:
> > On Mon, Apr 28, 2008 at 03:39:41AM -0400, Daniel Veillard wrote:
> > >   Calling abort() in a library is a major NO-NO and one of the reasons
> > > I avoided glib in most of the code I developped. You just can't exit()/abort()
> > > from a library.
> > 
> > That depends ... If you can override the abort() function with an
> > error handler, then I'd say it is OK.
> 
> I used to not think very highly of calling abort() by default, but
> reading Havoc's blog post about that a while ago[1] is making me doubt
> conventional wisdom. He cites libxml2 as one example where OOM leads to
> crashes - I take that not as an indication that there is something wrong
> with libxml2, but with the approach of checking and correctly handling
> all allocation failures.

I agree with Havoc that it is not worth checking for OOM unless you
take the time to prove it is correctly handled. As mentioned earlier
in this thread one of the core problems making it impractical is
the API contract of malloc() which means you need manual code inspection
to verify you checked all mallocs(). The API contract I proposed for
virAlloc at least addresses that 1/2 of the problem by letting the
compiler tell us whether any allocations have missing checks. That
leaves the second part of the problem - the cleanup paths. We need 
to have the cleanup paths in the code regardless because arbitrary
syscalls (eg, write(), socket(), etc) we invoke may fail.  If we are
making sure those cleanup paths are correct anyway, then handling OOM
in this codepaths is minor incremental code & thus a much more tractable
problem.

I've just hacked up a similar approach to the one DBus uses to fail
the 'nth' malloc and run the 'xmconfigtest' test case for every 'n'
between 1 and 200 and OOM handling worked correctly in every case.
I also ran under valgrind and confirmed there were no leaks reported 
during any of the OOM conditions.

It'd be definitely be worthwhile trying to get this somehow hooked up
into all the test cases we have and run it automatically. I'm in 100%
agreement that without good test coverage you can't have a high degree
of confidence in the code reliability.

Dan.
-- 
|: Red Hat, Engineering, Boston   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|