[Libvir] RFC: safer memory allocation APIs with compile time checking
Daniel Veillard
veillard at redhat.com
Mon Apr 28 07:39:41 UTC 2008
On Sun, Apr 27, 2008 at 08:29:33PM +0100, Daniel P. Berrange wrote:
> After updating the virBuffer APIs to protect against improper usage I have
> been thinking about how we might provider safer memory allocation APIs
> with protection against common usage errors and compile time validation of
> checks for failure.
>
> The standard C library malloc/realloc/calloc/free APIs have just about the
> worst possible design, positively encouraging serious application coding
> errors. There are at least 7 common problems I can think of at the moment,
> perhaps more....
>
> 1. malloc() - pass incorrect number of bytes
> 2. malloc() - fail to check the return value
> 3. malloc() - use uninitialized memory
> 4. free() - double free by not setting pointer to NULL
> 5. realloc() - fail to check the return value
> 6. realloc() - fail to re-assign the pointer with new address
> 7. realloc() - accidental overwrite of original pointer with NULL on
> failure, leaking memory
>
> Many applications / libraries wrap these in their own functions, but typically
> fail to address one or more these problems.
>
> eg glib re-definitions try to address point 2 by having g_malloc() call
> abort() on failure, but then re-introduce the problem by adding g_try_malloc()
Calling abort() in a library is a major NO-NO and one of the reasons
I avoided glib in most of the code I developped. You just can't exit()/abort()
from a library.
> All 7 of the errors listed early could be avoided and/or checked at compile
> time if the APIs used a different calling convention.
>
> 1. For any given pointer the compiler knows how many bytes need to be
> allocated for a single instance of it. ie sizeof(*(ptr)). Since C
> has no data type representing a data type, this cannot be done with
> a simple function - it needs a (trivial) macro.
>
> 2. The __attribute__((__warn_unused_result__)) annotation causes the
> compiler to warn if an application doesn't do something with a return
> value. With the standard malloc() API this doesn't help because you
> always assign the return value to a pointer. The core problem here
> is that the API encodes the error condition as a special case of
> the actual data returned. This can be addressed by separating the
> two. The return value should simply be bi-state success or fail code
> and the return data passed back via an pointer to a pointer parameter.
>
> 3. The memory allocated by malloc() is not initialized to zero. For
> that a separate calloc() function is provided. This leaves open the
> possiblity of an app mistakenly using the wrong variant. Or consider
> an app using malloc() and explicitly initializing all struct members.
> Some time later a new member is added and now the developer needs to
> find all malloc() calls wrt to that struct and explicitly initilize
> the new member. It would be safer to always have malloc zero all
> memory, even though it has a small performance impact, the net win
> in safety is probably worth it.
>
> 4. Since free() takes the pointer to be freed directly it cannot reset
> the caller's original pointer back to NULL. This can be addressed if
> a pointer to the pointer is passed instead. The computational cost
> of the often redundant assignment to NULL is negligable given the
> safety benefit
>
> 5, 6 & 7. As with problem 2, these problems are all caused by the fact
> that the error condition is special case of the return data value.
> The impact of these is typically far worse because it often results
> in a buffer overflow and the complex rules for calling realloc() are
> hard to remember.
>
> So the core requirements of a safer API for allocation are:
>
> - Use return values *only* for the success/fail error condition
> - Annotate all the return values with __warn_unused_result__
> - Pass a pointer to the pointer into all functions
> - Define macros around all functions to automatically do the sizeof(*(ptr))
>
> Passing a pointer to a pointer gets a little tedious because of the need
> to write '&(foo)' instead of just 'foo', and is actually introduces a
> new problem of the caller accidentally passing merely a pointer, instead
> of a pointer to a pointer. This is a minor problem though because it will
> be identified on the very first run of the code. In addition, since we're
> defining macros around every function we can have the macro also convert
> from ptr to &(ptr) for us, avoiding this potential error:
>
> So the primary application usage would be via a set of macros:
>
> VIR_ALLOC(ptr)
> VIR_ALLOC_N(ptr, count)
> VIR_REALLOC(ptr)
you will need some size here.
> VIR_REALLOC_N(ptr, count)
> VIR_FREE(ptr)
>
> These call through to the underlying APIs:
>
> int virAlloc(void *, size_t bytes)
> int virAllocN(void *, size_t bytes, size_t count)
> int virRealloc(void *, size_t bytes)
> int virReallocN(void *, size_t bytes, size_t count)
> int virFree(void *)
>
> Note that although we're passing a pointer to a pointer into all these, the
> first param is still just 'void *' and not 'void **'. This is because 'void *'
> is defined to hold any type of pointer, and using 'void **' causes gcc to
> complain bitterly about strict aliasing violations. Internally the impls of
> these methods will safely cast to 'void **' when deferencing the pointer.
One of the problem this introduce is what uses to be a common mistake
freeing the a non-pointer object which is normally immediately pointed out
by the compiler whicll be lost because your macro will turn this into
a void * to the data, and you will get a runtime problem instead.
> All 4 of Alloc/Realloc functions will have __warn_unused_result__ annotation
> so the caller is forced to check the return value for failure, validated at
> compile time generating a warning (or fatal compile error with -Werror).
>
> So to wire up the macros to the APIs:
>
> #define VIR_ALLOC(ptr) virAlloc(&(ptr), sizeof(*(ptr)))
> #define VIR_ALLOC_N(ptr, count) virAllocN(&(ptr), sizeof(*(ptr)), (count))
> #define VIR_REALLOC(ptr) virRealloc(&(ptr), sizeof(*(ptr)))
That i really don't understand. How do you expect to use that realloc
without passing a new size.
> #define VIR_REALLOC_N(ptr, count) virReallocN(&(ptr), sizeof(*(ptr)), (count))
That I can understand , but the previous one i can't.
> #define VIR_FREE(ptr) virFree(&(ptr))
[...]
> Much less ugly:
>
> if (VIR_ALLOC_N(guest->arch.defaultInfo.machines, nmachines) < 0)
> return -1;
>
> if (VIR_REALLOC(migrateTrans, caps->host.nmigrateTrans+1) < 0)
> return -1;
how does sizeof(*(caps->host.nmigrateTrans+1)) increases the size ?
Doesn't make sense to me, you take a pointer, increment it, so basically just
pointing to the next element in the array, but the size of the pointed object
would be identical and realloc() becomes a noop.
The proposal may help clean a lot of things, but VIR_REALLOC I don't
understand, what did i missed ?
> and produced easier to read code too. The cleaner realloc() API is particularly
> appealing.
Well ... i don't understand it !
> This does all seem a little bit too good to be true - I'm wondering why other
> apps/libraries don't use this style of allocation functions already ? Perhaps
> someone can find nasty flaws in this approach, but hopefuly not...
I would be tempted to hook into the error reporting directly within
memery.c , even if we don't have a good context there, basically if we
have a memory shortage even reporting to stderr is sensible , at least once.
> +int virAlloc(void *ptrptr, size_t size)
> +{
> + if (size == 0) {
> + *(void **)ptrptr = NULL;
> + return 0;
> + }
> +
> + *(void **)ptrptr = calloc(1, size);
> + if (*(void **)ptrptr == NULL)
> + return -1;
> + return 0;
> +}
[...]
> Index: hash.c
> ===================================================================
> RCS file: /data/cvs/libvirt/src/hash.c,v
> retrieving revision 1.37
> diff -u -p -r1.37 hash.c
> --- hash.c 18 Apr 2008 08:33:23 -0000 1.37
> +++ hash.c 27 Apr 2008 19:04:28 -0000
The hash example is interesting, but it doesn't use REALLOC ...
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard at redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
More information about the libvir-list
mailing list