[Libguestfs] [PATCH nbdkit 2/2] common: Add checked-overflow macros and use for safe vector extension

Eric Blake eblake at redhat.com
Tue Nov 9 19:56:12 UTC 2021


On Tue, Nov 09, 2021 at 08:53:28PM +0200, Nir Soffer wrote:
> On Tue, Nov 9, 2021 at 7:49 PM Richard W.M. Jones <rjones at redhat.com> wrote:
> > +++ b/common/include/checked-overflow.h

> > +
> > +/* This header file defines functions for checking overflow in common
> > + * integer arithmetic operations.
> > + *
> > + * It uses GCC/Clang built-ins: a possible future enhancement is to
> > + * provide fallbacks in plain C or for other compilers.  The only
> > + * purpose of having a header file for this is to have a single place
> > + * where we would extend this in future.
> > + */

gnulib has such checks ported to other compilers (in intprops.h), but
with an incompatible license that we can't copy.

> > +
> > +#ifndef NBDKIT_CHECKED_OVERFLOW_H
> > +#define NBDKIT_CHECKED_OVERFLOW_H
> > +
> > +#if !defined(__GNUC__) && !defined(__clang__)
> > +#error "this file may need to be ported to your compiler"
> > +#endif
> > +
> > +/* Add two uint64_t values.  Returns true if overflow happened. */
> > +#define ADD_UINT64_T_OVERFLOW(a, b, r) __builtin_add_overflow((a), (b), (r))

I was able to figure out that this performs '*r = a + b' by guessing
from the macro parameter names; I'm not sure if adding that to the
comment would help other readers less familiar with the compiler
builtin.

> > +
> > +/* Multiply two uint64_t values.  Returns true if overflow happened. */
> > +#define MUL_UINT64_T_OVERFLOW(a, b, r) __builtin_mul_overflow((a), (b), (r))
> > +
> > +/* Add two size_t values.  Returns true if overflow happened. */
> > +#define ADD_SIZE_T_OVERFLOW(a, b, r) __builtin_add_overflow((a), (b), (r))
> > +
> > +/* Multiple two size_t values.  Returns true if overflow happened. */
> > +#define MUL_SIZE_T_OVERFLOW(a, b, r) __builtin_mul_overflow((a), (b), (r))

gcc also has forced-type builtins, but using
__builtin_uaddll_overflow() would be wrong (it's harder to prove
whether uint64_t is 'unsigned long' or 'unsigned long long'), so using
the generic macro is what we want anyways.  The fact that you have two
different macros (for UINT64_T and SIZE_T) that expand to the same
thing is a side-effect of gcc's implementation; if we later have to
code up a C language fallback for other compilers, that fallback may
indeed need different implementations between the two macros.  So I'm
fine with what looks like duplication here.

> > +++ b/common/utils/vector.c
> > @@ -36,6 +36,7 @@
> >  #include <stdlib.h>
> >  #include <errno.h>
> >
> > +#include "checked-overflow.h"
> >  #include "vector.h"
> >
> >  int
> > @@ -44,21 +45,30 @@ generic_vector_reserve (struct generic_vector *v, size_t n, size_t itemsize)
> >    void *newptr;
> >    size_t reqcap, reqbytes, newcap, newbytes;
> >
> > -  /* New capacity requested.  We must allocate this minimum (or fail). */
> > -  reqcap = v->cap + n;
> > -  reqbytes = reqcap * itemsize;
> > -  if (reqbytes < v->cap * itemsize) {
> > +  /* New capacity requested.  We must allocate this minimum (or fail).
> > +   * reqcap = v->cap + n
> > +   * reqbytes = reqcap * itemsize
> > +   */
> 
> This should explain the next lines? I'm not sure about it, it makes
> the code more complicated and the commented code can get out
> of sync with the actual code.
> 
> > +  if (ADD_SIZE_T_OVERFLOW (v->cap, n, &reqcap) ||
> > +      MUL_SIZE_T_OVERFLOW (reqcap, itemsize, &reqbytes)) {

Yes, the code in the comment is pseudocode for the macro use; I agree
that the duplication is a slight risk of losing sync, but it's a weak
argument (the comment is quite close to the code, and this file is not
likely to be frequently rewritten).  I'm fine keeping it.

> 
> Is order guaranteed?

Yes, because of ||.

> 
> I think it will be more clear as separate if blocks, even if we need
> to have 2 blocks for returning ENOMEM.

2 blocks is useful if you ever expect to be in a gdb session trying to
figure out which of the two conditions failed.  But for this one, I'm
fine with one.

> 
> >      errno = ENOMEM;
> > -    return -1; /* overflow */
> > +    return -1;
> >    }
> >
> >    /* However for the sake of optimization, scale buffer by 3/2 so that
> >     * repeated reservations don't call realloc often.
> > +   * newcap = v->cap + (v->cap + 1) / 2
> > +   * newbytes = newcap * itemsize
> >     */
> > -  newcap = v->cap + (v->cap + 1) / 2;
> > -  newbytes = newcap * itemsize;
> > -
> > +  if (ADD_SIZE_T_OVERFLOW (v->cap, 1, &newcap))
> > +    goto fallback;
> > +  newcap /= 2;
> > +  if (ADD_SIZE_T_OVERFLOW (v->cap, newcap, &newcap))
> > +    goto fallback;
> 
> This probably works but adding v->cap and newcap and storing
> back in newcap is pretty confusing. I would use a temporary:
> 
>     if (ADD_SIZE_T_OVERFLOW (v->cap, 1, &extracap))
>       goto fallback;
> 
>     extracap /= 2;
>     if (ADD_SIZE_T_OVERFLOW (v->cap, extracap, &newcap))
>       goto fallback;
> 
> > +  if (MUL_SIZE_T_OVERFLOW (newcap, itemsize, &newbytes))
> > +    goto fallback;
> >    if (newbytes < reqbytes) {
> > +  fallback:
> 
> Jumping inside an if block is evil.

Not the first time our code base has done it.  It's not always the
cleanest, but is more compact that a number of other alternatives
without too much hassle.

> 
> I would try to extract the code to compute the new capacity into a helper
> function:
> 
>    if (next_capacity(v-cap, n, itemsize, &newcap))
>        return -1;
> 
> This function can return early instead of jumping around or fail
> if we cannot reserve n items. In the worst case this function will
> only hide the overflow macros.

That is indeed an option which may improve legibility for the next
reader, even if it costs a few more lines of code now.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




More information about the Libguestfs mailing list