[Libguestfs] [PATCH nbdkit] server: public: Add nbdkit_parse_* functions for safely parsing integers.

Mon Sep 23 18:23:28 UTC 2019

On 9/23/19 12:52 PM, Richard W.M. Jones wrote:
> On Mon, Sep 23, 2019 at 12:05:11PM -0500, Eric Blake wrote:
>>> + int nbdkit_parse_long (const char *what, const char *str, long *r);
>>> + int nbdkit_parse_unsigned_long (const char *what,
>>> +                                 const char *str, unsigned long *r);
>>
>> Do we really want to encourage the use of parse_long and
>> parse_unsigned_long?  Those differ between 32- and 64-bit platforms, at
>> which point, it's often better to explicitly call out the intended size.
> 
> So I'll say why I added these.  It may not be a good reason, but here
> goes :-)
> 
> There are two external APIs we use which need long types, libcurl
> (curl_easy_setopt)

Eww - that one takes a vararg, but integers are indeed passed as 'long'
and there may be ABIs where passing 'int' causes wrong behavior.

Still, we could parse into an 'int' or 'unsigned', then assign that to
yet another variable of type 'long', before passing the long to cur..

> and libssh (ssh_options_set).

That one takes a void* which will be interpreted as 'long*', so again,
we have to have the correct type in place.  But again, we could parse
into 'int' or 'unsigned', then copy to a 'long' prior to calling the API.

>  The existing
> parameters had type long (I don't think unsigned long is actually
> used) which we passed directly to one of these APIs.  To use another
> type would involve a slightly awkward check + cast, with the danger
> that the code would break on the rarely tested 32 bit platform.

(Potentially) widening casts aren't a problem, although I do share your
concern that 32-bit platforms are not tested as frequently.

>>> +++ b/include/nbdkit-common.h
>>> @@ -84,6 +84,28 @@ extern void nbdkit_vdebug (const char *msg, va_list args);
>>>  extern char *nbdkit_absolute_path (const char *path);
>>>  extern int64_t nbdkit_parse_size (const char *str);
>>>  extern int nbdkit_parse_bool (const char *str);
>>> +extern int nbdkit_parse_int (const char *what, const char *str,
>>> +                             int *r);
>>
>> Should we mark 'what' and 'str' as being non-null parameters?  But you
>> definitely document 'r' as valid when NULL.
> 
> I think we decided to avoid using __attribute__((nonnull)) on the
> external API, because it was ... unsafe?  I don't recall now if we
> avoided it deliberately or not, but we definitely don't use it except
> on internal functions!
> 
> Yes it would make sense as long as it's not unsafe.

Yeah, now I remember.

The mere act of marking something __attribute__((nonnull)) can make
(older?) gcc miscompile code into completely eliminating NULL checks.
But since the attribute in the public header is advisory only (and not
compulsory), we probably WANT explicit NULL checks in our function
bodies (unless we think SEGV'ing a poorly-written plugin is reasonable).
 It may turn into something where when compiling the REST of the code
base, we want the attributes, but when compiling server/public.c, we
intentionally avoid the attributes.  Separate patch, if at all.

>>> +
>>>  /* Parse a string as a size with possible scaling suffix, or return -1
>>>   * after reporting the error.
>>>   */
>>
>> Hmm, seeing this comment, do we really need raw [s]size_t parsing, or
>> can we get away with scaled parsing anywhere that sizes are intended?
>> But the code looks correct if you want to keep it.
> 
> nbdkit_parse_ssize_t is not used.  nbdkit_parse_size_t is used for a
> single case, ‘xz-max-depth’ which is genuinely an array length, so
> size_t is (albeit marginally) useful here.

It's also an array length where a misconfigured input can cause the
filter to be non-functional; see commit fd2deeb1 where I abused it to
intentionally cause calloc() failure to prove what happens when a filter
.open() fails.  Do we really need that much flexibility, or can we get
away with stating that the max depth has to fit within 32 bits, even if
it does populate an array length later?  (Here, I'm not as sure of the
answer, as I didn't actually research where it is used, only that I
could overflow the calloc).

> 
> It's not worth dying on this particular hill, but I think there is
> some use for all of these functions.

True, and so I'm not going to object if we expose it after all; it's
just harder to remove something we don't like than to add something we
found to be missing.

>>> +#define PARSE(TYPE, FORMAT, TEST, RET, EXPECTED)                        \
>>> +  do {                                                                  \
>>> +    error_flagged = false;                                              \
>>> +    TYPE i = 0;                                                         \
>>> +    int r = nbdkit_parse_##TYPE ("test", TEST, &i);                     \
>>> +    if (r != RET || i != EXPECTED) {                                    \
>>> +      fprintf (stderr,                                                  \
>>> +               "%s: %d: wrong parse for %s: r=%d i=" FORMAT "\n",       \
>>> +               __FILE__, __LINE__, TEST, r, i);                         \
>>> +      pass = false;                                                     \
>>> +    }                                                                   \
>>> +    if ((r == -1) != error_flagged) {                                   \
>>> +      fprintf (stderr,                                                  \
>>> +               "%s: %d: wrong error message handling for %s\n",         \
>>> +               __FILE__, __LINE__, TEST);                               \
>>> +      pass = false;                                                     \
>>> +    }                                                                   \
>>> +  } while (0)
>>> +#define OK 0
>>> +#define BAD -1
>>
>> Nice.
> 
> But I discovered you cannot do:
> 
>   #define BAD -1, 0
> 
> which is a shame :-(

Easy fix:

#define PARSE(...) PARSE(__VA_ARGS__)
#define PARSE_(TYPE, FORMAT, TEST, RET, EXPECTED)

now you can #define BAD -1, 0, and the expansion of PARSE() causes the
expansion of BAD prior to the expansion of PARSE_().

>>> +++ b/server/usergroup.c
>>> @@ -97,7 +97,7 @@ parseuser (const char *id)
>>>  
>>>      saved_errno = errno;
>>>  
>>> -    if (sscanf (id, "%d", &val) == 1)
>>> +    if (nbdkit_parse_int ("parseuser", id, &val) == 0)
>>
>> Is 'int' still the right type for this?
> 
> Hopefully it will warn us if uid_t stops being int.  (Also
> we're assuming pid_t == int).

64-bit mingw has had a long-standing bug that pid_t was declared as
64-bit, even though getpid() returns a 32-bit 'int'; I wish they'd fix
their headers, but it violates the assumption of pid_t == int.

Also, POSIX is considering standardizing fcntl(F_SETOWN_EX) in order to
allow pid_t > int: http://austingroupbugs.net/view.php?id=1274

So while your assumption may hold for now on Linux and BSD, I wouldn't
hold my breath on it lasting forever.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20190923/e9ca1bee/attachment.sig>