[Libguestfs] [PATCH nbdkit] server: public: Add nbdkit_parse_* functions for safely parsing integers.

Mon Sep 23 17:52:26 UTC 2019

On Mon, Sep 23, 2019 at 12:05:11PM -0500, Eric Blake wrote:
> > + int nbdkit_parse_long (const char *what, const char *str, long *r);
> > + int nbdkit_parse_unsigned_long (const char *what,
> > +                                 const char *str, unsigned long *r);
> 
> Do we really want to encourage the use of parse_long and
> parse_unsigned_long?  Those differ between 32- and 64-bit platforms, at
> which point, it's often better to explicitly call out the intended size.

So I'll say why I added these.  It may not be a good reason, but here
goes :-)

There are two external APIs we use which need long types, libcurl
(curl_easy_setopt) and libssh (ssh_options_set).  The existing
parameters had type long (I don't think unsigned long is actually
used) which we passed directly to one of these APIs.  To use another
type would involve a slightly awkward check + cast, with the danger
that the code would break on the rarely tested 32 bit platform.

...
> > + int nbdkit_parse_int16_t (const char *what,
> > +                           const char *str, int16_t *r);
> > + int nbdkit_parse_uint16_t (const char *what,
> > +                            const char *str, uint16_t *r);
> 
> I guess we can add [u]int8_t variants later if a need arises?  Or should
> we do it now?

We certainly could.  I couldn't see an immediate reason, except maybe
for the ’mbr-id’ parameter.

> > + int nbdkit_parse_int32_t (const char *what,
> > +                           const char *str, int32_t *r);
> > + int nbdkit_parse_uint32_t (const char *what,
> > +                            const char *str, uint32_t *r);
> > + int nbdkit_parse_int64_t (const char *what,
> > +                           const char *str, int64_t *r);
> > + int nbdkit_parse_uint64_t (const char *what,
> > +                            const char *str, uint64_t *r);
> > + int nbdkit_parse_ssize_t (const char *what,
> > +                           const char *str, ssize_t *r);
> > + int nbdkit_parse_size_t (const char *what,
> > +                          const char *str, size_t *r);
> 
> [s]size_t is another one that can differ between 32- and 64-bit
> platforms; but I'm fine with having these function, especially since
> there are still efforts being considered about an alternative kernel ABI
> with 64-bit integers but 32-bit size_t.

Yes I think if we had to drop long or size_t, I'd rather keep the
size_t ones.  Those are actually meaningful -- eg. if used for sizing
an internal array.

> > +Parse string C<str> into an integer of various types.  These functions
> > +parse a decimal, hexadecimal (C<"0x...">) or octal (C<"0...">) number.
> > +
> > +On success the functions return C<0> and set C<*r> to the parsed value
> > +(unless C<*r == NULL> in which case the result is discarded).  On
> > +error, C<nbdkit_error> is called and the functions return C<-1>.
> > +
> > +The C<what> parameter is printed in error messages to provide context.
> > +It should usually be a short descriptive string of what you are trying
> > +to parse, eg:
> > +
> > + if (nbdkit_parse_int ("random seed", argv[1], &seed) == -1)
> > +   return -1;
> > +
> > +might print an error:
> > +
> > + random seed: could not parse number: "lalala"
> > +
> 
> Do we want to make it part of our documented contract that *r is
> unchanged if we return -1?

Yes I think we can document this.  It's useful to have that behaviour
defined because it makes error recovery simpler.  I will add it to my
copy.

> > @@ -129,22 +129,20 @@ cache_config (nbdkit_next_config *next, void *nxdata,
> >      return 0;
> >    }
> >    else if (strcmp (key, "cache-high-threshold") == 0) {
> > -    if (sscanf (value, "%d", &hi_thresh) != 1) {
> > -      nbdkit_error ("invalid cache-high-threshold parameter: %s", value);
> > +    if (nbdkit_parse_unsigned ("cache-high-threshold",
> > +                               value, &hi_thresh) == -1)
> 
> Semantic change; previously "010" was '10' and "08" was '8', now "010"
> is '8' and "08" is a failure (because we used "%d" instead of "%i").
> (Multiple spots in this patch).  Not a big deal to me, but we may want
> to call it out in the commit message as intentional.

I guess %i (not %d) is the one which parses octal and hex.  I think
this change is for the better however since it improves consistency,
so I'll mention it in the commit message.

> > +++ b/include/nbdkit-common.h
> > @@ -84,6 +84,28 @@ extern void nbdkit_vdebug (const char *msg, va_list args);
> >  extern char *nbdkit_absolute_path (const char *path);
> >  extern int64_t nbdkit_parse_size (const char *str);
> >  extern int nbdkit_parse_bool (const char *str);
> > +extern int nbdkit_parse_int (const char *what, const char *str,
> > +                             int *r);
> 
> Should we mark 'what' and 'str' as being non-null parameters?  But you
> definitely document 'r' as valid when NULL.

I think we decided to avoid using __attribute__((nonnull)) on the
external API, because it was ... unsafe?  I don't recall now if we
avoided it deliberately or not, but we definitely don't use it except
on internal functions!

Yes it would make sense as long as it's not unsafe.

> > +++ b/plugins/curl/curl.c
> > @@ -204,7 +204,9 @@ curl_config (const char *key, const char *value)
> >    }
> >  
> >    else if (strcmp (key, "timeout") == 0) {
> > -    if (sscanf (value, "%ld", &timeout) != 1 || timeout < 0) {
> > +    if (nbdkit_parse_long ("timeout", value, &timeout) == -1)
> > +      return -1;
> > +    if (timeout < 0) {
> >        nbdkit_error ("'timeout' must be 0 or a positive timeout in seconds");
> >        return -1;
> 
> Why not change to 'unsigned timeout'?  'long' is rather oversized on
> 64-bit platforms, and keeping things signed doesn't buy much compared to
> just going unsigned.  (Yep, that's me trying to get rid of a need for
> parse_long).

See above about curl_easy_setopt.

> > +++ b/plugins/partitioning/partitioning.c
> > @@ -211,10 +211,8 @@ partitioning_config (const char *key, const char *value)
> >    else if (strcmp (key, "mbr-id") == 0) {
> >      if (strcasecmp (value, "default") == 0)
> >        mbr_id = DEFAULT_MBR_ID;
> > -    else if (sscanf (value, "%i", &mbr_id) != 1) {
> > -      nbdkit_error ("could not parse mbr-id: %s", value);
> > +    else if (nbdkit_parse_int ("mbr-id", value, &mbr_id) == -1)
> >        return -1;
> 
> Is 'int' the best type for this?

Probably not, see above.

> > +++ b/plugins/ssh/ssh.c
> > @@ -59,7 +59,7 @@ static bool verify_remote_host = true;
> >  static const char *known_hosts = NULL;
> >  static const char **identity = NULL;
> >  static size_t nr_identities = 0;
> > -static long timeout = 0;
> > +static unsigned long timeout = 0;
> 
> Again, a 32-bit timeout regardless of 32- or 64-bit platform is probably
> saner.

See above about ssh_options_set.

> > +++ b/plugins/vddk/vddk.c
> > @@ -271,10 +271,8 @@ vddk_config (const char *key, const char *value)
> >        return -1;
> >    }
> >    else if (strcmp (key, "nfchostport") == 0) {
> > -    if (sscanf (value, "%d", &nfc_host_port) != 1) {
> > -      nbdkit_error ("cannot parse nfchostport: %s", value);
> > +    if (nbdkit_parse_int ("nfchostport", value, &nfc_host_port) == -1)
> >        return -1;
> > -    }
> 
> Is 'int' the right type here?

Actually this is a bug twice over.  The VDDK struct wants uint32_t for
both of the port fields.  Obviously since they are port numbers
something like uint16_t would be more suitable.  (0 has the special
meaning of "default port").

I think I'll change this to uint16_t because the implicit cast up to
uint32_t when we assign it to the VDDK struct shouldn't be a problem,
and it's useful to have bounds checking.

> > +/* Functions for parsing signed integers. */
> > +int
> > +nbdkit_parse_int (const char *what, const char *str, int *rp)
> > +{
> > +  long r;
> > +  char *end;
> > +
> > +  errno = 0;
> > +  r = strtol (str, &end, 0);
> > +#if INT_MAX != LONG_MAX
> > +  if (r < INT_MIN || r > INT_MAX)
> > +    errno = ERANGE;
> > +#endif
> > +  PARSE_COMMON_TAIL;
> > +}
> 
> Looks correct.

I actually compiled this and ran the tests with -m32 and -m64.  It
took several rounds to get it right.

> > +#endif
> > +  if (r < SSIZE_MIN || r > SSIZE_MAX)
> > +    errno = ERANGE;
> > +#endif
> > +  PARSE_COMMON_TAIL;
> > +}
> > +
> > +/* Functions for parsing unsigned integers. */
> > +
> > +/* strtou* functions have surprising behaviour if the first character
> > + * (after whitespace) is '-', so deny this.
> > + */
> > +#define PARSE_ERROR_IF_NEGATIVE                                         \
> > +  do {                                                                  \
> > +    const char *p = str;                                                \
> > +    while (isspace (*p))                                                \
> > +      p++;                                                              \
> > +    if (*p == '-') {                                                    \
> > +      nbdkit_error ("%s: negative numbers are not allowed", what);      \
> > +      return -1;                                                        \
> > +    }                                                                   \
> > +  } while (0)
> > +
> 
> Agree.  However, why not modify str in-place, so that the subsequent
> strtoul() does not have to repeat our prefix scan?

Yes, that's a good idea, I didn't think it through.

> > +  PARSE_COMMON_TAIL;
> > +}
> > +
> > +int
> > +nbdkit_parse_size_t (const char *what, const char *str, size_t *rp)
> > +{
> > +  unsigned long long r;
> > +  char *end;
> > +
> > +  PARSE_ERROR_IF_NEGATIVE;
> > +  errno = 0;
> > +  r = strtoull (str, &end, 0);
> > +#if SIZE_MAX != ULONGLONG_MAX
> > +  if (r > SIZE_MAX)
> > +    errno = ERANGE;
> > +#endif
> > +  PARSE_COMMON_TAIL;
> > +}
> > +
> >  /* Parse a string as a size with possible scaling suffix, or return -1
> >   * after reporting the error.
> >   */
> 
> Hmm, seeing this comment, do we really need raw [s]size_t parsing, or
> can we get away with scaled parsing anywhere that sizes are intended?
> But the code looks correct if you want to keep it.

nbdkit_parse_ssize_t is not used.  nbdkit_parse_size_t is used for a
single case, ‘xz-max-depth’ which is genuinely an array length, so
size_t is (albeit marginally) useful here.

It's not worth dying on this particular hill, but I think there is
some use for all of these functions.

> > diff --git a/server/socket-activation.c b/server/socket-activation.c
> > index b9db67c..50df4ca 100644
> > --- a/server/socket-activation.c
> > +++ b/server/socket-activation.c
> > @@ -59,11 +59,8 @@ get_socket_activation (void)
> >    s = getenv ("LISTEN_PID");
> >    if (s == NULL)
> >      return 0;
> > -  if (sscanf (s, "%u", &pid) != 1) {
> > -    fprintf (stderr, "%s: malformed %s environment variable (ignored)\n",
> > -             program_name, "LISTEN_PID");
> > +  if (nbdkit_parse_unsigned ("LISTEN_PID", s, &pid) == -1)
> >      return 0;
> > -  }
> 
> Possible semantic change: on parse failure, are we still printing
> something to stderr, or is nbdkit_error() not effective this early?

Yes nbdkit_error works right from the start.  We have already used it
for parsing server argv[].

> > +++ b/server/test-public.c
> 
> > +static bool
> > +test_nbdkit_parse_ints (void)
> > +{
> > +  bool pass = true;
> > +  char s[64];
> > +
> > +#define PARSE(TYPE, FORMAT, TEST, RET, EXPECTED)                        \
> > +  do {                                                                  \
> > +    error_flagged = false;                                              \
> > +    TYPE i = 0;                                                         \
> > +    int r = nbdkit_parse_##TYPE ("test", TEST, &i);                     \
> > +    if (r != RET || i != EXPECTED) {                                    \
> > +      fprintf (stderr,                                                  \
> > +               "%s: %d: wrong parse for %s: r=%d i=" FORMAT "\n",       \
> > +               __FILE__, __LINE__, TEST, r, i);                         \
> > +      pass = false;                                                     \
> > +    }                                                                   \
> > +    if ((r == -1) != error_flagged) {                                   \
> > +      fprintf (stderr,                                                  \
> > +               "%s: %d: wrong error message handling for %s\n",         \
> > +               __FILE__, __LINE__, TEST);                               \
> > +      pass = false;                                                     \
> > +    }                                                                   \
> > +  } while (0)
> > +#define OK 0
> > +#define BAD -1
> 
> Nice.

But I discovered you cannot do:

  #define BAD -1, 0

which is a shame :-(

> > +
> > +  /* Test the basic parsing of decimals, hexadecimal, octal and
> > +   * negative numbers.
> > +   */
> > +  PARSE (int, "%d", "0",    OK, 0);
> > +  PARSE (int, "%d", "1",    OK, 1);
> > +  PARSE (int, "%d", "99",   OK, 99);
> > +  PARSE (int, "%d", "0x1",  OK, 1);
> > +  PARSE (int, "%d", "0xf",  OK, 15);
> > +  PARSE (int, "%d", "0x10", OK, 16);
> > +  PARSE (int, "%d", "0xff", OK, 255);
> > +  PARSE (int, "%d", "01",   OK, 1);
> > +  PARSE (int, "%d", "07",   OK, 7);
> > +  PARSE (int, "%d", "010",  OK, 8);
> > +  PARSE (int, "%d", "+0",   OK, 0);
> > +  PARSE (int, "%d", "+99",  OK, 99);
> > +  PARSE (int, "%d", "+0xf", OK, 15);
> > +  PARSE (int, "%d", "+010", OK, 8);
> > +  PARSE (int, "%d", "-0",   OK, 0);
> > +  PARSE (int, "%d", "-99",  OK, -99);
> > +  PARSE (int, "%d", "-0xf", OK, -15);
> 
> I'd test upper case at least once, such as -0XF here.  Testing leading
> whitespace would also be nice.

Ah yes, good idea.

> > +  PARSE (int, "%d", "-010", OK, -8);
> > +  PARSE (int, "%d", "2147483647", OK, 2147483647); /* INT_MAX */
> > +  PARSE (int, "%d", "-2147483648", OK, -2147483648); /* INT_MIN */
> > +  PARSE (int, "%d", "0x7fffffff", OK, 0x7fffffff);
> > +  PARSE (int, "%d", "-0x80000000", OK, -0x80000000);
> 
> Matches our earlier assumptions of twos-complement targets.
> 
> > +
> > +  /* Test basic error handling. */
> > +  PARSE (int, "%d", "",        BAD, 0);
> > +  PARSE (int, "%d", "-",       BAD, 0);
> 
> I might also test "- 0" as BAD.

OK.

> > +  PARSE (int, "%d", "+",       BAD, 0);
> > +  PARSE (int, "%d", "++",      BAD, 0);
> > +  PARSE (int, "%d", "++0",     BAD, 0);
> > +  PARSE (int, "%d", "--0",     BAD, 0);
> > +  PARSE (int, "%d", "0x",      BAD, 0);
> 
> I'd also test "08" as BAD.

OK.

> > +  PARSE (int, "%d", "42x",     BAD, 0);
> > +  PARSE (int, "%d", "42-",     BAD, 0);
> > +  PARSE (int, "%d", "garbage", BAD, 0);
> > +  PARSE (int, "%d", "2147483648", BAD, 0); /* INT_MAX + 1 */
> > +  PARSE (int, "%d", "-2147483649", BAD, 0); /* INT_MIN - 1 */
> > +  PARSE (int, "%d", "999999999999999999999999", BAD, 0);
> > +  PARSE (int, "%d", "-999999999999999999999999", BAD, 0);
> 
> It might also be worth testing "0x1p1" as invalid (and not an accidental
> hex-float).  Other possible non-integer strings to be sure we don't
> accidentally parse include "inf", "nan", "0.0".

OK.

> > +
> > +  /* Test nbdkit_parse_unsigned. */
> > +  PARSE (unsigned, "%u", "0",    OK, 0);
> > +  PARSE (unsigned, "%u", "1",    OK, 1);
> > +  PARSE (unsigned, "%u", "99",   OK, 99);
> > +  PARSE (unsigned, "%u", "0x1",  OK, 1);
> > +  PARSE (unsigned, "%u", "0xf",  OK, 15);
> > +  PARSE (unsigned, "%u", "0x10", OK, 16);
> > +  PARSE (unsigned, "%u", "0xff", OK, 255);
> > +  PARSE (unsigned, "%u", "01",   OK, 1);
> > +  PARSE (unsigned, "%u", "07",   OK, 7);
> > +  PARSE (unsigned, "%u", "010",  OK, 8);
> > +  PARSE (unsigned, "%u", "+0",   OK, 0);
> > +  PARSE (unsigned, "%u", "+99",  OK, 99);
> > +  PARSE (unsigned, "%u", "+0xf", OK, 15);
> > +  PARSE (unsigned, "%u", "+010", OK, 8);
> > +  PARSE (unsigned, "%u", "-0",   BAD, 0);
> > +  PARSE (unsigned, "%u", "-99",  BAD, 0);
> > +  PARSE (unsigned, "%u", "-0xf", BAD, 0);
> > +  PARSE (unsigned, "%u", "-010", BAD, 0);
> 
> I'd also test at least " 0" and " -0" (or similar), to prove we handled
> leading spaces correctly.

OK.

> > +  PARSE (unsigned, "%u", "2147483647", OK, 2147483647); /* INT_MAX */
> > +  PARSE (unsigned, "%u", "-2147483648", BAD, 0); /* INT_MIN */
> > +  PARSE (unsigned, "%u", "0x7fffffff", OK, 0x7fffffff);
> > +  PARSE (unsigned, "%u", "-0x80000000", BAD, 0);
> > +
> > +  /* Test nbdkit_parse_long. */
> > +  PARSE (long, "%ld", "0", OK, 0);
> > +  assert (snprintf (s, sizeof s, "%" PRIi64, (int64_t) LONG_MAX) != -1);
> > +  PARSE (long, "%ld", s, OK, LONG_MAX);
> > +  assert (snprintf (s, sizeof s, "%" PRIi64, (int64_t) LONG_MIN) != -1);
> > +  PARSE (long, "%ld", s, OK, LONG_MIN);
> > +  PARSE (long, "%ld", "999999999999999999999999", BAD, 0);
> > +  PARSE (long, "%ld", "-999999999999999999999999", BAD, 0);
> 
> snprintf() side effects inside assert(). Tsk tsk - the test will now
> fail under -DNDEBUG.  And that's part of why I don't think we want to
> expose parse_long.

We should probably loudly break or skip the tests if they
are compiled with -DNDEBUG.  We're using assert for testing
in a couple of places.

> > +++ b/server/usergroup.c
> > @@ -97,7 +97,7 @@ parseuser (const char *id)
> >  
> >      saved_errno = errno;
> >  
> > -    if (sscanf (id, "%d", &val) == 1)
> > +    if (nbdkit_parse_int ("parseuser", id, &val) == 0)
> 
> Is 'int' still the right type for this?

Hopefully it will warn us if uid_t stops being int.  (Also
we're assuming pid_t == int).

Thanks,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW