[Libguestfs] [PATCH nbdkit] server: public: Add nbdkit_parse_* functions for safely parsing integers.
Richard W.M. Jones
rjones at redhat.com
Mon Sep 23 17:52:26 UTC 2019
On Mon, Sep 23, 2019 at 12:05:11PM -0500, Eric Blake wrote:
> > + int nbdkit_parse_long (const char *what, const char *str, long *r);
> > + int nbdkit_parse_unsigned_long (const char *what,
> > + const char *str, unsigned long *r);
>
> Do we really want to encourage the use of parse_long and
> parse_unsigned_long? Those differ between 32- and 64-bit platforms, at
> which point, it's often better to explicitly call out the intended size.
So I'll say why I added these. It may not be a good reason, but here
goes :-)
There are two external APIs we use which need long types, libcurl
(curl_easy_setopt) and libssh (ssh_options_set). The existing
parameters had type long (I don't think unsigned long is actually
used) which we passed directly to one of these APIs. To use another
type would involve a slightly awkward check + cast, with the danger
that the code would break on the rarely tested 32 bit platform.
...
> > + int nbdkit_parse_int16_t (const char *what,
> > + const char *str, int16_t *r);
> > + int nbdkit_parse_uint16_t (const char *what,
> > + const char *str, uint16_t *r);
>
> I guess we can add [u]int8_t variants later if a need arises? Or should
> we do it now?
We certainly could. I couldn't see an immediate reason, except maybe
for the ’mbr-id’ parameter.
> > + int nbdkit_parse_int32_t (const char *what,
> > + const char *str, int32_t *r);
> > + int nbdkit_parse_uint32_t (const char *what,
> > + const char *str, uint32_t *r);
> > + int nbdkit_parse_int64_t (const char *what,
> > + const char *str, int64_t *r);
> > + int nbdkit_parse_uint64_t (const char *what,
> > + const char *str, uint64_t *r);
> > + int nbdkit_parse_ssize_t (const char *what,
> > + const char *str, ssize_t *r);
> > + int nbdkit_parse_size_t (const char *what,
> > + const char *str, size_t *r);
>
> [s]size_t is another one that can differ between 32- and 64-bit
> platforms; but I'm fine with having these function, especially since
> there are still efforts being considered about an alternative kernel ABI
> with 64-bit integers but 32-bit size_t.
Yes I think if we had to drop long or size_t, I'd rather keep the
size_t ones. Those are actually meaningful -- eg. if used for sizing
an internal array.
> > +Parse string C<str> into an integer of various types. These functions
> > +parse a decimal, hexadecimal (C<"0x...">) or octal (C<"0...">) number.
> > +
> > +On success the functions return C<0> and set C<*r> to the parsed value
> > +(unless C<*r == NULL> in which case the result is discarded). On
> > +error, C<nbdkit_error> is called and the functions return C<-1>.
> > +
> > +The C<what> parameter is printed in error messages to provide context.
> > +It should usually be a short descriptive string of what you are trying
> > +to parse, eg:
> > +
> > + if (nbdkit_parse_int ("random seed", argv[1], &seed) == -1)
> > + return -1;
> > +
> > +might print an error:
> > +
> > + random seed: could not parse number: "lalala"
> > +
>
> Do we want to make it part of our documented contract that *r is
> unchanged if we return -1?
Yes I think we can document this. It's useful to have that behaviour
defined because it makes error recovery simpler. I will add it to my
copy.
> > @@ -129,22 +129,20 @@ cache_config (nbdkit_next_config *next, void *nxdata,
> > return 0;
> > }
> > else if (strcmp (key, "cache-high-threshold") == 0) {
> > - if (sscanf (value, "%d", &hi_thresh) != 1) {
> > - nbdkit_error ("invalid cache-high-threshold parameter: %s", value);
> > + if (nbdkit_parse_unsigned ("cache-high-threshold",
> > + value, &hi_thresh) == -1)
>
> Semantic change; previously "010" was '10' and "08" was '8', now "010"
> is '8' and "08" is a failure (because we used "%d" instead of "%i").
> (Multiple spots in this patch). Not a big deal to me, but we may want
> to call it out in the commit message as intentional.
I guess %i (not %d) is the one which parses octal and hex. I think
this change is for the better however since it improves consistency,
so I'll mention it in the commit message.
> > +++ b/include/nbdkit-common.h
> > @@ -84,6 +84,28 @@ extern void nbdkit_vdebug (const char *msg, va_list args);
> > extern char *nbdkit_absolute_path (const char *path);
> > extern int64_t nbdkit_parse_size (const char *str);
> > extern int nbdkit_parse_bool (const char *str);
> > +extern int nbdkit_parse_int (const char *what, const char *str,
> > + int *r);
>
> Should we mark 'what' and 'str' as being non-null parameters? But you
> definitely document 'r' as valid when NULL.
I think we decided to avoid using __attribute__((nonnull)) on the
external API, because it was ... unsafe? I don't recall now if we
avoided it deliberately or not, but we definitely don't use it except
on internal functions!
Yes it would make sense as long as it's not unsafe.
> > +++ b/plugins/curl/curl.c
> > @@ -204,7 +204,9 @@ curl_config (const char *key, const char *value)
> > }
> >
> > else if (strcmp (key, "timeout") == 0) {
> > - if (sscanf (value, "%ld", &timeout) != 1 || timeout < 0) {
> > + if (nbdkit_parse_long ("timeout", value, &timeout) == -1)
> > + return -1;
> > + if (timeout < 0) {
> > nbdkit_error ("'timeout' must be 0 or a positive timeout in seconds");
> > return -1;
>
> Why not change to 'unsigned timeout'? 'long' is rather oversized on
> 64-bit platforms, and keeping things signed doesn't buy much compared to
> just going unsigned. (Yep, that's me trying to get rid of a need for
> parse_long).
See above about curl_easy_setopt.
> > +++ b/plugins/partitioning/partitioning.c
> > @@ -211,10 +211,8 @@ partitioning_config (const char *key, const char *value)
> > else if (strcmp (key, "mbr-id") == 0) {
> > if (strcasecmp (value, "default") == 0)
> > mbr_id = DEFAULT_MBR_ID;
> > - else if (sscanf (value, "%i", &mbr_id) != 1) {
> > - nbdkit_error ("could not parse mbr-id: %s", value);
> > + else if (nbdkit_parse_int ("mbr-id", value, &mbr_id) == -1)
> > return -1;
>
> Is 'int' the best type for this?
Probably not, see above.
> > +++ b/plugins/ssh/ssh.c
> > @@ -59,7 +59,7 @@ static bool verify_remote_host = true;
> > static const char *known_hosts = NULL;
> > static const char **identity = NULL;
> > static size_t nr_identities = 0;
> > -static long timeout = 0;
> > +static unsigned long timeout = 0;
>
> Again, a 32-bit timeout regardless of 32- or 64-bit platform is probably
> saner.
See above about ssh_options_set.
> > +++ b/plugins/vddk/vddk.c
> > @@ -271,10 +271,8 @@ vddk_config (const char *key, const char *value)
> > return -1;
> > }
> > else if (strcmp (key, "nfchostport") == 0) {
> > - if (sscanf (value, "%d", &nfc_host_port) != 1) {
> > - nbdkit_error ("cannot parse nfchostport: %s", value);
> > + if (nbdkit_parse_int ("nfchostport", value, &nfc_host_port) == -1)
> > return -1;
> > - }
>
> Is 'int' the right type here?
Actually this is a bug twice over. The VDDK struct wants uint32_t for
both of the port fields. Obviously since they are port numbers
something like uint16_t would be more suitable. (0 has the special
meaning of "default port").
I think I'll change this to uint16_t because the implicit cast up to
uint32_t when we assign it to the VDDK struct shouldn't be a problem,
and it's useful to have bounds checking.
> > +/* Functions for parsing signed integers. */
> > +int
> > +nbdkit_parse_int (const char *what, const char *str, int *rp)
> > +{
> > + long r;
> > + char *end;
> > +
> > + errno = 0;
> > + r = strtol (str, &end, 0);
> > +#if INT_MAX != LONG_MAX
> > + if (r < INT_MIN || r > INT_MAX)
> > + errno = ERANGE;
> > +#endif
> > + PARSE_COMMON_TAIL;
> > +}
>
> Looks correct.
I actually compiled this and ran the tests with -m32 and -m64. It
took several rounds to get it right.
> > +#endif
> > + if (r < SSIZE_MIN || r > SSIZE_MAX)
> > + errno = ERANGE;
> > +#endif
> > + PARSE_COMMON_TAIL;
> > +}
> > +
> > +/* Functions for parsing unsigned integers. */
> > +
> > +/* strtou* functions have surprising behaviour if the first character
> > + * (after whitespace) is '-', so deny this.
> > + */
> > +#define PARSE_ERROR_IF_NEGATIVE \
> > + do { \
> > + const char *p = str; \
> > + while (isspace (*p)) \
> > + p++; \
> > + if (*p == '-') { \
> > + nbdkit_error ("%s: negative numbers are not allowed", what); \
> > + return -1; \
> > + } \
> > + } while (0)
> > +
>
> Agree. However, why not modify str in-place, so that the subsequent
> strtoul() does not have to repeat our prefix scan?
Yes, that's a good idea, I didn't think it through.
> > + PARSE_COMMON_TAIL;
> > +}
> > +
> > +int
> > +nbdkit_parse_size_t (const char *what, const char *str, size_t *rp)
> > +{
> > + unsigned long long r;
> > + char *end;
> > +
> > + PARSE_ERROR_IF_NEGATIVE;
> > + errno = 0;
> > + r = strtoull (str, &end, 0);
> > +#if SIZE_MAX != ULONGLONG_MAX
> > + if (r > SIZE_MAX)
> > + errno = ERANGE;
> > +#endif
> > + PARSE_COMMON_TAIL;
> > +}
> > +
> > /* Parse a string as a size with possible scaling suffix, or return -1
> > * after reporting the error.
> > */
>
> Hmm, seeing this comment, do we really need raw [s]size_t parsing, or
> can we get away with scaled parsing anywhere that sizes are intended?
> But the code looks correct if you want to keep it.
nbdkit_parse_ssize_t is not used. nbdkit_parse_size_t is used for a
single case, ‘xz-max-depth’ which is genuinely an array length, so
size_t is (albeit marginally) useful here.
It's not worth dying on this particular hill, but I think there is
some use for all of these functions.
> > diff --git a/server/socket-activation.c b/server/socket-activation.c
> > index b9db67c..50df4ca 100644
> > --- a/server/socket-activation.c
> > +++ b/server/socket-activation.c
> > @@ -59,11 +59,8 @@ get_socket_activation (void)
> > s = getenv ("LISTEN_PID");
> > if (s == NULL)
> > return 0;
> > - if (sscanf (s, "%u", &pid) != 1) {
> > - fprintf (stderr, "%s: malformed %s environment variable (ignored)\n",
> > - program_name, "LISTEN_PID");
> > + if (nbdkit_parse_unsigned ("LISTEN_PID", s, &pid) == -1)
> > return 0;
> > - }
>
> Possible semantic change: on parse failure, are we still printing
> something to stderr, or is nbdkit_error() not effective this early?
Yes nbdkit_error works right from the start. We have already used it
for parsing server argv[].
> > +++ b/server/test-public.c
>
> > +static bool
> > +test_nbdkit_parse_ints (void)
> > +{
> > + bool pass = true;
> > + char s[64];
> > +
> > +#define PARSE(TYPE, FORMAT, TEST, RET, EXPECTED) \
> > + do { \
> > + error_flagged = false; \
> > + TYPE i = 0; \
> > + int r = nbdkit_parse_##TYPE ("test", TEST, &i); \
> > + if (r != RET || i != EXPECTED) { \
> > + fprintf (stderr, \
> > + "%s: %d: wrong parse for %s: r=%d i=" FORMAT "\n", \
> > + __FILE__, __LINE__, TEST, r, i); \
> > + pass = false; \
> > + } \
> > + if ((r == -1) != error_flagged) { \
> > + fprintf (stderr, \
> > + "%s: %d: wrong error message handling for %s\n", \
> > + __FILE__, __LINE__, TEST); \
> > + pass = false; \
> > + } \
> > + } while (0)
> > +#define OK 0
> > +#define BAD -1
>
> Nice.
But I discovered you cannot do:
#define BAD -1, 0
which is a shame :-(
> > +
> > + /* Test the basic parsing of decimals, hexadecimal, octal and
> > + * negative numbers.
> > + */
> > + PARSE (int, "%d", "0", OK, 0);
> > + PARSE (int, "%d", "1", OK, 1);
> > + PARSE (int, "%d", "99", OK, 99);
> > + PARSE (int, "%d", "0x1", OK, 1);
> > + PARSE (int, "%d", "0xf", OK, 15);
> > + PARSE (int, "%d", "0x10", OK, 16);
> > + PARSE (int, "%d", "0xff", OK, 255);
> > + PARSE (int, "%d", "01", OK, 1);
> > + PARSE (int, "%d", "07", OK, 7);
> > + PARSE (int, "%d", "010", OK, 8);
> > + PARSE (int, "%d", "+0", OK, 0);
> > + PARSE (int, "%d", "+99", OK, 99);
> > + PARSE (int, "%d", "+0xf", OK, 15);
> > + PARSE (int, "%d", "+010", OK, 8);
> > + PARSE (int, "%d", "-0", OK, 0);
> > + PARSE (int, "%d", "-99", OK, -99);
> > + PARSE (int, "%d", "-0xf", OK, -15);
>
> I'd test upper case at least once, such as -0XF here. Testing leading
> whitespace would also be nice.
Ah yes, good idea.
> > + PARSE (int, "%d", "-010", OK, -8);
> > + PARSE (int, "%d", "2147483647", OK, 2147483647); /* INT_MAX */
> > + PARSE (int, "%d", "-2147483648", OK, -2147483648); /* INT_MIN */
> > + PARSE (int, "%d", "0x7fffffff", OK, 0x7fffffff);
> > + PARSE (int, "%d", "-0x80000000", OK, -0x80000000);
>
> Matches our earlier assumptions of twos-complement targets.
>
> > +
> > + /* Test basic error handling. */
> > + PARSE (int, "%d", "", BAD, 0);
> > + PARSE (int, "%d", "-", BAD, 0);
>
> I might also test "- 0" as BAD.
OK.
> > + PARSE (int, "%d", "+", BAD, 0);
> > + PARSE (int, "%d", "++", BAD, 0);
> > + PARSE (int, "%d", "++0", BAD, 0);
> > + PARSE (int, "%d", "--0", BAD, 0);
> > + PARSE (int, "%d", "0x", BAD, 0);
>
> I'd also test "08" as BAD.
OK.
> > + PARSE (int, "%d", "42x", BAD, 0);
> > + PARSE (int, "%d", "42-", BAD, 0);
> > + PARSE (int, "%d", "garbage", BAD, 0);
> > + PARSE (int, "%d", "2147483648", BAD, 0); /* INT_MAX + 1 */
> > + PARSE (int, "%d", "-2147483649", BAD, 0); /* INT_MIN - 1 */
> > + PARSE (int, "%d", "999999999999999999999999", BAD, 0);
> > + PARSE (int, "%d", "-999999999999999999999999", BAD, 0);
>
> It might also be worth testing "0x1p1" as invalid (and not an accidental
> hex-float). Other possible non-integer strings to be sure we don't
> accidentally parse include "inf", "nan", "0.0".
OK.
> > +
> > + /* Test nbdkit_parse_unsigned. */
> > + PARSE (unsigned, "%u", "0", OK, 0);
> > + PARSE (unsigned, "%u", "1", OK, 1);
> > + PARSE (unsigned, "%u", "99", OK, 99);
> > + PARSE (unsigned, "%u", "0x1", OK, 1);
> > + PARSE (unsigned, "%u", "0xf", OK, 15);
> > + PARSE (unsigned, "%u", "0x10", OK, 16);
> > + PARSE (unsigned, "%u", "0xff", OK, 255);
> > + PARSE (unsigned, "%u", "01", OK, 1);
> > + PARSE (unsigned, "%u", "07", OK, 7);
> > + PARSE (unsigned, "%u", "010", OK, 8);
> > + PARSE (unsigned, "%u", "+0", OK, 0);
> > + PARSE (unsigned, "%u", "+99", OK, 99);
> > + PARSE (unsigned, "%u", "+0xf", OK, 15);
> > + PARSE (unsigned, "%u", "+010", OK, 8);
> > + PARSE (unsigned, "%u", "-0", BAD, 0);
> > + PARSE (unsigned, "%u", "-99", BAD, 0);
> > + PARSE (unsigned, "%u", "-0xf", BAD, 0);
> > + PARSE (unsigned, "%u", "-010", BAD, 0);
>
> I'd also test at least " 0" and " -0" (or similar), to prove we handled
> leading spaces correctly.
OK.
> > + PARSE (unsigned, "%u", "2147483647", OK, 2147483647); /* INT_MAX */
> > + PARSE (unsigned, "%u", "-2147483648", BAD, 0); /* INT_MIN */
> > + PARSE (unsigned, "%u", "0x7fffffff", OK, 0x7fffffff);
> > + PARSE (unsigned, "%u", "-0x80000000", BAD, 0);
> > +
> > + /* Test nbdkit_parse_long. */
> > + PARSE (long, "%ld", "0", OK, 0);
> > + assert (snprintf (s, sizeof s, "%" PRIi64, (int64_t) LONG_MAX) != -1);
> > + PARSE (long, "%ld", s, OK, LONG_MAX);
> > + assert (snprintf (s, sizeof s, "%" PRIi64, (int64_t) LONG_MIN) != -1);
> > + PARSE (long, "%ld", s, OK, LONG_MIN);
> > + PARSE (long, "%ld", "999999999999999999999999", BAD, 0);
> > + PARSE (long, "%ld", "-999999999999999999999999", BAD, 0);
>
> snprintf() side effects inside assert(). Tsk tsk - the test will now
> fail under -DNDEBUG. And that's part of why I don't think we want to
> expose parse_long.
We should probably loudly break or skip the tests if they
are compiled with -DNDEBUG. We're using assert for testing
in a couple of places.
> > +++ b/server/usergroup.c
> > @@ -97,7 +97,7 @@ parseuser (const char *id)
> >
> > saved_errno = errno;
> >
> > - if (sscanf (id, "%d", &val) == 1)
> > + if (nbdkit_parse_int ("parseuser", id, &val) == 0)
>
> Is 'int' still the right type for this?
Hopefully it will warn us if uid_t stops being int. (Also
we're assuming pid_t == int).
Thanks,
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
More information about the Libguestfs
mailing list