[Libguestfs] RFC: *scanf vs. overflow

Rich Felker dalias at libc.org
Sat May 23 16:21:12 UTC 2020


On Sat, May 23, 2020 at 08:06:54AM +0100, Richard W.M. Jones via Libc-alpha wrote:
> The context to this is that nbdkit uses sscanf to parse simple file
> formats in various places, eg:
> 
> https://github.com/libguestfs/nbdkit/blob/b23f4f53cf71326f1dba481f64f7f182c20fa3dc/plugins/data/format.c#L171-L172
> https://github.com/libguestfs/nbdkit/blob/b23f4f53cf71326f1dba481f64f7f182c20fa3dc/filters/ddrescue/ddrescue.c#L98
> 
> We can only do this safely where we can prove that overflow does not
> matter.

Being that it's specified as UB, it can never "not matter";
arbitrarily bad side effects are permitted. So it's only safe to use
scanf where the input is *trusted not to contain overflowing values*.

What would be really nice to fix here is getting the standard to
specify that overflow has behavior like strto* or at least
"unspecified value" rather than "undefined behavior" so that it's safe
to let it overflow in cases where you don't care (e.g. you'll be
consistency-checking the value afterwards anyway).

> In other cases we've had to change sscanf uses to strto* etc
> which is much more difficult to use correctly.  Just look at how much
> code is required to wrap strto* functions to use them safely:
> 
> https://github.com/libguestfs/nbdkit/blob/b23f4f53cf71326f1dba481f64f7f182c20fa3dc/server/public.c#L113-L296

Really that much code is just for the sake of verbose error messages,
and they're not even accurate. "errno!=0" does not mean "could not
parse"; it can also be overflow of a perfectly parseable value. And if
you've already caught errno!=0 then end==str is impossible (dead
code). The last case, not hitting null, is also likely spurious/wrong;
you usually *want* to pick up where strto* stopped, and the next thing
the parser does will catch whether the characters after the number are
valid there or not.

strto* do have some annoying design flaws in error reporting, but
they're not really that hard to use right, and much easier than scanf
which just *lacks the reporting channels* for the kind of fine-grained
error reporting you're insisting on doing here.

FWIW this code would also be a lot cleaner as a static inline function
rather than a many-line macro.

Rich




More information about the Libguestfs mailing list