[Libguestfs] RFC: *scanf vs. overflow

Eric Blake eblake at redhat.com
Fri May 22 20:59:14 UTC 2020


It has long been known that the C specification of *scanf() leaves 
behavior undefined for things like
int i;
sscanf("9999999999999999", "%i", &i);

C11 7.21.6.2 P12
"Matches an optionally signed integer, whose format is the same as 
expected for the subject sequence of the strtol function with the value 
0 for the base argument."
C11 7.21.6.2 P10
"If this object does not have an appropriate type, or if the result of 
the conversion cannot be represented in the object, the behavior is 
undefined."

as there is an overflow when consuming the input which matches the 
strtol subject sequence but does not fit in the width of an int.  On my 
Linux system, 'man sscanf' mentions that ERANGE might be set in such a 
case, but neither C nor POSIX actually requires this behavior; other 
likely behaviors are storing the value mod 2^32 into i, or storing 
INT_MAX into i, or ...

This is annoying - the only safe way to parse integers from 
untrustworthy sources, where overflow MUST be detected, is to manually 
open-code strtol() calls, which can get quite lengthy in comparison to 
the concise representations possible with *scanf.
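
For comparison, here is a minimal sketch of what the open-coded strtol() 
approach tends to look like for a single int.  The helper name parse_int 
and its 0/-1 return convention are illustrative assumptions, not an 
existing API; base 0 is used so the accepted input matches "%i".

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (name and return convention are assumptions):
 * parse all of s as an int via strtol() with base 0, the same subject
 * sequence that "%i" accepts.  Returns 0 on success, or -1 on empty
 * input, trailing characters, or overflow. */
static int parse_int(const char *s, int *out)
{
    char *end;
    long val;

    errno = 0;                  /* strtol() only sets errno on failure */
    val = strtol(s, &end, 0);
    if (end == s || *end != '\0')
        return -1;              /* no digits, or trailing characters */
    if (errno == ERANGE)
        return -1;              /* value does not fit in a long */
    if (val < INT_MIN || val > INT_MAX)
        return -1;              /* fits in a long but not in an int */
    *out = (int)val;
    return 0;
}
```

With this sketch, parse_int("9999999999999999", &i) fails cleanly 
instead of invoking undefined behavior, but every field that a single 
sscanf() format could have covered needs its own call and checks.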

Would glibc be willing to consider a GNU extension to add an optional 
flag character between '%' and the various numeric conversion specifiers 
(both integral based on strto*l, and floating point based on strtod), 
where we could force *scanf to treat numeric overflow as a matching 
failure, rather than undefined behavior?  Or even a second flag to 
request that *scanf stop consuming characters if the next character in 
input would cause overflow in the current specifier, leaving that 
character to instead be matched to the remainder of the format string?
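
To make the second idea concrete, here is a rough sketch of the 
"stop before overflow" behavior, written as a standalone function rather 
than as anything inside a libc.  The name scan_int_prefix is 
hypothetical, and for brevity it handles only unsigned decimal digits, 
not the sign or the 0x/0 prefixes that "%i" would also accept.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical sketch: consume leading decimal digits of s into *out,
 * stopping before the first digit that would push the value past
 * INT_MAX.  Returns the number of characters consumed; 0 means no digit
 * matched and *out is left untouched. */
static size_t scan_int_prefix(const char *s, int *out)
{
    size_t n = 0;
    int val = 0;

    while (s[n] >= '0' && s[n] <= '9') {
        int digit = s[n] - '0';

        /* val * 10 + digit <= INT_MAX, checked without overflowing */
        if (val > (INT_MAX - digit) / 10)
            break;              /* leave this digit for the caller */
        val = val * 10 + digit;
        n++;
    }
    if (n > 0)
        *out = val;
    return n;
}
```

Assuming a 32-bit int, scan_int_prefix("9999999999999999", &i) consumes 
nine characters and stores 999999999, leaving the remaining digits to be 
matched by whatever follows in the format string.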

Let's suppose for the sake of argument that we add '^' as a request to force 
overflow to be a matching error.  Then sscanf("9999999999999999", "%^i", 
&i) would be well-specified to return 0, rather than returning 1 with an 
unknown value assigned into i, or whatever else other libc implementations 
do with the undefined behavior when the ^ is not present.

And if glibc likes the idea of such an extension, and we see an uptick 
in applications actually using it, I'd also be happy to champion the 
addition of such an extension in POSIX (but the POSIX folks will 
definitely want to see existing practice first - both an implementation 
and applications that use that implementation).  The libguestfs suite of 
programs is willing to be an early adopter, if glibc is willing to 
pursue adding such a safety valve.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



