[Libguestfs] [PATCH nbdkit 1/5] Add new public nbdkit_parse_probability function

Laszlo Ersek lersek at redhat.com
Wed May 17 14:15:33 UTC 2023


On 5/16/23 22:22, Richard W.M. Jones wrote:
> On Tue, May 16, 2023 at 02:35:38PM -0500, Eric Blake wrote:

>> Our pre-existing error filter is already doing percentage calculations
>> via floating point, so on the grounds that this patch is consolidating
>> common code patterns to avoid reimplementation, we're good. But as you
>> say, representing things as a ratio between two integers may be more
>> maintainable than floating point where accuracy matters, or this may
>> be a case where we are explicit that use of the floating point
>> fractions as a percentage is intentionally liable to rounding issues.
>> Still, avoiding undefined behavior when the division can overflow
>> (whether that be the overflow to floating point infinity or even the
>> overflow beyond UINT64_MAX), does make it worth questioning if
>> floating point is our best representation, or at a minimum if we want
>> to be stricter in what we parse before passing a substring on to
>> strtod() so that we have more control over the values (part of my 19
>> patches to qemu was explicitly truncating any string at 'e' or 'E' so
>> that strtod() could not do an exponent parse that would inadvertently
>> hit overflow to INFINITY).
> 
> I'm not really sure we reached a conclusion so far, but I did want to
> say that I changed the evil filter impl so that it now treats any
> 0 <= P < 1e-12 as effectively 0.  (P == evil-probability; P < 0 and P > 1
> are still rejected at config time as before).
> 
> With P == 1e-12:
> 
> nbdkit: debug: evil: mode: stuck-bits, P: 1e-12
> nbdkit: debug: evil: block_size: 17592186044416 (2**44)
> nbdkit: debug: evil: expected bits per block: 140.737
> 
> (The large block size isn't an issue here as nothing is allocated.
> However it would be a problem if the block_size calculation overflows,
> and I believe that limiting P to larger values avoids this.)

Right, this is roughly the direction I could agree with -- and again I
want to emphasize that you don't *need* my agreement to proceed with the
floating point approach!

I think enforcing a minimum for P could be useful too, for wherever we
divide by P.

> Also I'm not sure that handling probabilities as a numerator and
> denominator actually helps here?  Both still need to be stored as
> floats, so it just pushes the same problem to the plugin.

No, the point with N/D is that they're both positive integers (well, N
may be zero as well), and that they *together* give you a single
rational probability, namely N/D. They'd be stored as separate unsigned
integer fields everywhere, and the division would only be performed (in
unsigned integer) wherever directly needed, such as in (a * N) / D.

I don't know if this is compatible with existing code. If we already
have code that parses floats (and Eric's comments imply we do), then
that code couldn't be rebased onto this N/D representation compatibly.
But, I'd consider introducing a new (in a way, "divergent") N/D
representation a fair price, for not adding more floating point!
Refactoring (extracting common code) is only useful if we can fix the
extracted code later on, and that's a tough challenge with floating point.

Laszlo
